The Gene Ontology Consortium has unveiled a new resource: a comprehensive encyclopedia of the functions of all known protein-coding human genes. It is the result of a collaboration among researchers from the Keck School of Medicine at USC, the Swiss Institute of Bioinformatics, and other institutions. For the first time, large-scale evolutionary modeling has been integrated with genetic data from humans and other organisms to produce a freely accessible platform where users can explore the functions of more than 20,000 genes, based on the most reliable and extensive evidence available. Details of the resource were published in the journal Nature.
The Gene Ontology, funded by the National Institutes of Health, has evolved over more than 25 years into an essential tool for biomedical research. It is currently used in more than 30,000 publications each year, helping scientists analyze and interpret their data.
Researchers engaged in “omics” studies, which encompass extensive investigations into DNA, RNA, proteins, and other biological elements, generate immense amounts of data that can highlight numerous genes related to particular conditions, such as cancer. However, sifting through extensive published literature to assess the functions of these genes is often impractical. Instead, many researchers rely on the Gene Ontology as a valuable resource.
Paul D. Thomas, PhD, a principal investigator with the Gene Ontology Consortium and a professor at the Keck School of Medicine, stated, “Our knowledge base enables scientists to transition from merely identifying a list of genes to comprehending their biological roles, which can guide therapeutic strategies.”
The latest addition to this knowledge base enhances its capabilities through the use of evolutionary modeling, which merges experimental insights from human genes with information gathered from related genes in model organisms like mice and zebrafish. This integration provides researchers with a deeper understanding of human gene functions, effectively addressing gaps where direct human data may be lacking.
Thomas emphasized, “Having established a substantial knowledge base that serves as an authoritative reference for human gene functions, we have now enriched this information by incorporating evolutionary timelines, resulting in a more comprehensive and precise understanding of gene encoding functions.”
An evolutionary view
The resource was developed by a team of more than 150 biologists worldwide, including contributors from the Keck School of Medicine of USC. Since its inception in 1998, the consortium has reviewed more than 175,000 scientific papers on gene function, collecting data on genes across the human genome with a focus on the 20,000-plus protein-coding genes that underlie essential biological processes.
Through this literature review, researchers systematically categorized each gene according to the functions it performs, either on its own or together with other genes. They built a detailed catalog of more than 40,000 functions spanning diverse biological processes, including cell division, signaling, immune responses, and molecular transport. This knowledge helps scientists understand the mechanisms underlying diseases like cancer and supports the development of targeted treatments.
The newly launched resource, known as the “PAN-GO functionome,” is expected to improve the existing uses of the Gene Ontology in the scientific community, particularly the analysis of omics data. Thomas noted that because the tool was built using large-scale evolutionary models, it now offers researchers a far more precise overview of gene functions.
Where experimental data on a human gene is unavailable, insights can be drawn from related genes in other organisms, such as mice, rats, and zebrafish. By understanding the evolutionary context of specific functions, researchers can extrapolate from data on these organisms to better understand human gene function.
Thomas pointed out, “This methodology allows for the inference of human gene characteristics, even in the absence of direct experimental evidence.”
Further improving the knowledge base
While the PAN-GO functionome stands as the most exhaustive resource on gene functions, it currently covers 82% of protein-coding genes. The remaining 18%, around 3,600 genes, have no experimental data, and their biological functions remain uncertain.
Thomas remarked, “We now have a clearer picture of the areas where information is lacking, and this will guide future research efforts in the field.”
Alongside Thomas, the study’s authors include Huaiyu Mi, Anushya Muruganujan, Dustin Ebert, and Tremayne Mushayahama from the Keck School of Medicine; Marc Feuermann and Pascale Gaudet from the Swiss Institute of Bioinformatics; and Suzanna E. Lewis from Lawrence Berkeley National Laboratory, among many other contributors from approximately 50 institutions globally.
This research was primarily financed by the National Institutes of Health.
Source
www.sciencedaily.com