Annotation marathon validates 21,037 human genes


International consortium provides the first step towards a comprehensive functional link between the genome sequence scaffold and human diseases

The announcement of the human genome sequence three years ago was widely hailed as one of the great scientific achievements in modern history. But sequencing the genome is just a first step; the monumental task of ascribing biological meaning to those sequences has only just begun. The H-Invitational international consortium, led by Takashi Gojobori (Tokyo, Japan), has taken a significant step towards this goal by validating and annotating over 20,000 human genes using publicly available resources.

By relating intermediate gene products called messenger RNAs to each of their parent genes, and exhaustively connecting them to the relevant proteins, the consortium has established a reliable systematic network of human-curated relationships between genes and their biological functions.

The study, reported in the open access journal PLoS Biology, took over two years to complete and is expected to set the standard for analysis of gene expression and human diseases worldwide through the publicly available H-Invitational database. Humans are estimated to have about 30,000 genes, so a detailed functional map of a majority of them will be a boon for geneticists, drug researchers and genome scientists around the world. The work yields a wealth of information, including evidence for several thousand new genes and data on variable expression and genetic variation within the genes.

The consortium has laid the groundwork to address the challenge of connecting the functions of genes and their products to the clinical effects that each of them has upon human health. "We are confident now that anyone in academia or industry who uses our database will gain far deeper insight into the meaning of human disease than was previously possible," stated Professor Gojobori. The work also builds on the traditions of international cooperation and large-scale collaboration, which played such an important part in the deciphering of the sequence itself. The consortium is made up of 152 scientists from developed as well as developing nations, including Australia, Brazil, China, France, Germany, Korea, South Africa, Sweden, Switzerland, the United Kingdom and the United States.

