Protein data bank opens new era with broader support


Nearly 24,000 molecules in a growing, accessible collection

ARLINGTON, Va. - The assets of the Protein Data Bank (PDB) just keep growing.

The PDB holds the three-dimensional structures of nearly 24,000 proteins and other macromolecules in its growing - and publicly accessible - collection. Its holdings profile DNAs, RNAs, viruses, and various proteins, such as enzymes central to photosynthesis, growth, development and brain function.

This month, with a doubling in the number of federal agencies supporting it, the PDB begins a new five-year, $30 million management era, the National Science Foundation (NSF) announced today. The new chapter follows an international agreement, announced last month, to pool and coordinate the deposit of molecular structure data globally.

Mary Clutter, assistant director for NSF's Directorate for Biological Sciences, said, "The Protein Data Bank is a treasure chest of shared discoveries." This new agreement will ensure that it continues to serve biologists around the world as its collection grows and diversifies.

"Biological processes involve small molecular machines," she said. "Understanding how these machines function often begins with knowing how their parts are structured, how they fit together." Thus, to have these molecular structures archived comprehensively, centrally and consistently is of enormous value across the spectrum of biological research, from genomics to systems biology.

"And because of the data bank's openness and accessibility, individual researchers - and humanity as a whole - will continue to benefit from the collective research of thousands of biologists," Clutter said.

For example, the collection includes the intricate membrane channel proteins recognized in the 2003 Nobel Prize in Chemistry.

The structure of another PDB deposit, the enzyme carbonic anhydrase, also permeates biology. Showcased as the PDB's January 2004 "Molecule of the Month," it is crucial for photosynthesis in plants and bacteria, the building of coral reefs and many fundamental processes in animals - such as bone formation, breathing and muscle contraction.

NSF has supported the Protein Data Bank continuously since 1975. A multi-agency support partnership first formed in 1989. For the past five years, that partnership has included NSF, the National Institute of General Medical Sciences (NIGMS), the Department of Energy (DOE) and the National Library of Medicine (NLM). The partnership has been expanded now to include the National Cancer Institute (NCI), the National Center for Research Resources (NCRR), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), and the National Institute of Neurological Disorders and Stroke (NINDS).

The agreement, which began Jan. 1, calls for the PDB to continue to be managed by the three members of the Research Collaboratory for Structural Bioinformatics (RCSB): Rutgers, The State University of New Jersey; the San Diego Supercomputer Center at the University of California, San Diego; and the University of Maryland/National Institute of Standards and Technology's Center for Advanced Research in Biotechnology.

Last month, the RCSB announced an international partnership to establish a worldwide PDB, coordinating with similar efforts at the Institute for Protein Research at Osaka University in Japan and at the European Bioinformatics Institute (EBI) in the United Kingdom.

The expansion of federal agency partnerships and international participation mirrors the expansion in opportunities for progress in a new era of structure-informed research.

According to James Cassatt of NIGMS, "The use of structures has revolutionized the development of new drugs, including that of all of the HIV protease inhibitors. The use of these drugs as part of combination therapy is prolonging the lives of people infected with HIV."

The PDB collection includes a wide variety of medically important structures, including enzymes and other proteins associated with influenza, HIV, SARS and other viruses; parts of prion proteins (including the bovine form implicated in Mad Cow Disease or BSE); the amyloid peptide associated with Alzheimer's disease; and the p53 tumor-suppressor protein associated with a wide variety of human cancers.

The PDB also serves the Department of Energy's Genomics:GTL program, which explores the biology of microbes to seek new ways to remediate environmental contamination, sequester carbon dioxide and generate energy from biomass. According to Aristides Patrinos, director of the Office of Biological and Environmental Research in DOE's Office of Science, knowing the structures of key molecules will help scientists understand "the protein machines that carry out the many functions of microbial cells in communities."

As the sole international repository for comprehensive structural data of large biological molecules, the PDB serves researchers and educators in academic, industrial and biotechnical pursuits.

When the data bank was first established in 1971, it contained seven structures. After 25 years, that number grew to slightly more than 5,000 structures. Three years later, there were more than 10,000. Deposits keep coming, and their data keeps generating interest worldwide: During 2003, more than 4,600 new molecular structures were added, and, on an average day, bank visitors downloaded various structural files more than 120,000 times.

According to PDB Director Helen Berman, "When the PDB started, it was felt that the data contained in protein structures would provide the information needed to understand the molecular underpinnings for a host of biological processes. This vision is being realized, and it is now even more important that the data be preserved and publicly available from a single source."

The structural data comes from experiments using X-ray crystallography, nuclear magnetic resonance, electron microscopy and other methods. After a scientist submits a structure, the experimental data - the deposit - is validated and annotated. Coordinating with the biological journals that publish the discovery of new protein structures, the PDB also ensures that the data is available in the public domain.

As the PDB grows and evolves, one of its central challenges will be the expanded integration of its wealth of information with other biological data, images and research articles.

According to Kim Henrick of the European Bioinformatics Institute, "The PDB must expand both in the storage and annotation of protein production information and into other 3-D structure fields with linkages made to electron microscopy (EM) data. EM experimental data will make an enormous impact in the next five years in molecular biology."

Over the next five years, the PDB's challenges will also include keeping up with the increasing complexity and volume of deposited structures, meeting the demands for more complex queries, and providing more detailed annotation of the experiments and the structures.

Along with serving scientists, the PDB is an educational resource for students and educators at all levels. Another challenge, therefore, is to meet the needs of an expanding, diverse and global user community.

Source: Eurekalert & others

Last reviewed: By John M. Grohol, Psy.D. on 21 Feb 2009