Golan Yona, assistant professor of computer science at Cornell University, has received a National Science Foundation (NSF) Faculty Early Career Development Program award to support his research into creating a map of all known proteins. Such a map might be akin to the periodic table of the elements. Yona, a computational biologist, will receive a five-year grant of $1,103,915 to support his research.
The NSF-funded project, "Global Self-Organization of All Known Proteins – Toward a Complete Map of the Protein Space," aims to organize and categorize the hundreds of thousands of known proteins into a multi-dimensional map. By locating a new protein on the map, researchers might be able to guess its function by looking at others that fall nearby. Likewise, when a gene is sequenced, researchers might compare that sequence with others on the map to predict the structure and function of the protein for which the gene codes.
Dimensions of the map indirectly reflect such characteristics as the physical shape, the topology or the amino acid sequence of proteins. Yona likens the idea to the periodic table of the elements, first proposed by Mendeleev in 1869, which arranges the elements in order of increasing atomic number (the number of protons) and places those with chemical similarities in the same columns. The table helped to predict elements that had not yet been discovered.
"You assume there is some underlying principle to this process," Yona explains. "Proteins evolved and generated a collection of different families, but it is not a random collection of families." He says he hopes to find global principles that may explain the organization and creation of the protein space.
A few proteins could be organized and classified by hand, just as biologists have organized and classified animals and plants. But there are so many known proteins that only computers can process all the possible relationships, he adds. The process takes months of calculations on large computer clusters (collections of microprocessors running in parallel to form a supercomputer).
Since new proteins are added to the databases every day, the map needs to be updated frequently, Yona points out. In addition, he says, as new technologies develop, new types of protein data become available, such as gene expression data, information on protein-protein interactions and the "context," or biological pathways, in which a protein participates. All of these should be integrated into the map. This poses serious theoretical as well as computational problems, he says.
Yona already has created a preliminary map and hopes to have a refined version available within a year. He has created a web site, ProtoMap (see details in box below), where researchers can submit protein descriptions or gene sequences and find their locations on the map. A second database (BioSpace), which he developed while at Stanford University, stores three-dimensional models for more than 160,000 proteins.
Yona received a B.S. in physics in 1992 and a Ph.D. in computer science in 1998 from Hebrew University in Jerusalem. He then worked in the Department of Structural Biology at Stanford, where he was a Burroughs Wellcome postdoctoral fellow. At Cornell, where he joined the faculty as assistant professor in January 2001, he teaches courses in computational molecular biology and machine learning.
The award, the NSF's most prestigious for new faculty members, recognizes and supports the early-career development activities of those teacher-scholars who are considered most likely to become the academic leaders of the 21st century.