July 28, 1999

Cornell researchers use physical laws to simulate protein folding

Researchers at Cornell have had their best success yet in simulating the folding of a protein solely from the physical laws that govern the behavior of its atoms.

A group led by Harold Scheraga, the Todd Professor of Chemistry emeritus, simulated the folding of the protein HDEA from the bacterium E. coli on Cornell's IBM supercomputer. The simulation predicted a structure consisting of a bundle of five spiral coils that matched 80 percent of the structure found by X-ray crystallography. It was the best match of several computer-generated structures for the protein submitted to the Third Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP-3), which took place over the second half of 1998.

CASP-3 is a cooperative experiment to test the accuracy of computer simulations of protein folding. Researchers are given the amino-acid sequence of a number of proteins for which the shape has already been determined by X-ray crystallography or nuclear magnetic resonance techniques and asked to submit their computer solution for the structure.

Scheraga's group submitted seven structures out of the CASP list of 43 and had solid successes with two of those. In addition to HDEA, they scored high with another protein, called MarA. The Scheraga group produced the best match to the actual structure of any simulation based solely on physical laws, although other groups found more accurate matches by using programs that compared the simulated structure with the structures of similar, already-known proteins.

Cells make proteins by stringing together long chains of organic molecules called amino acids. The chain quickly folds into a compact shape, something like the way a piece of string will bunch up if you twist the ends. The folding is driven by the attractions and repulsions between the positive and negative electric charges of the atoms making up the molecules. The final shape will be the one in which the positive and negative charges are, on average, as close to one another as they can get; in technical terms, the protein molecule has the lowest possible potential energy.
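To make the idea of "lowest possible potential energy" concrete, here is a minimal sketch that adds up the electrostatic energy of a handful of point charges. It covers only the charge attraction and repulsion described above, in arbitrary units; the function name and the two-charge example are hypothetical illustrations, not part of the Scheraga group's software.

```python
import itertools
import math

def electrostatic_energy(charges, positions):
    """Sum q_i * q_j / r_ij over every distinct pair of charges (arbitrary units)."""
    total = 0.0
    for (qi, pi), (qj, pj) in itertools.combinations(zip(charges, positions), 2):
        total += qi * qj / math.dist(pi, pj)
    return total

# Hypothetical toy system: one positive and one negative charge.
far = electrostatic_energy([1.0, -1.0], [(0, 0, 0), (10, 0, 0)])
near = electrostatic_energy([1.0, -1.0], [(0, 0, 0), (3, 0, 0)])

# Opposite charges that sit closer together give a lower (more negative) energy.
print(near < far)  # True
```

Bringing the opposite charges closer lowers the total, which is the sense in which a folded shape can be "lower in energy" than an extended one.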

That final shape determines the biological activity of the protein. Enzymes, for example, do their work because parts of them match the shapes of the molecules whose reactions they control. Being able to predict the shape from the sequence of amino acids would help biologists identify the functions of proteins made by certain genes, such as those associated with Alzheimer's disease, cystic fibrosis and other hereditary disorders. Accurate predictions of protein shape could also lead to the ability to design new drugs from scratch.

Theoretically, a computer could calculate all the possible shapes for a given chain of amino acids and choose the one with the lowest potential energy. In practice, however, there are so many possibilities that it would take longer than the age of the universe to do all the calculations. So, many researchers take shortcuts, by looking for similarities between the chain to be tested and other known proteins or by simulating a partial fold and then comparing the results to known shapes.
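A rough back-of-the-envelope calculation shows why trying every shape is hopeless. The numbers below are assumptions chosen only for illustration: three possible states per amino acid, a chain of 100 amino acids, and a computer that evaluates a trillion shapes per second.

```python
AGE_OF_UNIVERSE_SECONDS = 4.35e17  # assumed: roughly 13.8 billion years

def exhaustive_search_seconds(n_residues, states_per_residue=3, evals_per_second=1e12):
    """Time to evaluate every combination of per-residue states, one by one."""
    conformations = states_per_residue ** n_residues
    return conformations / evals_per_second

seconds = exhaustive_search_seconds(100)

# How many times the age of the universe the full enumeration would take.
print(seconds / AGE_OF_UNIVERSE_SECONDS)
```

Even under these generous assumptions, the enumeration runs for vastly longer than the universe has existed, which is why practical methods search selectively rather than exhaustively.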

Unfortunately, there are no easy matches for many proteins, so it would be desirable to have what's called an ab initio (from the very beginning) approach that simulates the entire folding process. That's the approach used by the Scheraga group, which bases its simulations on the laws of physics, calculating how the forces between atoms affect their arrangement.

In this case, they shortened the computation by starting out with a simplified version of the amino acid chain. Every amino acid has the same "backbone," a string of one nitrogen atom and two carbon atoms. Different side chains attached to the carbon in the middle identify the 20 different amino acids that make up proteins.

The computer program developed by Scheraga's group at first ignores the nitrogen and carbon atoms at the ends and works with a simplified version of the central carbon and its side chains to generate several rough structures. It then uses those rough structures as starting points for a full simulation that considers all the forces between all the atoms.
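The two-stage idea, a cheap rough search followed by a detailed refinement, can be sketched on a toy problem. Everything here is a hypothetical stand-in: the "structure" is just a short list of angles, both energy functions are invented for illustration, and the refinement is a simple greedy search rather than the group's actual method.

```python
import math
import random

def coarse_energy(angles):
    # Hypothetical cheap score: a smooth bowl with its minimum at angle 1.0.
    return sum((a - 1.0) ** 2 for a in angles)

def full_energy(angles):
    # Hypothetical detailed score: the same bowl plus a rugged fine-grained term.
    return coarse_energy(angles) + 0.1 * sum(math.sin(8.0 * a) for a in angles)

def refine(angles, energy, rng, steps=2000, step_size=0.05):
    """Greedy local search: keep a small random move only if it lowers the energy."""
    best, best_e = list(angles), energy(angles)
    for _ in range(steps):
        trial = [a + rng.uniform(-step_size, step_size) for a in best]
        trial_e = energy(trial)
        if trial_e < best_e:
            best, best_e = trial, trial_e
    return best, best_e

def hierarchical_search(n_angles=5, n_samples=500, n_keep=5, seed=0):
    rng = random.Random(seed)
    # Stage 1: score many random rough structures with the cheap model.
    samples = [[rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
               for _ in range(n_samples)]
    starts = sorted(samples, key=coarse_energy)[:n_keep]
    # Stage 2: refine each kept structure with the detailed model.
    refined = [refine(s, full_energy, rng) for s in starts]
    return starts, min(refined, key=lambda pair: pair[1])

starts, (best_angles, best_energy) = hierarchical_search()
print(len(starts), best_energy)
```

The expensive energy function is evaluated only near a few promising starting points, which is the saving the simplified first pass is meant to buy.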

Even with this procedure, the calculation of the structure of the HDEA protein took 70 hours running on 64 parallel processors of the Cornell Theory Center's IBM SP2 supercomputer. The MarA simulation took 100 hours on the same number of processors.
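For a sense of scale, the running times above can be converted to total processor-hours. This small sketch assumes all 64 processors were busy for the entire run, which the figures above do not state explicitly.

```python
def processor_hours(wall_clock_hours, processors):
    """Total compute time, assuming every processor works for the whole run."""
    return wall_clock_hours * processors

hdea_hours = processor_hours(70, 64)   # 4480
mara_hours = processor_hours(100, 64)  # 6400

# Equivalent time on a single processor, in days.
print(hdea_hours / 24, mara_hours / 24)
```

Under that assumption, each prediction would have taken more than six months on a single processor of the same machine.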

The Scheraga group includes visiting scientist Adam Liwo, research associates Jooyoung Lee and Jaroslaw Pillardy, and senior research associate Daniel R. Ripoll. They use the Parallel Processing Resource for Biomedical Applications at the Cornell Theory Center, funded by the National Center for Research Resources at the National Institutes of Health.

The group's approach is described in four papers:

-- J. Lee, A. Liwo and H. A. Scheraga, Energy-based de novo protein folding by conformational space annealing and an off-lattice united-residue force field: Application to the 10-55 fragment of staphylococcal protein A and to apo calbindin D9K, Proc. Natl. Acad. Sci., USA, 96, 2025-2030 (1999).

-- A. Liwo, J. Lee, D. R. Ripoll, J. Pillardy and H. A. Scheraga, Protein structure prediction by global optimization of a potential energy function, Proc. Natl. Acad. Sci., USA, 96, 5482-5485 (1999).

-- J. Lee, A. Liwo, D. R. Ripoll, J. Pillardy and H. A. Scheraga, Calculation of protein conformation by global optimization of a potential energy function, Proteins: Structure, Function and Genetics, in press.

-- J. Lee, A. Liwo, D. Ripoll, J. Pillardy, J. A. Saunders, K. D. Gibson and H. A. Scheraga, Hierarchical energy-based approach to protein-structure prediction; blind-test evaluation with CASP3 targets, Intl. J. Quantum Chem., submitted.

Detailed results of CASP-3 are on display (in highly technical language) on the CASP web site at http://predictioncenter.llnl.gov/casp3/results/access.cgi under the category for 3D coordinate predictions. The complete results of CASP-3 will be published in a special edition of the journal Proteins due out in late summer.

The Cornell Theory Center is a high-performance computing and interdisciplinary research center that has made parallel computing a usable tool for computational science and engineering since its founding in 1984. CTC receives funding from Cornell, New York state, the National Center for Research Resources at the National Institutes of Health, the National Science Foundation, the Department of Defense Modernization Program and members of the Corporate Partnership Program, through which high-tech businesses support Theory Center research.