Cornell physicist awarded NSF grant to find faster way to determine shape of protein molecule

When a beam of X-rays is fired through a crystallized protein sample, the beam is scattered into a pattern that depends on the arrangement of atoms in the crystal. By decoding that pattern, experts can find the arrangement of the atoms and the shape of the protein molecule. But the decoding process is as much art as science, generally taking weeks or even months to find the structure of a large protein.

Now a Cornell University researcher hopes to develop a much faster way to analyze the data on a computer. Veit Elser, associate professor of physics, has received a three-year National Science Foundation grant of $234,320 to develop new computer algorithms for X-ray crystallography. The grant comes from the agency's Information Technology Research initiative.

"The approach that looks promising is one that can take advantage of the kind of algorithms that are already effectively in use for solving other hard problems," Elser says.

Crystallographers measure the amplitude of the X-rays emerging from a sample by scanning across a plane on the output side. This gives a pattern of interference fringes similar to the pattern of bright and dark lines that results when light is shone through two narrow slits, or the pattern of colored fringes that appears in light reflected from the surface of a compact disc. The pattern, much more complex in this case, is created by X-rays scattered by electrons in the atoms of the crystal and is shaped by the way those electrons are arranged.

But amplitude, Elser explains, is only half of the information needed to deduce the structure of the crystal. It would help tremendously, he says, to have information about the phases of the waves at each point. Phase refers to whether a wave is at a peak or a trough, or somewhere between. But so far, X-ray crystallographers don't have good ways to measure phase.

You could, Elser says, use a computer to try combining the measured amplitudes with random patterns of phase. By a quirk of the math, out of the astronomically large number of phase patterns possible, there's only one that will combine correctly with one particular pattern of amplitudes. "If I use a computer to try random phases and see what I get, I'll find that almost all of those I can reject because they produce absolute nonsense, but if I'm lucky and hit on this magical combination, out pops a map of the concentrated lumps of charge," Elser says, explaining that each lump of charge represents an atom.
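The role of phase can be illustrated with a toy calculation. The sketch below is not Elser's software; it assumes a one-dimensional "crystal" of 16 grid points holding two atoms, and it relies on the standard relationship that the scattered waves are the Fourier transform of the charge distribution. The grid size, atom positions and charges, and all function names are made up for illustration.

```python
import cmath
import math
import random


def dft(values):
    """Discrete Fourier transform of a list of real numbers."""
    n = len(values)
    return [
        sum(values[x] * cmath.exp(-2j * math.pi * k * x / n) for x in range(n))
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Inverse discrete Fourier transform."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * math.pi * k * x / n) for k in range(n)) / n
        for x in range(n)
    ]


def density_from(amplitudes, phases):
    """Combine amplitudes with phases and transform back to a charge map."""
    coeffs = [cmath.rect(a, p) for a, p in zip(amplitudes, phases)]
    return [value.real for value in inverse_dft(coeffs)]


def random_phases(n, rng):
    """Random phases that still describe a real-valued charge map.

    Opposite reflections k and n - k get opposite phases; the k = 0 and
    k = n / 2 terms are left at zero phase (a simplification for this toy).
    """
    phases = [0.0] * n
    for k in range(1, n // 2):
        phases[k] = rng.uniform(-math.pi, math.pi)
        phases[n - k] = -phases[k]
    return phases


# Hypothetical toy crystal: two "atoms" (lumps of charge) on a 16-point grid.
density = [0.0] * 16
density[3] = 6.0
density[10] = 8.0

coeffs = dft(density)
amplitudes = [abs(c) for c in coeffs]          # what the experiment records
true_phases = [cmath.phase(c) for c in coeffs]  # what the experiment loses

# With the correct phases, the original charge map comes back.
recovered = density_from(amplitudes, true_phases)

# With random phases, the same amplitudes give a different map.
guess = density_from(amplitudes, random_phases(len(amplitudes), random.Random(0)))

print([round(v, 3) for v in recovered])
print([round(v, 3) for v in guess])
```

Comparing the two printed maps shows why a random guess is usually rejected: the amplitudes are identical in both cases, yet only the correct phases restore the two isolated lumps of charge.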

But the number of possibilities, he adds quickly, is something like 1 followed by 30,000 zeros. If the computer did not reach the right one until near the end of the list, the search could take longer than the age of the universe.
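A rough back-of-the-envelope calculation shows the scale of that search. The testing rate below is an assumption chosen to be generous (a billion billion candidates per second), and the age of the universe is taken as roughly 13.8 billion years; only the figure of 1 followed by 30,000 zeros comes from Elser.

```python
import math

LOG10_CANDIDATES = 30_000         # "1 followed by 30,000 zeros"
TESTS_PER_SECOND = 10**18         # assumed, deliberately generous rate
AGE_OF_UNIVERSE_YEARS = 13.8e9    # approximate
SECONDS_PER_YEAR = 365.25 * 24 * 3600

age_seconds = AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR

# Work with base-10 logarithms so the huge numbers stay manageable.
log10_tests_per_age = math.log10(TESTS_PER_SECOND) + math.log10(age_seconds)
log10_ages_needed = LOG10_CANDIDATES - log10_tests_per_age

print(f"Candidates tested in one age of the universe: about 10^{log10_tests_per_age:.1f}")
print(f"Ages of the universe needed for the full list: about 10^{log10_ages_needed:.0f}")
```

Even at that assumed rate, one age of the universe covers only a vanishing fraction of the list, which is why an exhaustive search is not an option.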

In recent years computer scientists have had success in solving hard search problems of this kind by establishing limits, or "bounds," on the possible solutions. The computer goes a short distance down one branch of the tree of possible solutions and tests the results. If they don't look promising, it drops that branch and quickly goes on to another.

The trick is to choose the right test. In X-ray crystallography, Elser says, "You need to come up with a quantitative evaluation of whether [the tentative solution the computer gets] could be a reasonable structure. The test turns out to be an amazingly simple one: the total charge of the electrons, because X-rays only scatter off electrons."

In other words, the computer tries a random pattern of phases, looks at the total charge of the electrons in the structure that results and decides whether or not it's going in the right direction. If not, it discards the group of related patterns it was trying and starts at a different place. "Instead of searching all the possibilities, it can find the best combination in the search tree by only examining a very minute fraction of the whole tree," Elser says.
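The general idea of pruning a search tree with bounds can be sketched on a toy problem. The example below is not Elser's algorithm and does not use his total-charge test. It assumes a simplified stand-in: each "phase" is just a plus or minus sign attached to a known amplitude, and the goal is to choose signs so the signed sum hits a target value. The function name, amplitudes and target are hypothetical.

```python
def branch_and_bound(amplitudes, target):
    """Find signs (+1 or -1) so that sum(sign * amplitude) is closest to target."""
    n = len(amplitudes)

    # remaining[i] is the total amplitude still unassigned from position i on.
    remaining = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        remaining[i] = remaining[i + 1] + amplitudes[i]

    best = {"error": float("inf"), "signs": None, "visited": 0}

    def search(i, partial, signs):
        best["visited"] += 1
        # Bound: even if every remaining amplitude pushed toward the target,
        # the final error could not be smaller than this.
        bound = max(0.0, abs(partial - target) - remaining[i])
        if bound >= best["error"]:
            return  # this branch cannot beat the best solution found so far
        if i == n:
            best["error"] = abs(partial - target)
            best["signs"] = list(signs)
            return
        for sign in (+1, -1):
            signs.append(sign)
            search(i + 1, partial + sign * amplitudes[i], signs)
            signs.pop()

    search(0, 0.0, [])
    return best


# Hypothetical data: eight amplitudes and a target reachable by some sign pattern.
amplitudes = [9.0, 7.0, 5.0, 4.0, 3.0, 2.0, 1.0, 1.0]
target = 6.0

result = branch_and_bound(amplitudes, target)
full_tree = 2 ** (len(amplitudes) + 1) - 1  # nodes in the complete search tree

print("signs:", result["signs"])
print("error:", result["error"])
print("nodes visited:", result["visited"], "of", full_tree)
```

The printed node count shows the effect Elser describes: branches whose bound rules them out are dropped without being explored, so the search touches only part of the full tree.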

If this works, Elser hopes to achieve what he calls "the holy grail, when just feeding the data into the computer will give you the protein structure in a matter of minutes."

Elser also hopes to study a question of more academic interest: how the computational difficulty of the problem grows with the size of the protein.
