Cornell experts in computational biology and bioinformatics have made key contributions to the analysis of the genome of the rhesus macaque, better known as the rhesus monkey. The Cornell researchers were part of a consortium of some 200 scientists around the world whose work is reported in a special section of the April 13 issue of the journal Science.
The rhesus macaque (Macaca mulatta) is physiologically similar to humans and therefore widely used in medical research, particularly in vaccine testing and as a model for AIDS research. Understanding its genome and how it differs from that of human beings promises to offer new insights into the evolution of humans and other primates and has important implications for medical research.
After the macaque genome was sequenced in 2005, additional scientists, including a Cornell team, were recruited to analyze the results. Richard Gibbs of Baylor College of Medicine oversaw the entire project. The work at Cornell was performed mainly by research groups under Adam Siepel and Carlos Bustamante, assistant professors of biological statistics and computational biology, with assistance from Andrew Clark, professor of molecular biology and genetics. To analyze the genome, which consists of 2.9 billion DNA base pairs, the researchers used a dedicated computational biology cluster at the Cornell Theory Center, a supercomputer with 1,234 parallel processors.
Siepel's group studied genes that were found to be common to humans, macaques and chimpanzees. (The chimpanzee genome was sequenced in September 2005.) They identified 10,376 genes whose function is at least partially known, and looked for differences that would show how evolution had progressed.
"Before this paper, analyses of this kind had focused on human and chimp, and they're so close that it's not as interesting," Siepel says. "The macaque gives us the ability to more sensitively detect subtle natural selection pressure."
By comparing genes that have had 25 million years to change since the human and macaque lineages split (as compared with the 6-million-year gap between humans and chimpanzees), the researchers can learn something about how and why those changes took place.
Over time, minor changes in genes occur randomly, often without changing the amino acids -- protein building blocks -- for which the genes encode. Siepel's group used these changes as an indicator of how much random change should be expected over 25 million years. Then they looked at changes that would code for a different amino acid, which might cause a change in function, and compared these with the expected random rate of change.
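The comparison described above can be illustrated with a toy sketch. The article does not describe the consortium's actual statistical method, so this is only an assumption-laden illustration: it takes two already-aligned coding sequences, walks through them codon by codon, and counts changes that leave the amino acid unchanged (synonymous) separately from changes that alter it (nonsynonymous). The function name `count_codon_changes` and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(seq_a, seq_b):
    """Count codons that differ between two aligned coding sequences.

    Returns (synonymous, nonsynonymous): differing codons that encode the
    same amino acid, and differing codons that encode different ones.
    This is a crude per-codon tally, not a full rate estimate.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Hypothetical aligned fragments: Met-Ala-Lys versus Met-Ala-Arg.
print(count_codon_changes("ATGGCTAAA", "ATGGCCAGA"))  # (1, 1)
```

A gene with many more amino-acid-altering changes than its silent changes would predict is the kind of candidate the paragraph above describes; a real analysis would also normalize by the number of sites where each type of change is possible and test for statistical significance.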
"Where the amino acids have changed more than you'd expect it's possible nature has responded to some environmental effect," Siepel explains. For example, the researchers found the most evidence for positive selection in a gene coding for keratin, a protein involved in the formation of hair shafts. Perhaps humans are less hairy than monkeys because of an ancient climate change or some shift in the standards of mate selection, the researchers speculate. Other genes that seem to have been selected for over the years include several involved in the immune system and cell-membrane signaling systems.
On average, the researchers say, genes in the human and chimpanzee genomes have evolved more rapidly than in the other primates, after adjusting for random rates of change. And comparisons with the genomes of rodents and dogs show that primate genes have evolved more rapidly than the genes of those animals, whose lineages split off the evolutionary tree even earlier.
No one found any "big brain genes," Bustamante jokes. Many physiological differences may be controlled by regulatory sequences that turn other genes on and off, he explains, and the study did not include those sequences.
Siepel's group also analyzed genes that are duplicated in several different locations on the genome. They zeroed in on a family of genes known as PRAME (preferentially expressed antigen of melanoma) that are active in cancer cells and seem to be involved in the formation of sperm. Humans have at least 26 copies. Comparison with the mouse genome suggests that there was a spurt of duplication of this gene early in primate evolution, and comparison with the macaque shows another spurt of copying in both humans and chimpanzees, with the greatest duplication in humans and with evidence for positive selection. This suggests, the researchers say, that the PRAME family has played an important role in human evolution.
Bustamante's group studied variations within the macaque genome -- the ways in which individuals within the species differ from one another. While the complete genome sequencing of the macaque was done with the DNA of a single individual, for studies of variation researchers at Baylor also sequenced part of the genomes of 16 other macaques, eight from China and eight from India, and targeted five regions of the genome for deeper analysis, sequencing those regions in fine detail in 47 individuals.
Macaques show less variation on the X chromosome (one of the two sex chromosomes) than on others, Bustamante's group found.
"Evolutionary theory predicts that if natural selection is important in shaping the sex chromosome, there will be less variation on the X," Bustamante says. Since males have only one X chromosome, he explains, a recessive change can't hide behind a second, unchanged copy for a few generations and escape selection pressure. A surprise finding was that variation in the X chromosome was only 50 percent of what was seen on the other chromosomes, whereas about 75 percent had been expected.
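The 75 percent figure follows from simple counting, sketched below under assumptions the article does not spell out: equal numbers of breeding males and females and no selection. Each female carries two X chromosomes and each male one, while both carry two copies of every other chromosome, so a population holds three X copies for every four copies of a non-sex chromosome. The function name is hypothetical.

```python
def x_to_autosome_copy_ratio(n_males, n_females):
    """Ratio of X-chromosome copies to copies of a non-sex chromosome.

    Assumes females carry two X chromosomes, males carry one, and every
    individual carries two copies of each non-sex chromosome.
    """
    x_copies = n_males + 2 * n_females
    autosome_copies = 2 * (n_males + n_females)
    return x_copies / autosome_copies


expected = x_to_autosome_copy_ratio(n_males=1, n_females=1)
observed = 0.50  # figure reported in the article
print(expected)             # 0.75
print(observed < expected)  # True
```

With fewer copies in circulation, the X is expected to carry proportionally less variation even without selection; the observed 50 percent falls below that baseline, which is why the finding was a surprise.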
The researchers also saw substantial differences between the Indian and Chinese macaque populations, which they said could be due to sweeps of natural selection or major differences in the histories of the two populations.
Ryan Hernandez, a graduate student in Bustamante's group, led an analysis of the difference between Chinese and Indian macaques as well as variations within each of those populations. That work is reported in a separate paper in the April 13 Science.
The analysis suggests that the two populations separated about 162,000 years ago. Both Indian and Chinese macaques are used in biomedical research, and understanding the genetic differences between the two populations is important, Hernandez says. For example, he points out, the simian immunodeficiency virus (SIV) is used as a model for the human immunodeficiency virus (HIV), but when exposed, Chinese macaques develop AIDS-like symptoms more slowly than Indian macaques.
An important finding for medical research, Hernandez says, is that you can travel much farther along the DNA strand in the Indian macaque than in the Chinese macaque before finding a difference between individuals. Researchers looking for a disease-causing gene don't usually find the exact DNA sequence of the gene right away. Instead, they first determine that the gene is somewhere between two easily recognized sequences called markers and zero in from there. In Indian macaques, Hernandez says, those markers can be farther apart, making the search easier. From there, he suggests, the search could be continued with Chinese macaques, using markers closer together. It is often easier to track a gene in a controlled population of laboratory monkeys than in humans, but since the two genomes are so similar, once it is found in the macaque it can usually be located in humans.
The work on variations, Bustamante says, will help in the development of a dense genetic map for macaques that will ultimately improve scientists' ability to identify human genes involved in such diseases as cancer, diabetes and heart disease.
Along with Bustamante, Clark, Siepel and Hernandez, Cornell co-authors of the Science papers are graduate student Jeremiah Degenhardt, postdoctoral researchers Tomas Vinar and Carolin Kosiol, undergraduate researchers Alexandra Denby and Alison Marklein, and staff programmer Amit Indap.