New DNA from understudied groups reveals modern genetic variation, ancient population shifts
A study of hundreds of new genomes from across the globe has yielded insights into modern genetic diversity and ancient population dynamics, including compelling evidence that essentially all non-Africans today descend from a single migration out of Africa.
A heat map showing locations of previously unknown DNA variants. Red indicates a higher number of discoveries, black fewer [Credit: Harvard Medical School]
The study represents the largest data set yet of high-quality genome sequences from understudied populations, adding nearly 6 million DNA base pairs to the "canonical" human genome sequence published in 2001.
The data identify millions of previously unknown population-specific mutations that may help scientists develop precisely targeted diagnostic tests and treatments in their quest to improve the health of the world's underserved populations.
Most genome-wide population sequencing studies to date have focused on a handful of large populations. The HMS-led study, by comparison, sequenced samples from 142 smaller populations, most of which were previously understudied.
"As humans, we are not just the people who live in industrialized countries, and we are not just the people who live in numerically large groups," said David Reich, professor of genetics at HMS and senior author of the study. "If we want to understand who we really are, we have to realize that some of the most interesting aspects of human variation are only present in underrepresented, small populations."
"We wanted to go out into the world and pull together as many of the ethnically, linguistically and anthropologically diverse samples as we possibly could," said Swapan Mallick, bioinformatic systems director in the Reich lab and first author of the study.
The team's analyses are already answering questions about various populations' genetic origins, but, the researchers note, these insights are only a milestone on a longer journey.
"Of course, there are thousands of ethnically distinct populations in the world, and much more work needs to be done," said Mallick.
Reich, Mallick and their international team of colleagues began by selecting two genomes each from 51 populations represented in a collection called the Human Genome Diversity Project. Next, they assembled samples from members of 91 other groups, including diverse Native American, South Asian, and African populations not previously included in genome-wide studies, and sent the DNA for sequencing. In all, the project analyzed the genomes of 300 people.
Together with two other studies, the HMS-led study puts to rest a lingering question: whether the indigenous peoples of Australia, New Guinea and the Andaman Islands descend in large part from a second group that left Africa earlier and skirted the coast of the Indian Ocean. They do not, the HMS researchers say.
"Our best estimate for the proportion of ancestry from an early-exit population is zero," said Reich, who is also an investigator of the Howard Hughes Medical Institute and associate member of the Broad Institute. "Taken together, all three studies leave wiggle room for, at most, around two percent."
The HMS-led study further revealed that the common ancestors of modern humans began to differentiate at least 200,000 years ago, long before the out-of-Africa dispersal occurred.
"It had been unclear whether the group that expanded out of Africa represented a large subset of the populations within Africa," said Mallick. "This really shows that there was a lot of substructure prior to the expansion."
"There does not seem to have been one or a few enabling mutations that suddenly appeared among our ancestors and allowed them to think in profoundly different ways," said Reich.
Instead, the researchers say, a constellation of factors, including environment, lifestyle, and possibly genes, precipitated the rapid changes that occurred.
"Geneticists often search for examples where genetics is the explanation. Here, paradoxically, genetic data are showing that there will be no clear genetic answers," Reich said.
Mallick and colleagues overcame significant logistical hurdles posed by sharing and processing an enormous amount of data.
Often, in studies of this size, data are collected in many laboratories that use different sequencing machines and different experimental protocols. This can create so-called batch effects that make it difficult to distinguish true differences among samples. The current study minimized batch effects by sending all of the samples to a single center to be sequenced at the same time.
The team made much of the data set publicly available in 2014; multiple research groups have already used it for their studies.
In a way, the authors say, the findings reported thus far are just the tip of the iceberg.
"It's impossible for our group to analyze even a tiny fraction of what the data represents," said Mallick. "Our goal is to push the data out and let people use it to consider their own questions."
Author: Stephanie Dutchen | Source: Harvard Medical School [September 21, 2016]