
Largest human family tree

Nicoletta Lanese
16 Jun 2022 00:00:00 | Update: 16 Jun 2022 00:38:20

A new, enormous family tree for all of humanity attempts to summarize how all humans alive today relate both to one another and to our ancient ancestors.  

To build this family tree, or genealogy, researchers sifted through thousands of genome sequences collected from both modern and ancient humans, as well as ancient human relatives, according to a new study published Thursday (Feb. 24, 2022) in the journal Science. These genomes came from 215 populations scattered across the world. Using a computer algorithm, the team identified distinct patterns of genetic variation within these sequences, highlighting where they matched and where they differed. Based on these patterns, the researchers drew theoretical lines of descent between the genomes and inferred which gene variants, or alleles, the common ancestors of these people likely carried.
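The study's own algorithm is far more sophisticated than anything shown here, and this sketch does not reproduce it. It only illustrates the basic idea of comparing aligned sequences site by site, counting where they match and where they differ, and treating the most similar pair as a crude stand-in for the closest relatives. The sample names and the 0/1 allele strings are hypothetical.

```python
from itertools import combinations


def count_differences(a, b):
    """Number of variant sites at which two aligned sequences carry different alleles."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same sites")
    return sum(x != y for x, y in zip(a, b))


def closest_pair(haplotypes):
    """Return the pair of sample names whose sequences differ at the fewest sites."""
    return min(
        combinations(haplotypes, 2),
        key=lambda pair: count_differences(haplotypes[pair[0]], haplotypes[pair[1]]),
    )


# Hypothetical samples: each character is the allele (0 or 1) at one variant site.
haplotypes = {
    "sample_1": "0010110",
    "sample_2": "0010100",
    "sample_3": "1101001",
}
print(closest_pair(haplotypes))  # ('sample_1', 'sample_2')
```

Real genealogy-building methods also have to handle recombination, which means different stretches of a genome can have different family trees. A single distance per pair, as computed here, cannot capture that.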

In addition to mapping out these genealogical relationships, the team approximated where in the world the common ancestors of the sequenced individuals lived. They estimated these locations based on the ages of the sampled genomes and the location where each genome was sampled.
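One simple way to place ancestors on a map is to put each ancestor at the geographic midpoint of its descendants. The sketch below applies that rule from the sampled genomes upward through a tree. The rule is an assumption made for illustration. The sketch averages on a sphere rather than on flat latitude/longitude numbers. Sampled genomes, including ancient ones, stay fixed where they were collected. It ignores the sample ages that the study also took into account. The tree, node names and coordinates are hypothetical.

```python
import math


def geographic_midpoint(points):
    """Midpoint of (latitude, longitude) pairs in degrees, computed on a sphere.

    Each point becomes a 3D unit vector; the vectors are averaged and the
    mean vector is converted back to latitude and longitude.
    """
    x = y = z = 0.0
    for lat, lon in points:
        phi, lam = math.radians(lat), math.radians(lon)
        x += math.cos(phi) * math.cos(lam)
        y += math.cos(phi) * math.sin(lam)
        z += math.sin(phi)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    lat = math.atan2(z, math.hypot(x, y))
    lon = math.atan2(y, x)
    return math.degrees(lat), math.degrees(lon)


def estimate_ancestor_locations(children, sample_locations, root):
    """Place every ancestor at the unweighted midpoint of its children's locations.

    children: hypothetical map from an ancestor's name to its children's names.
    sample_locations: known (lat, lon) of each sampled genome; these stay fixed.
    """
    locations = dict(sample_locations)

    def visit(node):
        # Sampled genomes keep the place where they were collected.
        if node in locations:
            return locations[node]
        # An ancestor is placed only after all of its children (post-order).
        locations[node] = geographic_midpoint([visit(c) for c in children[node]])
        return locations[node]

    visit(root)
    return locations


# Hypothetical tree and coordinates, for illustration only.
children = {"root": ["anc1", "s3"], "anc1": ["s1", "s2"]}
samples = {"s1": (0.0, 0.0), "s2": (0.0, 90.0), "s3": (45.0, 45.0)}
locs = estimate_ancestor_locations(children, samples, "root")
print(locs["anc1"], locs["root"])
```

Averaging 3D vectors instead of raw degrees avoids nonsense results for points on opposite sides of the 180th meridian. This sketch also shows why such estimates are preliminary. Each ancestor is only as well placed as the samples beneath it.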

"The way that we've estimated where ancestors live is, in particular, very preliminary," said first author Anthony Wilder Wohns, who was a doctoral student at the University of Oxford's Big Data Institute at the time of the study. Despite its limitations, the data still captured major events in human evolutionary history. For example, "we definitely see overwhelming evidence of the out-of-Africa event," meaning the initial dispersal of Homo sapiens from East Africa into Eurasia and beyond, said Wohns, who is now a postdoctoral researcher at the Broad Institute of MIT and Harvard.

The method the researchers used "works well to refine known ancestral locations and, as sampling improves, it has the potential to identify currently unknown human movements," Aida Andrés, an associate professor in the Genetics, Evolution and Environment Department at the University College London (UCL) Genetics Institute, and Jasmin Rees, a doctoral candidate at the UCL Genetics Institute, wrote in a commentary also published in the journal Science on Thursday. In other words, as more data become available, such analyses could reveal chapters of human history that are currently unknown.

To build a unified genealogy of humanity, the researchers first pooled genomic data from several large, publicly available data sets, including the 1000 Genomes Project, the Human Genome Diversity Project and the Simons Genome Diversity Project. From these data sets, they gathered about 3,600 high-quality genome sequences from modern-day humans. "High-quality" genome sequences are those with very few gaps or errors that have been largely assembled in the correct order, according to a 2018 report in the journal Nature Biotechnology.

High-quality genomes from ancient humans were harder to come by, since DNA from ancient specimens tends to be severely degraded, Wohns said. However, in digging through previously published research, the team managed to find eight high-quality ancient hominin genomes to include in their tree. These included three Neanderthal genomes, one thought to be more than 100,000 years old; a Denisovan genome roughly 74,000 to 82,000 years old; and four genomes from a nuclear family that lived in the Altai Mountains of Russia about 4,600 years ago. (Neanderthals and Denisovans are extinct relatives of Homo sapiens.)

In addition to these high-quality ancient genomes, the team identified more than 3,500 additional, lower-quality genomes with significant degradation, ranging from a few hundred to several thousand years old, Wohns said. 

These degraded genomes did not factor into the main tree-building analysis, but the team sifted through the fragments to see which isolated alleles could be identified in the samples. Because the specimens these genomes came from had been radiocarbon dated, these piecemeal data helped the researchers confirm when different alleles first cropped up in the genealogical record.
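The logic behind using dated fragments is simple. If a specimen that is 8,000 years old carries an allele, that allele must be at least 8,000 years old. The minimal sketch below records that lower bound for each allele. It then pushes back any tree-based age estimate that comes out younger than the bound. Whether the study adjusted its estimates this way or only checked them is an assumption here. The allele labels, ages and estimates are hypothetical. Ages are in years before present.

```python
def allele_age_lower_bounds(dated_samples):
    """Oldest radiocarbon date at which each allele is observed.

    dated_samples: list of (age in years before present, set of alleles seen).
    An allele found in a sample of a given age must be at least that old.
    """
    bounds = {}
    for age, alleles in dated_samples:
        for allele in alleles:
            bounds[allele] = max(age, bounds.get(allele, 0))
    return bounds


def apply_lower_bounds(estimated_ages, bounds):
    """Push back any age estimate that is younger than the oldest sample carrying the allele."""
    return {allele: max(age, bounds.get(allele, 0)) for allele, age in estimated_ages.items()}


# Hypothetical dated fragments and tree-based estimates, for illustration only.
dated = [
    (4_600, {"site1:A", "site2:G"}),
    (8_000, {"site2:G"}),
    (300, {"site3:T"}),
]
bounds = allele_age_lower_bounds(dated)
estimates = {"site1:A": 10_000, "site2:G": 5_000, "site3:T": 1_200}
print(apply_lower_bounds(estimates, bounds))
# {'site1:A': 10000, 'site2:G': 8000, 'site3:T': 1200}
```

In this example only "site2:G" changes, because an 8,000-year-old sample already carried it. The other estimates are consistent with the dated evidence.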