By: LaKisha David and Leia Jones
Genetic Genealogy Information Type for Genetic Matching
Social science research on genetic genealogy typically examines associations between the results of a genetic genealogy test and racial-ethnic identity. This blog series instead concerns the use of genetic genealogy to discover genetic matches (e.g., relatives): individuals who share at least a minimum threshold of DNA with the participant tester, which indicates that the newly discovered individual is a potential relative sharing a common ancestral great-grandparent within 10 generations (e.g., an eighth great-grandparent). The participant and the discovered genetic relative would be members of an ancestral family group such as a clan. In the context of our study of reunification after the Transatlantic Slave Trade, the names of the ancestral great-grandparents are typically unknown, and genetic genealogy is a tool that enables African and African diaspora ancestral family members (i.e., extra-extended relatives) to identify each other as genetic matches, relatives, or family. A key distinction in this study, both when selecting the type of genetic genealogy test and when interpreting the social meanings associated with the results, is between using genetic genealogy to identify genetic matches for genealogical and kinship purposes and using it to claim an ancestral ethnic identity or membership.
Advancements in molecular biology and supercomputing propelled genealogy into the realm of genetics (Nelson & Robinson, 2014). Genetic testing for genealogical purposes was first made available in 2000 by Family Tree DNA. By 2010, there were 38 companies offering such services (Nelson & Robinson, 2014; Wagner, Cooper, Sterling, & Royal, 2012). AncestryDNA (of Ancestry.com) launched in 2012 and by July 2015 had tested 1 million people (Swayne, 2015). By April 2017, it had tested 4 million people (Ancestry Team, 2017). There is no question that the use of direct-to-consumer (DTC) genetic genealogy testing is rapidly increasing. The type of test that consumers purchase is associated with the type of information the test provides and with consumers’ motivation for taking the test (Nelson, 2016). The motivation to engage in genetic genealogy and the meanings associated with the results can be better understood by understanding the types of information that the tests provide.
Although companies vary in their methods and product offerings, genetic genealogy tests provide three types of information: maternal or paternal ethnic lineages, biogeographical ancestry (BGA) estimates, and genetic matches (Wagner et al., 2012).
The first type of information is maternal or paternal ethnic lineages (i.e., haplotypes). Lineage information enables geneticists to provide ancestral spatiotemporal information (i.e., geographical location and time) using mitochondrial DNA (mtDNA) and Y-chromosomal DNA (Y-DNA). These tests are used to tell African American consumers which African ethnic groups their lineages may originate from.
A limitation of offering to identify origin countries or ethnic groups is that common haplotypes are present in multiple ethnic groups due to migration within Africa (Ely, Wilson, Jackson, & Jackson, 2006). Results consisting of geographical information based on common haplotypes or poorly sampled populations are problematic (Shriver & Kittles, 2008). In a study of whether the African origin country or ethnic group of African Americans’ ancestors can be identified from their mtDNA haplotypes, only 5% of the African American sample’s maternal lineages exactly matched a single African ethnic group in a database representing West, West Central, South, Southeast, and East Africa (Ely et al., 2006; Ely, Wilson, Jackson, & Jackson, 2007). Additionally, 21% exactly matched 2 – 9 ethnic groups, 31% exactly matched more than 9 different ethnic groups, and 43% exactly matched no ethnic group within the database (Ely et al., 2007). In other words, even with research databases, a single African origin ethnic group often cannot be identified for African American maternal lineages; a maternal lineage more often matches multiple ethnic groups whose mtDNA is not distinctive from each other. Additionally, lineage information based on mtDNA or Y-DNA represents only 1% of the genome (Shriver & Kittles, 2008) and so is not representative of the consumer’s overall ancestral relatedness.
Another limitation is that mtDNA and Y-chromosome testing is only somewhat useful for learning about even recent ancestry, such as paternity (Shriver & Kittles, 2008). A practical example is the use of Y-chromosome DNA tests to settle the dispute over President Thomas Jefferson’s paternity of enslaved Sally Hemings’s youngest son, Eston Hemings Jefferson, thought by some to be the biological child of Thomas Jefferson (Foster et al., 1998). Eston has a perfect match on the Y-chromosome with four of five male descendants of Thomas Jefferson’s paternal uncle (i.e., four of Thomas Jefferson’s male cousins), meaning that Eston’s paternal lineage is biologically related to the family of Thomas Jefferson (Foster et al., 1998; King et al., 2007) and shares ancestral origins with the Jefferson paternal lineage among peoples of indigenous Europe, East Africa, or the Middle East (King et al., 2007). Because Eston’s Y-chromosome matches other Jefferson males, the Y-chromosome test alone, without other information, shows only that Eston could have been fathered by Thomas Jefferson or by one of Thomas Jefferson’s paternal male relatives. The test is similarly imprecise about ancestral origins: King and colleagues (2007) said that “[i]f we did not have prior knowledge about the ancestry of the Jefferson haplotype, we might assign it to an Egyptian origin” (p. 588). The point here is that mtDNA and Y-chromosome tests are very informative in certain respects but are limited in determining genealogical relatedness for the purposes of our study. They should also be used with caution when assigning ancient ancestry to small-scale geographical regions.
The second type of information provided by genetic genealogy is biogeographical ancestry (BGA) estimates, which are typically referred to as ethnicity estimates by companies and the general population (Pfaff, Parra, & Shriver, 2000). In contrast to the very small percentage of ancestors represented in the lineage-based tests using mtDNA and Y-chromosome, BGA is representative of the tester’s ancestors from whom the tester has inherited DNA at specific locations along the 22 pairs of chromosomes (i.e., autosomes). Developed by biological anthropologist Mark Shriver and molecular biologist Tony Frudakis (Gannett, 2014), BGA “refers to the component of ethnicity that is biologically determined and can be estimated using genetic markers that have distinctive allele frequencies for the populations in question” (Pfaff et al., 2000). This refers to the use of alleles, variants in genes at specific locations along the 22 pairs of chromosomes, to estimate a tester’s ancestral continental population(s) (e.g., African, European, Native American) (Pfaff et al., 2000) or, in some cases, regional populations (e.g., Southern European, Northern European) (Shriver & Kittles, 2008). The tester’s total proportional ancestry is then estimated as admixture proportions (Pfaff et al., 2000). For example, one African American tester’s results could indicate that their total ancestry is 80% African, 15% European, and 5% Native American. However, admixture proportions vary even between full siblings, depending on which of their parents’ gene variants each sibling happened to inherit. For example, one sibling may appear to have inherited 80% African variants while another sibling may appear to have inherited 73% African variants. The percentages provided by companies should be understood to carry a confidence interval and should not be interpreted as exact numbers (Shriver & Kittles, 2008).
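The sibling variability described above can be illustrated with a small simulation. This is only a toy sketch under loud assumptions, not the method of any company or cited study: the genome is reduced to 200 independent segments, each parent carries two copies of every segment labeled with one of two hypothetical ancestry labels, and each child receives one randomly chosen copy per segment from each parent. The function names, segment count, and parental fractions are all invented for illustration.

```python
import random
from collections import Counter

def make_parent(n_segments, african_fraction, rng):
    """Build a toy parent: two ancestry-labeled copies per segment (assumed model)."""
    def label():
        return "African" if rng.random() < african_fraction else "European"
    return [(label(), label()) for _ in range(n_segments)]

def make_child(mother, father, rng):
    """Child inherits one randomly chosen copy per segment from each parent."""
    return [(rng.choice(m), rng.choice(f)) for m, f in zip(mother, father)]

def admixture(person):
    """Proportion of a person's segment copies carrying each ancestry label."""
    counts = Counter(label for pair in person for label in pair)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

rng = random.Random(2018)  # fixed seed so reruns give the same draw
mother = make_parent(200, 0.80, rng)  # hypothetical parental fractions
father = make_parent(200, 0.75, rng)
sibling_a = make_child(mother, father, rng)
sibling_b = make_child(mother, father, rng)

print("Sibling A:", admixture(sibling_a))
print("Sibling B:", admixture(sibling_b))
```

Because each sibling is an independent random draw from the same two parents, the two printed sets of proportions will generally differ, which is the point made in the paragraph above. Real inheritance involves linked segments and recombination, so the size of the difference here should not be read as realistic.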
Until 2009, studies that included enough positions along the genome to produce relatively high specificity in continental populations “included few African populations” (Bryc et al., 2009, p. 1). This limited the ability to provide more regional level population analysis for African populations. Bryc and colleagues (2009) conducted a study that consisted of African Americans, people of European descent, and several populations from West and South Africa. Using principal component analysis (PCA) and a clustering algorithm, and assuming two source populations (i.e., African, non-African), they found that 77% of African Americans’ ancestors were from Africa (Bryc et al., 2009). This is similar to findings from other researchers that 73.2% – 84.9% of African American ancestry comes from sub-Saharan Africa, with the remaining admixture being 21.3% – 24.0% European and 0.8% – 2.8% Native American (Baharian et al., 2016; Bryc, Durand, Macpherson, Reich, & Mountain, 2015). Bryc and colleagues (2009) also found that the African component of African American ancestry “is most similar to the profile from non-Bantu Niger-Kordofanian-speaking populations, which include the Igbo, Brong, and Yoruba, with FST values to African segments of the African Americans ranging from 0.074 to 0.089%” (Bryc et al., 2009, p. 5). In other words, there are very small genetic differences between the African portion of African Americans’ ancestry and non-Bantu Niger-Kordofanian-speaking populations.
One limitation is that BGA estimates generally tell African American consumers what they already know: that most of their ancestors come from Africa. Additionally, companies offering African BGA estimates at regional levels use the names and geopolitical borders of present-day countries, which were not used by people living in Africa 500 years ago. There is more genetic diversity in Africa than in the rest of the world. Evidence based on language, DNA, and geographical distributions indicates that people dispersed across Africa tens of thousands of years before the Transatlantic Slave Trade. Because of this, the African diversity found in the African American genome today is not solely the result of African ethnic mixing in the U.S. during slavery (Ely, Wilson, Jackson, & Jackson, 2006; Jackson & Borgelin, 2010). Companies that continue to use geopolitical borders in their results will continue to provide information that is not as informative as it could be for people of African descent. Like lineage information, BGA is dependent on the quality of a reference database consisting of population-specific gene variants. Although some commercial companies, such as Ancestry.com, promote regional level specificity for Africa, BGA information based on a reference database does not enable researchers to identify relatives among a general population of testers.
This study makes use of the third type of information provided by genetic genealogy testing: genetic matches. By comparing each customer’s DNA profile with that of every other customer in its own database, a company provides each customer with a list of persons with whom a certain amount of DNA is shared. These are genetic matches or potential relatives. Companies offering genetic matches use the amount of shared DNA, measured in centiMorgans (cMs), to estimate the class of relatedness. For example, Ancestry.com uses the following classes of relatedness and approximate amounts of shared DNA: Parent/Child (3,475 cMs), Close Family (2,800 – 680 cMs), 2nd Cousin (620 – 200 cMs), 3rd Cousin (180 – 90 cMs), 4th Cousin (85 – 20 cMs), and Distant Cousin (20 – 6 cMs) (Ancestry, 2018). The most distant class of relatedness provided by Ancestry.com is a section consisting of 5th – 8th cousins. Matches within this group most commonly share 4th – 7th great grandparents, that is, common ancestors approximately 6 to 9 generations ago.
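The mapping from shared DNA to a class of relatedness can be sketched as a simple range lookup. The ranges below are copied from the approximate figures listed above (Ancestry, 2018); everything else is an assumption made for illustration, including the function name, the choice to treat any amount above the Close Family range as Parent/Child, the choice to resolve the shared 20 cM boundary in favor of the closer class, and returning no class for amounts that fall between or below the listed ranges. It is not Ancestry.com's actual procedure.

```python
# Approximate ranges in cM, as listed in the text (Ancestry, 2018),
# ordered from closest to most distant class.
LISTED_RANGES = [
    ("Close Family", 680, 2800),
    ("2nd Cousin", 200, 620),
    ("3rd Cousin", 90, 180),
    ("4th Cousin", 20, 85),
    ("Distant Cousin", 6, 20),
]

def classify_match(shared_cm):
    """Return the listed class for an amount of shared DNA, or None.

    Assumptions (not from the source): amounts above the Close Family
    range are treated as Parent/Child, and amounts in the gaps between
    listed ranges, or below 6 cM, get no class.
    """
    if shared_cm > 2800:
        return "Parent/Child"
    for label, low, high in LISTED_RANGES:
        if low <= shared_cm <= high:
            return label  # first (closest) class wins on a shared boundary
    return None

for cm in (3475, 1000, 150, 20, 10, 650):
    print(cm, "cM ->", classify_match(cm))
```

The lookup makes the gaps in the published ranges visible: an amount such as 650 cM sits between Close Family and 2nd Cousin and is left unclassified here rather than forced into either class.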
Companies provide matches only from among their own customer databases. However, because these databases contain millions of customers, genetic matching results are not as limited as the results of tests that rely on haplotype reference databases: consumers are bound to have at least one relative in the database (Henn et al., 2012; Ramstetter et al., 2017). The primary consideration for genetic matching is the accuracy of the results, particularly when consumers go on to initiate contact with the genetically matching person identified in the results to learn about shared family history.
Commercial companies use modified versions of algorithms developed by academic population geneticists to provide relatedness information. For example, Ancestry.com uses BEAGLE (B. L. Browning, Zhou, & Browning, 2018; S. R. Browning & Browning, 2007) and GERMLINE (Gusev et al., 2009) within its procedures for providing genetic matches to consumers (Ball et al., 2016). Ramstetter and colleagues (2017) evaluated the accuracy of 12 pairwise relatedness inference methods using a sample of 2,485 Mexican Americans with known pedigree information. They found that GERMLINE is one of the most accurate methods available (Ramstetter et al., 2017).
Ramstetter and colleagues’ (2017) study measured the ability of various methods to accurately specify the degree of relatedness of a selected pair, from 1 to 7 degrees or unrelated. A degree refers to the number of birthing events separating two people. For example, a parent and child have 1 degree of relatedness, full siblings have 2 degrees of relatedness, and third cousins sharing great-great grandparents have 8 degrees. Although their study focused on how accurately various methods predicted the degree of relatedness compared to the reported degree of relatedness, we report accuracy for the fact of relatedness from their study. For our study, we are concerned with the methods’ ability to determine whether members of a dyad are related rather than how they are related. Based on Ramstetter and colleagues’ (2017) findings, GERMLINE determined the fact of relatedness at accuracy levels of 100% (for known 1st degree relatives), 99.89 – 99.19% (2nd – 5th degree relatives), 98.53% (6th degree relatives), and 83.57% (7th degree relatives). For those reporting being unrelated, GERMLINE predicted that 80.58% were unrelated and 19.42% were 4th – 8th degree relatives. Ramstetter and colleagues (2017) state that this could be attributed to false positives predicted by GERMLINE, but it could also be due to cryptic relatedness (i.e., relatives not knowing they are related). In other words, algorithms can determine whether two people are related accurately up to the 6th degree and fairly accurately at the 7th degree.
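The definition of degree given above, the number of birthing events separating two people, can be worked through in a few lines. This is a minimal sketch of that counting rule only: the function name and its two parameters (the number of generations from each person up to their most recent common ancestors) are our own illustrative choices, and other sources may count degrees under a different convention.

```python
def degree_of_relatedness(generations_up_a, generations_up_b):
    """Count birthing events on the path between two people.

    Each argument is the number of generations from one person up to
    the most recent common ancestor(s); use 0 when that person is the
    common ancestor (e.g., the parent in a parent-child pair).
    """
    if generations_up_a < 0 or generations_up_b < 0:
        raise ValueError("generation counts cannot be negative")
    if generations_up_a + generations_up_b == 0:
        raise ValueError("the two arguments describe the same person")
    return generations_up_a + generations_up_b

# The three examples given in the text:
print(degree_of_relatedness(1, 0))  # parent and child
print(degree_of_relatedness(1, 1))  # full siblings (shared parents)
print(degree_of_relatedness(4, 4))  # third cousins (shared great-great grandparents)
```

Under this rule the three calls return 1, 2, and 8, matching the parent-child, full sibling, and third cousin examples in the paragraph above.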
This is significant because some genetic genealogy services used by the general population provide results on relatedness based on these scientific methods. For example, Ancestry.com uses an adapted version of GERMLINE called J-GERMLINE that is designed to work with growing databases (Ball et al., 2016). The usefulness of results produced by algorithms that detect IBD segment sharing (e.g., GERMLINE, J-GERMLINE) relies on the use of DNA profiles that are already phased. Genotype phasing is the process of ordering allele assignments by parent across the SNP locations. Although statistical methods such as those used by Ancestry.com are available to infer phasing, comparing the child’s DNA profile to the parents’ DNA profiles is the best way to ensure phasing accuracy (Ball et al., 2016; S. R. Browning & Browning, 2007; Roach et al., 2010; Tewhey, Bansal, Torkamani, Topol, & Schork, 2011). “Leveraging parental information to phase genomes provides excellent accuracy” (Tewhey et al., 2011, p. 220). First using family-based phasing to order the child’s allele assignments and to identify segments shared by both the child and a parent ensures that the DNA segment being compared to other profiles of unknown relatedness to the child is truly a segment that the child inherited from a parent (i.e., IBD). Algorithms provided by GEDmatch enable the general public to create phased genetic profiles and then to conduct IBD segment sharing matching with other users in the GEDmatch database. IBD segment sharing among two target child profiles and one of each of their parents (i.e., among two parent-child dyads) implies that the segment was inherited from a shared ancestor and that the two target persons (and the matching parents) are related.
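Family-based phasing, as described above, can be sketched for the simplest case in which both parents' genotypes are available at each SNP. The sketch below is an illustrative assumption, not the procedure used by Ancestry.com, GEDmatch, or BEAGLE: the function names are hypothetical, genotypes are represented as unordered pairs of allele letters, and sites that cannot be resolved from the parents alone are simply left unphased.

```python
def phase_site(child, mother, father):
    """Order one child genotype as (maternal allele, paternal allele).

    Each genotype is an unordered pair of alleles, e.g. ("A", "G").
    Returns None when both orderings are possible (ambiguous) or when
    neither is (the trio is inconsistent at this site).
    """
    a, b = child
    # Keep each ordering whose first allele the mother carries
    # and whose second allele the father carries.
    options = {(m, p) for m, p in ((a, b), (b, a))
               if m in mother and p in father}
    return options.pop() if len(options) == 1 else None

def phase_profile(child, mother, father):
    """Phase every SNP site in a child's profile using both parents."""
    return [phase_site(c, m, f) for c, m, f in zip(child, mother, father)]

# Hypothetical genotypes at four SNP sites.
child  = [("A", "G"), ("C", "C"), ("T", "C"), ("A", "G")]
mother = [("A", "A"), ("C", "T"), ("T", "T"), ("A", "G")]
father = [("A", "G"), ("C", "C"), ("C", "C"), ("A", "G")]

print(phase_profile(child, mother, father))
```

At the first three sites the parents' genotypes leave only one possible ordering, so the child's alleles can be assigned to the maternal and paternal copies. At the last site the child and both parents are heterozygous, so the parents alone cannot resolve the order and the site is left as None; this is the kind of case where statistical phasing would still be needed.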
Ancestry Team. (2017, April 27). AncestryDNA Reaches 4 Million Customers in DNA Database. Retrieved November 9, 2017, from Ancestry Blog website: https://blogs.ancestry.com/ancestry/2017/04/27/ancestrydna-reaches-4-million-customers-in-dna-database/
Ancestry. (2018). DNA Tests for Ethnicity & Genealogical DNA testing | AncestryDNA™. Retrieved from http://dna.ancestry.com/
Baharian, S., Barakatt, M., Gignoux, C. R., Shringarpure, S., Errington, J., Blot, W. J., Bustamante, C. D., Kenny, E. E., Williams, S. M., Aldrich, M. C., & Gravel, S. (2016). The great migration and African-American genomic diversity. PLoS Genetics, 12(5), e1006059.
Ball, C. A., Barber, M. J., Byrnes, J., Carbonetto, P., Chahine, K. G., Curtis, R. E., … Wilmore, L. (2016, March 31). AncestryDNA Matching White Paper: Discovering genetic matches across a massive, expanding genetic database. Retrieved from http://dna.ancestry.com/resource/whitePaper/AncestryDNA-Matching-White-Paper.pdf
Browning, B. L., Zhou, Y., & Browning, S. R. (2018). A One-Penny Imputed Genome from Next-Generation Reference Panels. The American Journal of Human Genetics, 103(3), 338–348. https://doi.org/10.1016/j.ajhg.2018.07.015
Browning, S. R., & Browning, B. L. (2007). Rapid and Accurate Haplotype Phasing and Missing-Data Inference for Whole-Genome Association Studies By Use of Localized Haplotype Clustering. The American Journal of Human Genetics, 81(5), 1084–1097. https://doi.org/10.1086/521987
Bryc, K., Auton, A., Nelson, M. R., Oksenberg, J. R., Hauser, S. L., Williams, S., … Bustamante, C. D. (2009). Genome-wide patterns of population structure and admixture in West Africans and African Americans. Proceedings of the National Academy of Sciences, 200909559. https://doi.org/10.1073/pnas.0909559107
Bryc, K., Durand, E., Macpherson, J., Reich, D., & Mountain, J. (2015). The Genetic Ancestry of African Americans, Latinos, and European Americans Across the United States. Biology, Chemistry, and Environmental Sciences Faculty Data Sets. Retrieved from http://digitalcommons.chapman.edu/sees_data/1
Ely, B., Wilson, J. L., Jackson, F., & Jackson, B. A. (2006). African-American mitochondrial DNAs often match mtDNAs found in multiple African ethnic groups. BMC Biology, 4(1), 34.
Ely, B., Wilson, J. L., Jackson, F., & Jackson, B. A. (2007). Correction: African American mitochondrial DNAs often match mtDNAs found in multiple African ethnic groups. BMC Biology, 5(1), 13.
Foster, E. A., Jobling, M. A., Taylor, P. G., Donnelly, P., De Knijff, P., Mieremet, R., … Tyler-Smith, C. (1998). Jefferson fathered slave’s last child. Nature, 396(6706), 27.
Gannett, L. (2014). Biogeographical ancestry and race. Studies in History and Philosophy of Biological and Biomedical Sciences, 47, 173–184.
Gusev, A., Lowe, J. K., Stoffel, M., Daly, M. J., Altshuler, D., Breslow, J. L., Friedman, J. M., & Pe’er, I. (2009). Whole population, genome-wide mapping of hidden relatedness. Genome Research, 19(2), 318–326. https://doi.org/10.1101/gr.081398.108
Henn, B. M., Hon, L., Macpherson, J. M., Eriksson, N., Saxonov, S., Pe’er, I., & Mountain, J. L. (2012). Cryptic Distant Relatives Are Common in Both Isolated and Cosmopolitan Genetic Samples. PLoS ONE, 7(4). https://doi.org/10.1371/journal.pone.0034267
Jackson, F. L. C., & Borgelin, L. F. J. (2010). How Genetics Can Provide Detail to the Transatlantic African Diaspora. In T. Olaniyan & J. H. Sweet (Eds.), The African Diaspora and the Disciplines. Bloomington: Indiana University Press.
King, T. E., Bowden, G. R., Balaresque, P. L., Adams, S. M., Shanks, M. E., & Jobling, M. A. (2007). Thomas Jefferson’s Y chromosome belongs to a rare European lineage. American Journal of Physical Anthropology: The Official Publication of the American Association of Physical Anthropologists, 132(4), 584–589.
Nelson, A. (2016). The social life of DNA: Race, reparations, and reconciliation after the genome. Boston, MA: Beacon Press.
Nelson, A., & Robinson, J. H. (2014). The Social Life of DTC Genetics: The case of 23andMe. In D. L. Kleinman & K. Moore (Eds.), Routledge Handbook of Science, Technology, and Society (pp. 108–123). https://doi.org/10.4324/9780203101827.ch6
Pfaff, C. L., Parra, E. J., & Shriver, M. D. (2000). Genetic estimation of biogeographical ancestry. Presented at the American Society of Human Genetics Annual Meeting. Retrieved from http://www.ashg.org/genetics/abstracts/abs00/f1195.htm
Ramstetter, M. D., Dyer, T. D., Lehman, D. M., Curran, J. E., Duggirala, R., Blangero, J., Mezey, J. G., & Williams, A. L. (2017). Benchmarking Relatedness Inference Methods with Genome-Wide Data from Thousands of Relatives. Genetics, 207(1), 75–82. https://doi.org/10.1534/genetics.117.1122
Roach, J. C., Glusman, G., Smit, A. F., Huff, C. D., Hubley, R., Shannon, P. T., Rowen, L., Pant, K. P., Goodman, N., Bamshad, M., Shendure, J., Drmanac, R., Jorde, L. B., Hood, L., & Galas, D. J. (2010). Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science, 328(5978), 636–639.
Shriver, M. D., & Kittles, R. A. (2008). Genetic ancestry and the search for personalized genetic histories. In B. A. Koenig, S. S.-J. Lee, & S. S. Richardson (Eds.), Revisiting Race in a Genomic Age (pp. 201–214). New Brunswick, New Jersey, and London: Rutgers University Press.
Swayne, A. (2015, July 16). AncestryDNA Celebrates One Million People Tested. Retrieved September 29, 2015, from Ancestry Blog website: http://blogs.ancestry.com/ancestry/2015/07/16/ancestrydna-celebrates-one-million-people-tested/
Tewhey, R., Bansal, V., Torkamani, A., Topol, E. J., & Schork, N. J. (2011). The importance of phase information for human genomics. Nature Reviews Genetics, 12(3), 215–223. https://doi.org/10.1038/nrg2950
Wagner, J. K., Cooper, J. D., Sterling, R., & Royal, C. D. (2012). Tilting at windmills no longer: A data-driven discussion of DTC DNA ancestry tests. Genetics in Medicine, 14(6), 586.