Genetic Genealogical Methods Used to Identify Diaspora Relatives of Members of the Kassena Ethnic Group of Northern Ghana
LaKisha David*1, Leia Jones2, Judith Escamilla3
*1 firstname.lastname@example.org, CEO, The African Kinship Reunion; PhD Candidate, Human Development and Family Studies, University of Illinois at Urbana-Champaign
2 email@example.com, Undergraduate, Departments of Molecular and Cellular Biology, University of Illinois at Urbana-Champaign
3 firstname.lastname@example.org, Undergraduate, Departments of Molecular and Cellular Biology, University of Illinois at Urbana-Champaign
Abstract: Determining relatedness between African and African American individuals challenges the common narrative that African American separation from Africa was too long ago for family history or relatives to be discovered. Because of the historic significance of reuniting families separated during the Transatlantic Slave Trade, the objective of this study was to explore the genetic relatedness between Ghanaians and African Americans using publicly available tools. We used the DNA profiles provided by a commercial company for 32 individuals among the Kassena people of northern Ghana. We phased our sample using family-based phasing and identified GEDmatch profiles that matched our participant profiles at a minimum of 7 cM on a single segment and that also matched another matching GEDmatch profile at a minimum of 3,400 cM. We identified 52 GEDmatch dyads, each sharing 7.1 cM to 12.0 cM on a single segment (which was also the total shared) with a participant dyad. By inference, families separated during the Transatlantic Slave Trade may be able to reunite as ancestral family groups using publicly available genetic genealogy tools.
Keywords: genetic genealogy, reunification, ancestral family, autosomal DNA, African families, African American families
Between 1501 and 1900, 10.7 million people were taken from families and communities in Africa and sent to various diasporic locations in the world as part of the Transatlantic Slave Trade (Voyages Database, n.d.). Whereas details such as the names of ancestors in Africa before capture in the Transatlantic Slave Trade are still mostly missing, people of African descent such as African Americans living in the U.S. are identifying genetic matches among Africans using commercially available autosomal genetic genealogy tests. Based on major genetic genealogy company practices of providing results for genetic matches sharing a minimum of 6 cM (Ball et al., 2016), an African American and African pair of genetic relatives would be separated by up to a mean of 17 meioses, or approximately 9 generations (Thompson, 2013). The historical significance of this genetic matching is that it implies that families that were separated during the Transatlantic Slave Trade are reuniting. The emphasis here is not on ancestral ethnic group identification or ancestral national identification but on the ability of African and African American related pairs within 10 generations of shared ancestors to identify and communicate with each other. This challenges the common narrative that African American separation from Africa was too long ago for family history or relatives to be discovered, but it also presents new questions about the meaning of family. Thus, the objective of this study is to explore the genetic relatedness between Ghanaians and African Americans using publicly available tools.
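The link between a 6 cM threshold and roughly 17 meioses can be illustrated with a short calculation. The sketch below assumes the commonly used approximation that an IBD segment inherited across m meioses has an expected length of 100/m cM, and that the meioses split evenly between the two lineages. Both assumptions are ours, made for illustration, and the function names are hypothetical; this is not necessarily how the cited figures were derived.

```python
import math


def expected_meioses(segment_cm: float) -> float:
    """Number of meioses m at which the expected IBD segment length,
    approximated as 100 / m cM, equals the given segment length."""
    return 100.0 / segment_cm


def generations_to_common_ancestor(meioses: float) -> int:
    """Generations back to a shared ancestor, assuming the meioses are
    split evenly between the two descending lineages (rounded up)."""
    return math.ceil(meioses / 2)


meioses = expected_meioses(6.0)
print(f"6 cM corresponds to about {meioses:.1f} meioses")
print(f"which is about {generations_to_common_ancestor(round(meioses))} generations")
```

Under these assumptions, a 6 cM segment corresponds to about 16.7 meioses, which rounds to the 17 meioses and approximately 9 generations stated above.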
The primary consideration for our exploration of relatedness is accurately identifying relatives using autosomal DNA testing. The use of mitochondrial-DNA (mtDNA) and Y-chromosome DNA (Y-DNA) testing would have been only somewhat useful in exploring ancestry (Shriver & Kittles, 2008). Similarly, autosomal admixture research used in other informative studies (Baharian et al., 2016) is not suitable for characterizing genetic relatedness between two individuals at the level available with identity-by-descent (IBD) segment sharing. Commercially available autosomal genetic genealogy testing based on IBD segment matching is a viable option for research because these services provide access to large databases with millions of tested consumers. Consumers are bound to have at least one relative in databases of these sizes (Erlich et al., 2018; Henn et al., 2012; Ramstetter et al., 2017).
Along with commercially available genetic genealogy testing services, consumers have access to GEDmatch, a suite of online tools that allows consumers to analyze their own autosomal DNA data and to compare their DNA profiles with the profiles of other consumers (Thomson et al., 2020). GEDmatch hosts about 1 million profiles (Erlich et al., 2018). Algorithms provided by GEDmatch enable the general public to create family-based phased genetic profiles and then to use those profiles in comparison tests.
Although computational methods are improving, family-based phasing is a more accurate way to align alleles by biological parents (Ball et al., 2016; Browning & Browning, 2007; Roach et al., 2010; Tewhey et al., 2011). Family-based phasing also enables the person of African descent (i.e., the person behind a GEDmatch profile) to learn more about their genetic family history than is possible with unphased data. For example, with phased data, a person of African descent could learn that they, through their mother, are related to a person born in Ghana through the Ghanaian person’s father. This additional information discovered using phased data is of value to people of African descent testing to learn about their relatedness with Africans.
In this study, we demonstrate the use of publicly accessible genetic genealogy tools to determine genetic relatedness between Ghanaians and people of African descent within the GEDmatch database. This method demonstration is part of a larger study exploring family identity among Ghanaians who are interacting with their diaspora African American relatives. This study can inform future research exploring the claim that families that were separated during the Transatlantic Slave Trade can reunite using publicly available genetic genealogy tools.
Material and Methods
This project was approved by the Institutional Review Boards (IRBs) of both the University of Illinois at Urbana-Champaign (UIUC) in the U.S. and the Navrongo Health Research Centre (NHRC) in Ghana. For UIUC, this project is protocol number 18608, titled Family Reunification Processes of Kassena Ethnic Group Members with African American Relatives Identified Through Autosomal DNA Testing Services. It was approved on May 10, 2018 with an expiration date of May 09, 2021, and a risk determination of no more than minimal risk. For NHRC, this project is under the ethics approval ID NHRCIRB304, titled Family Reunification of Kassena Ethnic Group Members with African American Relatives Identified through Autosomal DNA Testing Services. It was approved on May 22, 2018.
Gathering informed consent was a two-day process in which we explained the project to the community on one day and then reviewed the project and consent forms with the selected project participants on the next day. The project was explained in both the local language of Kasem and in English. We explained the purpose and goals of the research project, the basic science of DNA inheritance, the intended uses of the DNA samples, the commercial services and terms and conditions of AncestryDNA and GEDmatch, how to navigate the AncestryDNA website, how to interpret the results, and the potential risks from the results.
We emphasized that participation in this project was voluntary and that participants could stop their project involvement at any time. Participants maintained their full customer privileges with their genetic profiles with AncestryDNA throughout the project. We explained various considerations about DNA Matches, such as:
“(1) If you are related to another person who is participating in this project, there is a good chance that the person’s name will appear in your list of DNA Matches and that your name will appear in their list of DNA Matches. You would have access to each other’s public profile (but not their private profile). (2) The more closely you and another person are related (such as father and son), the more likely that the person will show up in the list of DNA Matches. If you think that you would be troubled with this information becoming public, please do NOT consent to participate in this project…”
The intent was to explain the process thoroughly so that participants could truly provide informed consent.
There are 32 participants consisting of 4 unique parent-offspring dyads (n = 8) and 8 unique parent-parent-offspring trios (n = 24). One potential participant was dropped because they were a sibling of an offspring already in the sample. All participants are at least 18 years of age and are self-identified as members of the Kassena ethnic group residing in Paga, Ghana. Parents are 9 females and 11 males with a birth year range of 1939 to 1961. Offspring are 4 females and 8 males with a birth year range of 1980 to 2000. In Table 1, participants in dyads only are dyad numbers 7, 10, 11, and 16.
Residents of the Nania village are keepers of the Pikworo Slave Camp, a former slave camp used primarily during the 1500s to 1800s in both local and international slave trades (Schramm, 2007, 2011). It is currently used for memorial practices. Pikworo guides offer official tours to visitors from around the world who come to the site to learn about the local history of enslavement. The guides live in the Nania neighborhood. The Ghana Tourist Board (GTB) and the Ghana Museums and Monuments Board (GMMB) recognize Pikworo as a pilgrimage attraction for the African diaspora (e.g., African Americans). Although elders in nearby villages still tell narratives of the local slave trade, the guided tour emphasizes the site’s connection to people taken to the dungeons along the southern coast and eventually to diasporic locations through the Transatlantic Slave Trade (Schramm, 2007, 2011). Whereas northern Ghana is associated with being a major source of captives for local and international slavery (Holsey, 2008), the Nania village was selected because of residents’ willingness to discuss, and practice of discussing, their community’s history with slavery prior to their recruitment in this study (Schramm, 2007, 2011). We make no claims about the expected degree of relatedness or the number of relatives between these residents of Ghana and the historical African diaspora. This was a starting point in our exploration of genetic relatedness between Africans and African Americans among an African population known to have been in Ghana for multiple generations and to have been involved in slave trades (Holsey, 2008; Schramm, 2007, 2011).
Table 1. Demographic data provided by participants
| Dyad | Person 1 (offspring) | Sex | Birth Year | Person 2 (parent) | Sex | Birth Year |
Paga, Ghana is a town that borders the country of Burkina Faso. It is the capital of the Kassena-Nankana West district. The main ethnic groups of the region are Mole-Dagbon, Grusi (of which Kassena is a subgroup), Mande-Busanga, and Gruma. It has a patrilineal system of inheritance. As of 2010, the population of the Kassena-Nankana West district was 70,667 individuals, 6.8% of the regional population. The median and the average age were 20 and 26, respectively. There were 96.7 males per 100 females (Ghana Statistical Service, 2012). In the Kassena-Nankana West district, 14.0% of the population lives in urban areas. The average household size is 5.0 for urban areas and 5.6 for rural areas. Among those 11 years and older, 47.8% are literate in English (some of whom are also literate in a Ghanaian language and/or French) and 49.8% are not literate in any language. Among those 15 years and older, 72.2% are employed (Ghana Statistical Service, 2012).
Saliva Sample Collection
Saliva samples were collected in June – July 2016, 2018, and 2019 using Ancestry collection tubes (Ancestry, 2018). Participants who tested in 2016 (n = 3) were randomly selected by residents of the neighborhood. Their adult offspring (n = 4) were purposively selected in 2018 based on our need to have parent-offspring dyads in the study. The remaining participants who tested in 2018 (n = 13) and 2019 (n = 12) were randomly selected from a list of potential participants created by one resident of the neighborhood who was not a participant of the study, and then their parents or offspring were purposively selected. Potential participants were listed based on their willingness to have both parent and adult offspring participate in the study. Saliva samples were collected in public group gatherings in 2016 and 2018 at the project site located at the former Pikworo Slave Camp in Paga, Ghana, and in Tamale, Ghana in 2019.
Overview of Project Procedures
In June 2018, we organized a community event to build rapport and to explain the project to the community. Because the project was developed in consultation with a community member, the emphasis was on being transparent and building rapport. The next day, selected participants gathered at the project site. Members of the research team explained the project, provided time to answer questions participants may have had, and gathered informed consent. We then gathered the saliva samples. In July 2018, we conducted a round of focus group discussions about family meanings and the diaspora, provided a communal project mobile phone for use by participants to communicate with genetic matches, returned to the U.S. with the saliva samples, and then sent the saliva samples to Ancestry for processing. After the DNA was genotyped, we downloaded the profiles from Ancestry and uploaded them to GEDmatch, identified relatives within the GEDmatch database, provided the diaspora genetic matches with the contact information of their genetic matches in Paga, and provided the project coordinator selected from the community in Paga with the email address of the newly discovered diaspora relative. In March 2019, we collected a second round of focus group discussions about family meanings and the diaspora and analyzed the data using inductive and deductive thematic analyses. We gathered additional saliva samples in June 2019 to expand from dyad to trio data where possible.
Procedures for Identifying Genetic Matches in GEDmatch
An essential task was to determine which persons of African descent within the GEDmatch database were related to the participants residing in Ghana, supporting the claim that families that were separated during the Transatlantic Slave Trade can reunite using autosomal genetic genealogy. We used several tools provided by GEDmatch (Software Version May 22 2020, Build 37) to identify and contact genetic matches within their database.
Step one was to obtain genetic information in a text data file for each participant, for which we used Ancestry (Ancestry, 2018). These profiles contained the Reference SNP cluster ID (rsID), chromosome and position of allele, and the unphased values of the allele pair for up to approximately 700,000 SNPs.
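As an illustration of what such a text data file contains, the following minimal sketch reads a raw profile into records. It assumes a tab-separated layout with `#` comment lines and a header row naming the columns `rsid`, `chromosome`, `position`, `allele1`, and `allele2`; that layout, the `Snp` type, the function name, and the sample rsIDs are assumptions made for illustration and should be checked against the actual download.

```python
import csv
import io
from typing import Iterator, NamedTuple


class Snp(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    allele1: str
    allele2: str


def read_raw_profile(handle) -> Iterator[Snp]:
    """Yield one Snp per data row, skipping blank lines and '#' comments.

    Assumes a tab-separated file whose first non-comment line is a header
    with the column names used below (an assumption, not a specification).
    """
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    for row in csv.DictReader(rows, delimiter="\t"):
        yield Snp(
            rsid=row["rsid"],
            chromosome=row["chromosome"],
            position=int(row["position"]),
            allele1=row["allele1"],
            allele2=row["allele2"],
        )


# Hypothetical two-SNP file used only to exercise the parser.
SAMPLE = (
    "#Hypothetical raw data header comment\n"
    "rsid\tchromosome\tposition\tallele1\tallele2\n"
    "rs0000001\t1\t1000\tA\tG\n"
    "rs0000002\t1\t2000\tC\tC\n"
)
profile = list(read_raw_profile(io.StringIO(SAMPLE)))
print(len(profile), profile[0])
```

The allele pair in each record is unphased: the file does not say which of the two alleles came from which parent.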
Step two was to upload the DNA profiles to GEDmatch and create phased profiles for each participant parent-offspring dyad. Before creating the phased data files, we ran a one-to-one comparison between participant parent and participant offspring to confirm that each participant dyad consisted of a biological parent and offspring. Members of dyads shared more than 3,400 cM in total (with a minimum segment size of 7 cM), indicating that each dyad consisted of a biological parent-offspring pair (see Table 2). To create the phased data files, we used the GEDmatch Phasing tool, which compares the DNA profile of the adult offspring participant with their parents’ DNA profiles for allele alignment. This created two phased profiles. One phased data file (denoted with “M1”) consists of the DNA segments shared between the adult child and the biological mother. The other phased data file (denoted with “P1”) consists of the DNA segments shared between the adult child and the biological father. For the dyads, we deleted the phased profile that was created for the non-participating parent.
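The idea behind family-based phasing can be sketched at the level of a single SNP: the allele the offspring inherited from a tested parent is the one allele the two genotypes have in common, when exactly one such allele exists. The code below is a simplified illustration of that rule only. It is not the algorithm used by the GEDmatch Phasing tool, whose internals are not described here, and the function names, the `"0"` no-call convention, and the example genotypes are assumptions.

```python
from typing import Dict, Optional, Tuple

Genotype = Tuple[str, str]
NO_CALL = "0"  # assumed marker for a missing genotype call


def transmitted_allele(child: Genotype, parent: Genotype) -> Optional[str]:
    """Return the allele the child received from this parent, or None when
    the SNP is uninformative (both heterozygous for the same alleles),
    inconsistent (no shared allele), or missing."""
    if NO_CALL in child or NO_CALL in parent:
        return None
    shared = set(child) & set(parent)
    if len(shared) == 1:
        return next(iter(shared))
    return None


def phase_child(
    child: Dict[str, Genotype], parent: Dict[str, Genotype]
) -> Dict[str, Tuple[str, str]]:
    """Map rsID -> (allele from this parent, allele from the other parent)
    for every SNP where the transmitted allele can be resolved."""
    phased = {}
    for rsid, (a, b) in child.items():
        if rsid not in parent:
            continue
        from_parent = transmitted_allele((a, b), parent[rsid])
        if from_parent is None:
            continue
        from_other = b if a == from_parent else a
        phased[rsid] = (from_parent, from_other)
    return phased


# Hypothetical genotypes for one offspring and one parent.
child = {"rs1": ("A", "G"), "rs2": ("C", "C"), "rs3": ("A", "G"), "rs4": ("T", "C")}
parent = {"rs1": ("A", "A"), "rs2": ("C", "T"), "rs3": ("A", "G"), "rs4": ("C", "C")}
phased = phase_child(child, parent)
print(phased)
```

In this toy example, `rs3` is left unresolved because offspring and parent are both heterozygous for the same two alleles; with a trio, the second parent's genotype can often resolve such sites.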
Step three was to identify potential genetic matches within the GEDmatch database. To do this, we used the GEDmatch one-to-many comparison tool for each of our parent-offspring phased participant profiles. This provided a list of unphased GEDmatch profiles that shared at least 7 cM with our parent-offspring phased participant profiles.
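The threshold applied in this step amounts to a simple filter over the rows of a match list. The sketch below assumes the one-to-many results have already been transcribed into records holding a kit identifier, the largest shared segment, and the total shared cM; the `Match` type, its field names, and the kit identifiers are hypothetical and do not reflect the GEDmatch output format.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_SEGMENT_CM = 7.0  # study threshold for a single shared segment


@dataclass(frozen=True)
class Match:
    kit_id: str               # hypothetical identifier, not a real GEDmatch kit
    largest_segment_cm: float
    total_cm: float


def candidate_matches(
    rows: Iterable[Match], min_segment_cm: float = MIN_SEGMENT_CM
) -> List[Match]:
    """Keep only matches whose largest single segment meets the threshold."""
    return [row for row in rows if row.largest_segment_cm >= min_segment_cm]


rows = [
    Match("KIT_A", 7.1, 7.1),
    Match("KIT_B", 6.9, 13.0),   # dropped: no single segment reaches 7 cM
    Match("KIT_C", 12.0, 12.0),
]
kept = candidate_matches(rows)
print([m.kit_id for m in kept])
```

Filtering on the largest single segment rather than on the total matters here: a profile can accumulate a larger total from several short segments without any one segment reaching the threshold.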
Step four was to identify parent-offspring dyads among the GEDmatch profiles matching our participant phased profiles so that we could assess IBD segment sharing for the GEDmatch dyad and thereby further reduce the chances of false-positive matches. To identify parent-offspring GEDmatch dyads, we used a GEDmatch visualization tool, the 3D chromosome browser. The resultant matrix is a comparison of selected GEDmatch profiles with each other. We selected GEDmatch profiles that shared at least 3,400 cM with another GEDmatch profile, indicating a parent-offspring dyad, identical twins, or a duplicate profile.
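Selecting dyads from a pairwise comparison matrix can be sketched as follows. The example assumes the matrix values have been transcribed into a dictionary keyed by unordered pairs of kit identifiers; that data structure, the function name, and the identifiers are hypothetical and are not part of any GEDmatch tool.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

PARENT_OFFSPRING_MIN_CM = 3400.0  # study threshold for total shared cM


def find_dyads(
    kits: Iterable[str],
    shared_cm: Dict[FrozenSet[str], float],
    min_total_cm: float = PARENT_OFFSPRING_MIN_CM,
) -> List[Tuple[str, str]]:
    """Return every pair of kits whose total shared cM meets the threshold.

    Pairs missing from shared_cm are treated as sharing 0 cM.
    """
    dyads = []
    for a, b in combinations(sorted(kits), 2):
        if shared_cm.get(frozenset((a, b)), 0.0) >= min_total_cm:
            dyads.append((a, b))
    return dyads


kits = ["KIT_A", "KIT_B", "KIT_C"]
shared_cm = {
    frozenset(("KIT_A", "KIT_B")): 3550.2,  # consistent with parent-offspring
    frozenset(("KIT_A", "KIT_C")): 41.0,    # related, but not a dyad
}
dyads = find_dyads(kits, shared_cm)
print(dyads)
```

A pair that passes this threshold is only a candidate dyad: as noted above, the same amount of sharing is also expected for identical twins or a duplicate upload, so the relationship still has to be confirmed with the profile manager.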
Step five was to determine if the GEDmatch dyads also match another participant dyad on an overlapping segment using GEDmatch’s 2D chromosome browser. For participant trios, we also used the one-to-one comparison tool to assess if the GEDmatch dyad also matched the participant’s other parent’s phased profile. The one-to-one comparison tool provides the chromosome, start and end location on the genome, amount of shared DNA in cM, and the number of shared SNPs for each segment shared by the two profiles in the comparison.
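Whether two shared segments overlap can be checked directly from the chromosome, start, and end values that the one-to-one comparison reports. The following sketch is an illustration under assumptions: the `Segment` type and the coordinates are hypothetical, and positions are treated as base-pair coordinates on the same genome build.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chromosome: str
    start: int   # start position in base pairs
    end: int     # end position in base pairs
    cm: float    # shared length in centiMorgans


def overlap_bp(a: Segment, b: Segment) -> int:
    """Length in base pairs of the region covered by both segments
    (0 when they are on different chromosomes or do not overlap)."""
    if a.chromosome != b.chromosome:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


# Hypothetical segments shared by one GEDmatch dyad with two participant dyads.
seg_with_dyad_1 = Segment("5", 1_000_000, 9_000_000, 8.2)
seg_with_dyad_2 = Segment("5", 4_000_000, 12_000_000, 7.5)
print(overlap_bp(seg_with_dyad_1, seg_with_dyad_2))
```

A non-zero overlap on the same chromosome is what would suggest that the GEDmatch dyad and two participant dyads share the same inherited region.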
Step six was to contact the manager of the GEDmatch profiles by the emails provided in the GEDmatch database. For each initial contact, we provided information about the participant dyad, the project, and the contact information for the project phone with the research team member in Ghana. That team member in Ghana was copied on each email. For newly identified genetic matches who replied, we confirmed that their GEDmatch dyad was indeed for a parent and offspring. These newly discovered persons were recorded as being related to the Ghanaian participant parent and adult offspring, sharing a common ancestor within 10 generations.
Criteria of Relatedness for this Study
Our threshold was set to a minimum of 7 cM on a single segment. For our criteria, we drew from work on cryptic distant relatives (Henn et al., 2012). In their work, Henn and colleagues used a minimum of 7 cM for a segment in determining relatedness based on their ability to find offspring-unknown and parent-unknown segments where the offspring also matched the parent (i.e., IBD for the parent and offspring) for 90% of the segments (Henn et al., 2012). Our focus was on identifying the fact of relatedness (e.g., are they related or not?) rather than distinguishing the degree of relatedness (e.g., are they 4th cousins or 5th cousins?) and we used “the length of DNA segments that are consistent with identity by descent (IBD) from a common ancestor” measured in centiMorgans (cM) (Browning & Browning, 2007; Gusev et al., 2009; Henn et al., 2012, p. 1). Matching is based on the similarity of autosomal single nucleotide polymorphisms (SNPs).
All participant parent and participant offspring dyads matched at a range of 3,543.5 cM to 3,561.2 cM across 496,223 to 524,198 SNPs in 47 to 52 segments (see Table 2).
Table 2. Parent-progeny comparisons using GEDmatch
| Dyad | Largest Single Segment (cM) | Total Half-Match Segments (HIR) (cM) | HIR (percent) | Number of Segments | Number of SNPs Used in Comparison | Full Identical (percent) |
There were 52 GEDmatch dyads identified (see Table 3). Some GEDmatch dyads consisted of GEDmatch profiles that were in multiple dyads forming a cluster of several GEDmatch profiles all matching each other on an overlapping segment. Those paired at a minimum of 3,400 cM were counted as a GEDmatch dyad as per the study criteria. GEDmatch dyads shared a range of 7.1 cM to 12.0 cM on a single segment (and total shared) with participant dyads. GEDmatch dyads matched only one participant parent and not both participant parents in trio data. GEDmatch dyads also did not match multiple participating dyads.
Table 3. GEDmatch potential parent-progeny profiles matching participant phased profiles
| Participant Dyad Number [1, 3] | GEDmatch Dyad Number [2] | Total DNA Shared with Participant Dyad (cM) | Number of Segments | Match Other Parent’s Phased Profile? | Number of Other Matching Participant Dyads |
[1] Participant dyad numbers correspond to dyad numbers in Tables 1 and 2.
[2] GEDmatch dyad numbers are not the identifiers used within GEDmatch.
[3] If the participant dyad number is not in the table, that dyad did not have GEDmatch profiles that met our criteria.
This study presents the methods used to identify relatives of Ghanaians among the African diaspora in our larger study on the meanings of family among Ghanaians. We found evidence supporting the claim that residents of Paga, Ghana share common ancestors with GEDmatch database members within the last 10 generations using publicly available genetic genealogy tools. Additional research should be conducted to examine this relatedness further.
When using GEDmatch, there are several considerations for determining individual-level relatedness between Africans and African Americans. Using GEDmatch’s one-to-many comparison, we generated a list of consumers within the GEDmatch database that matched our participant phased profiles at a minimum of 7 cM on a single segment. Whereas other algorithms can accurately identify matches at a minimum of 3 cM (Ramstetter et al., 2017), this option was not available with the one-to-many comparison tool. It is highly likely that there are more true matches in GEDmatch than what we identified in this study that match between 3 cM and 7 cM on a single segment.
Similarly, using the 3D chromosome browser, we selected only those GEDmatch profiles that matched another GEDmatch profile at a minimum total of 3,400 cM. Our conservative criteria disqualified several GEDmatch profiles that were potentially true matches sharing a total of between 7 cM and 3,400 cM but who did not have a parent or offspring profile within the GEDmatch database. Additionally, the profiles that appear in the results are unphased profiles, which should be considered when making conclusions about relatedness.
After identifying matching parent-offspring dyads within the GEDmatch database, additional steps must be taken to learn more about the GEDmatch dyad’s ancestral history. GEDmatch provides admixture tools and the contact information for each GEDmatch profile as supplied by the profile manager. The profile manager would need to confirm that the discovered GEDmatch dyad does in fact consist of a parent and offspring and that they are people of African descent.
This methodological paper demonstrated the use of publicly accessible genetic genealogy tools to identify genetic matches within the GEDmatch database of Ghanaian residents. The results of this study were used in a larger study about family identity among members of the Kassena ethnic group who engaged in social interactions with their African American ancestral relatives.
We thank the College of Agricultural, Consumer and Environmental Sciences (ACES) at the University of Illinois at Urbana-Champaign (UIUC) for providing the ACES International Graduate Grant to fund this study.
Ancestry. (2018). DNA Tests for Ethnicity & Genealogical DNA testing | AncestryDNA™. http://dna.ancestry.com/
Baharian, S., Barakatt, M., Gignoux, C. R., Shringarpure, S., Errington, J., Blot, W. J., Bustamante, C. D., Kenny, E. E., Williams, S. M., & Aldrich, M. C. (2016). The great migration and African-American genomic diversity. PLoS Genetics, 12(5), e1006059.
Ball, C. A., Barber, M. J., Byrnes, J., Carbonetto, P., Chahine, K. G., Curtis, R. E., Granka, J. M., Han, E., Hong, E. L., Kermany, A. R., Myres, N. M., Noto, K., Qi, J., Rand, K., Wang, Y., & Wilmore, L. (2016). AncestryDNA Matching White Paper: Discovering genetic matches across a massive, expanding genetic database. http://dna.ancestry.com/resource/whitePaper/AncestryDNA-Matching-White-Paper.pdf
Browning, S. R., & Browning, B. L. (2007). Rapid and Accurate Haplotype Phasing and Missing-Data Inference for Whole-Genome Association Studies By Use of Localized Haplotype Clustering. The American Journal of Human Genetics, 81(5), 1084–1097. https://doi.org/10.1086/521987
Erlich, Y., Shor, T., Pe’er, I., & Carmi, S. (2018). Identity inference of genomic data using long-range familial searches. Science, 362(6415), 690–694.
Ghana Statistical Service. (2012). 2010 Population & Housing Census. Summary Report of Final Results.
Gusev, A., Lowe, J. K., Stoffel, M., Daly, M. J., Altshuler, D., Breslow, J. L., Friedman, J. M., & Pe’er, I. (2009). Whole population, genome-wide mapping of hidden relatedness. Genome Research, 19(2), 318–326. https://doi.org/10.1101/gr.081398.108
Henn, B. M., Hon, L., Macpherson, J. M., Eriksson, N., Saxonov, S., Pe’er, I., & Mountain, J. L. (2012). Cryptic Distant Relatives Are Common in Both Isolated and Cosmopolitan Genetic Samples. PLoS ONE, 7(4). https://doi.org/10.1371/journal.pone.0034267
Holsey, B. (2008). Routes of Remembrance: Refashioning the Slave Trade in Ghana. University of Chicago Press.
Ramstetter, M. D., Dyer, T. D., Lehman, D. M., Curran, J. E., Duggirala, R., Blangero, J., Mezey, J. G., & Williams, A. L. (2017). Benchmarking Relatedness Inference Methods with Genome-Wide Data from Thousands of Relatives. Genetics, 207(1), 75–82. https://doi.org/10.1534/genetics.117.1122
Roach, J. C., Glusman, G., Smit, A. F., Huff, C. D., Hubley, R., Shannon, P. T., Rowen, L., Pant, K. P., Goodman, N., & Bamshad, M. (2010). Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science, 328(5978), 636–639.
Schramm, K. (2007). Slave route projects: Tracing the heritage of slavery in Ghana. Reclaiming Heritage. Alternative Imaginaries of Memory in West Africa, 71–98.
Schramm, K. (2011). The slaves of Pikworo: Local histories, transatlantic perspectives. History & Memory, 23(1), 96–130.
Shriver, M. D., & Kittles, R. A. (2008). Genetic ancestry and the search for personalized genetic histories. In B. A. Koenig, S. S.-J. Lee, & S. S. Richardson (Eds.), Revisiting Race in a Genomic Age (pp. 201–214). Rutgers University Press.
Tewhey, R., Bansal, V., Torkamani, A., Topol, E. J., & Schork, N. J. (2011). The importance of phase information for human genomics. Nature Reviews Genetics, 12(3), 215–223. https://doi.org/10.1038/nrg2950
Thompson, E. A. (2013). Identity by descent: Variation in meiosis, across genomes, and in populations. Genetics, 194(2), 301–326.
Thomson, J., Clayton, T., Cleary, J., Gleeson, M., Kennett, D., Leonard, M., & Rutherford, D. (2020). An empirical investigation into the effectiveness of genetic genealogy to identify individuals in the UK. Forensic Science International: Genetics, 46, 102263. https://doi.org/10.1016/j.fsigen.2020.102263
Voyages Database. (n.d.). Voyages: The Trans-Atlantic Slave Trade Database. Retrieved May 23, 2020, from https://www.slavevoyages.org/voyage/database#statistics