Parsi DNA

for the Zoroastrian Persian population from the Indian Subcontinent

Currently just a place to collect information, for our own benefit, as we try to explain the special situation of this population. It may evolve into a more formal study at some point.

We created this page to start documenting information on the Parsi population as it relates to DNA studies, mostly from a genealogical perspective. Many Parsis have tested with the consumer-level genetic genealogy tests available, but they are confused by their results compared with what the testing companies promote. Testing-company guidance is based on traditional, published literature for the European-descent community, which has very low endogamy. But much of the world is not so thoroughly outbred. There are isolated populations where close-cousin marriages were practiced and even encouraged. The traditional royal families of Europe come to mind immediately, as do the Mennonites and Amish in the USA, Ashkenazi Jews, Acadians and similar groups. Oddly, when these populations are mentioned, Parsis never enter the picture, and so little has been studied or understood. Yet, based on our limited experience, they are likely the most strongly endogamous population group that exists, over both the near and the far timeline. Hence, we are looking to collect any experience and guidelines available to help Parsis interpret their genetic genealogy test results.

Note 1: An interesting finding from 2017 concerns triangulated groups, specifically those produced by the GEDMatch Tier 1 Triangulated Group tool. Europeans get tens to a hundred or so triangulated groups when running the tool with default parameters; Parsis get just a few. So although there is a 1-2% floor of matching DNA between Parsis, it does not appear to carry through to common matching segments shared by 3 or more people. We have yet to (a) study the GEDMatch tool in more detail and (b) study more fully the statistics of population size and matches. But this looks promising as a way to weed out a Parsi's match list and find more likely relatives. It does, however, require more than 2 testers to compare and match.
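To make the idea of a triangulated group concrete, here is a minimal Python sketch. It is not GEDMatch's Tier 1 algorithm. It assumes a hypothetical input of pairwise matching segments as (tester1, tester2, chromosome, start, end) tuples with positions in megabases, and treats three testers as triangulated when all three pairwise segments on one chromosome overlap by at least a chosen minimum (the hypothetical `min_overlap_mb`).

```python
from itertools import combinations


def pair_key(a, b):
    """Order-independent key for a pair of testers."""
    return tuple(sorted((a, b)))


def triangulate(segments, min_overlap_mb=1.0):
    """Find tester triples whose three pairwise segments share a common region.

    segments: hypothetical (tester1, tester2, chromosome, start_mb, end_mb) tuples.
    Returns ((t1, t2, t3), chromosome, overlap_start, overlap_end) entries.
    """
    by_pair = {}
    testers = set()
    for a, b, chrom, start, end in segments:
        by_pair.setdefault(pair_key(a, b), []).append((chrom, start, end))
        testers.update((a, b))

    groups = []
    for x, y, z in combinations(sorted(testers), 3):
        # All three pairs must match; a missing pair yields an empty list.
        for c1, s1, e1 in by_pair.get(pair_key(x, y), []):
            for c2, s2, e2 in by_pair.get(pair_key(x, z), []):
                for c3, s3, e3 in by_pair.get(pair_key(y, z), []):
                    if not (c1 == c2 == c3):
                        continue
                    # Common region shared by all three pairwise segments.
                    lo, hi = max(s1, s2, s3), min(e1, e2, e3)
                    if hi - lo >= min_overlap_mb:
                        groups.append(((x, y, z), c1, lo, hi))
    return groups
```

This only checks triples; real tools typically work in cM and can merge larger groups of testers around one region.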

Tester | Company | # of Matches | First 20: Total Shared (avg) | First 20: # Segments (avg) | First 20: Longest Segment (avg) | # Triangulated Groups
JD | 23andMe | | | | |
RM | 23andMe | 839 | (1.65 to 1.24%) | (11 to 13) | |

Notes: The first 20 matches exclude known relatives. Tester results transferred in from another company are marked with the source test company (A, M, T, H, as used by GEDMatch).

Note 2: In a 2019 Facebook group post, Leah LaPerle Larkin mentions a metric she has developed for the endogamous Acadians she studies: endogamous false matches appear to have low average segment lengths. The average segment length is easy to determine, even on Ancestry, which does not report segment details. Simply divide the total amount of matching DNA (in cM) by the number of matching segments reported. (This is tougher to do on GEDMatch, because it does not report the number of segments!) An average of 18 cM or higher marks a match worth looking into; 15 cM or lower is likely to be set aside initially. It is not yet clear whether this technique applies to the Parsi community, or how the test company and matching database affect it. (For example, FTDNA and MyHeritage include segments as small as 1 cM in their total amount, although they claim to use segments only as small as 5 cM.)
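The arithmetic behind this metric is simple enough to sketch in Python. The 18 cM and 15 cM cut-offs come from the Acadian rule of thumb quoted above and have not been validated for Parsis; the function names and the "undecided" middle band are our own illustrative choices.

```python
def average_segment_length(total_cm, num_segments):
    """Average matching segment length in cM: total shared cM / number of segments."""
    if num_segments <= 0:
        raise ValueError("num_segments must be positive")
    return total_cm / num_segments


def triage_match(total_cm, num_segments, investigate_at=18.0, set_aside_at=15.0):
    """Sort a match by average segment length (Acadian thresholds, untested for Parsis)."""
    avg = average_segment_length(total_cm, num_segments)
    if avg >= investigate_at:
        return "investigate"
    if avg <= set_aside_at:
        return "set aside"
    return "undecided"  # between the two cut-offs
```

For example, a match sharing 110 cM over 10 segments averages 11 cM and would be set aside at first, while 90 cM over 5 segments averages 18 cM and is worth a closer look.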

Note 3: In Sep 2017, we discovered 43 kits uploaded to GEDMatch Genesis that were all labeled simply "Parsi" and that appeared, in their entirety, in other Parsis' match lists. We are trying to determine their source and whether a study is already in progress. More importantly, are these made-up, false, or real-person kits? These kits disappeared after a few months.

Introduction

Parsis have always been considered a highly endogamous population, historically and to the present day, and atDNA testing is confirming that. For the Parsi DNA kits we manage, we see a noise floor of 1 to 1.5% in matching amount: pretty much any Parsi who tests shows that match strength to the other Parsis. Only 2nd cousins or closer exceed that noise floor, and confirmed relatives often fall in the middle of it or below it. As a result, atDNA testing has not been very helpful for members of this community. Some sites that do not scrub their segments or limit their match lengths show 2-3%.

For non-endogamous Europeans, the noise floor is below 15 cM, or roughly 0.2%. A segment length of 7 cM is a key threshold: below that length, a matching segment is more likely to be a false-positive match than a real DNA match with a relative. This also assumes at least 500 SNPs to determine a segment of that length (the density of SNP testing is a crucial factor as well). We have yet to arrive at a good metric, but Parsis do appear to have longer false matching segments, on the order of 20 cM. So the minimum segment length for a reliable match may need to be raised as well.
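A minimal Python sketch of the segment filter described above, assuming a hypothetical `MatchSegment` record holding a length in cM and a SNP count. The 7 cM / 500 SNP defaults are the European guidelines quoted in the text; passing `min_cm=20.0` applies the tentative Parsi floor, which is an unvalidated assumption.

```python
from dataclasses import dataclass


@dataclass
class MatchSegment:
    """Hypothetical record for one matching segment."""
    chromosome: int
    length_cm: float
    snps: int


def reliable_segments(segments, min_cm=7.0, min_snps=500):
    """Keep segments that meet both the length floor and the SNP-count floor."""
    return [s for s in segments if s.length_cm >= min_cm and s.snps >= min_snps]
```

Running the same match list through both floors shows how many segments survive only under the European guideline.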

How do we know the matches are all Parsis? While a few surnames overlap with the Indian diaspora, most Parsi surnames are quite unique to their population; see the section below on Parsi surnames. Even without a surname to go on (we have worked on an adoption case), a Parsi tester and their Parsi matches are pretty evident when they appear.

Autosomal (and X) Segment Matching

We started testing exclusively on 23andMe and immediately saw a difference there. European-descent testers see a DNA match of 0.1% or better with just 1 or 2 segments larger than 7 cM on 1 or 2 chromosomes, and can find that they are distantly related. This is barely above the noise floor of the testing process. On our sister project, H600, we have been able to dig deep and find matches with likely relatives who are 5th to 7th cousins, beyond the suspected limits. The Parsi community, on the other hand, shows a 1.5 to 2% total match strength, with tens of segments over 7 cM across 8 to 12 chromosomes (on average), for people who show no real relation (that can be discovered) in genealogical time. The noise floor for the longest matching segment also seems to be about 20 cM, whereas the testing process and studies indicate a 7 cM longest-segment floor in general. (For reference, roughly 70 cM of total matching segments is about 1%.)
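To compare a single match against these figures, the following Python sketch totals its segments, converts cM to an approximate percentage using the 70 cM ≈ 1% rule of thumb, and counts segments of at least 7 cM and the chromosomes they fall on. The input format (a list of (chromosome, length_cm) pairs) and the `within_parsi_noise` thresholds (2% total, 20 cM longest segment) are hypothetical, drawn from the observations above rather than from any testing company's method.

```python
CM_PER_PERCENT = 70.0  # rule of thumb from the text: about 70 cM total is ~1%


def summarize_match(segments, min_cm=7.0):
    """Summarize one match; segments is a list of (chromosome, length_cm) pairs."""
    total_cm = sum(length for _, length in segments)
    over_min = [(chrom, length) for chrom, length in segments if length >= min_cm]
    return {
        "total_cm": total_cm,
        "percent": total_cm / CM_PER_PERCENT,
        "segments_over_min": len(over_min),
        "chromosomes_over_min": len({chrom for chrom, _ in over_min}),
        "longest_cm": max((length for _, length in segments), default=0.0),
    }


def within_parsi_noise(summary, max_percent=2.0, max_longest_cm=20.0):
    """Hypothetical check: total and longest segment both inside the observed noise."""
    return summary["percent"] <= max_percent and summary["longest_cm"] <= max_longest_cm
```

A match made of many 7-20 cM segments spread over several chromosomes and totalling under 2% would be flagged as likely noise; one with a single 35 cM segment would not, even at a similar total.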

So the question becomes: how do you interpret the Parsi community's autosomal and X-chromosome SNP test results? Is there a new noise floor to set for this community? Do we have to look ONLY at segments longer than 20 cM spread over multiple chromosomes? Or should we simply discount or throw out certain segments where larger matching segments tend to recur? Can any useful information be extracted from autosomal DNA testing at all? How have communities with similar historical endogamy (such as Ashkenazi Jews) handled this? Studies of endogamous populations show traits such as long Runs of Homozygosity (RoH), an indicator of both recent and ancient shared ancestry between a person's parents. This may be needed to distinguish segments shared with nearer-term relatives.
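As an illustration of what a run of homozygosity is, here is a simplified Python sketch that scans one person's genotype calls for a single autosome (two-letter calls such as "AA" or "AG", in chromosome order) and reports runs of consecutive homozygous calls. Real RoH tools measure runs in cM or base pairs and tolerate occasional errors or missing calls; this sketch counts SNPs, treats no-calls ("--") as breaking a run, and uses a hypothetical `min_snps` threshold.

```python
def runs_of_homozygosity(genotypes, min_snps=100):
    """Return (start_index, end_index_exclusive) for runs of homozygous calls.

    genotypes: two-letter autosomal calls such as "AA" or "AG"; no-calls break a run.
    """
    runs = []
    start = None
    for i, call in enumerate(genotypes):
        homozygous = len(call) == 2 and call[0] == call[1] and call[0] in "ACGT"
        if homozygous:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i))
            start = None
    # Close a run that reaches the end of the list.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes)))
    return runs
```

Long runs point to parents who share recent ancestry; many short runs are more typical of a small, long-isolated population.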

I should add that Pedigree Collapse (related parents) analysis shows minimal relatedness, at least when measured with Immanuel's y-str.org tool or GEDMatch's "Are your parents related?" tool. Very few kits rise to 0.5% or higher, and there are no Full-Identical match regions, only Half-Identical ones. So the strong endogamy is not contributing there, which is odd in itself, as we would expect it to. But we surmise that 1-2% is a small amount and that the likelihood of this overlap occurring by chance is small.

We should likely work toward starting yDNA and autosomal projects for the community as a whole. mtDNA is less helpful, simply because there are not enough data points: Parsi men have always been able to marry a non-Parsi, and the family (spouse and children) is still considered Parsi. This does not apply to Parsi women; if they marry a non-Parsi man, they are no longer accepted in the temple. An mtDNA study referenced below seems to confirm that outside women have been accepted in. But maybe a study of just the Dastur surname line is possible, similar to how the Cohens were studied in the Jewish community. Those with the Dastur surname are not allowed to marry outside the community IF they want to retain their priestly status, and thus their surname.

Y Haplogroups (Patriline)

Interestingly, our single Parsi priest (Dastur) tester is assigned Haplogroup L-M22 by the 23andMe general test. This is the predominant haplogroup for Indian Zoroastrian priests found in the study published in 2017 (see ref below). The general Zoroastrian population tested in India, with no priests included, fell in Haplogroup J, as did Iranian Zoroastrians (both priests and lay). Haplogroup L is simply not found among Zoroastrians in Iran; it is more associated with the early human population development of the Indian Subcontinent.

Mitochondrial Haplogroups (Matriline)

Surnames

So what makes Parsi surnames so unique? One interesting link offers a first-blush explanation of some Parsi surnames as locative, which is one of the three main sources of surnames identified by the Guild of One-Name Studies (GoONS):

Locative: derived from the place where someone came from or lived
- Toponymic: derived from a place name
Occupational or Metonymic: derived from the occupation of the bearer
Post Holder

As soon as you see surnames such as Engineer, Doctor, Contractor, Captain, or Driver in a match list, you know a Parsi match is involved. Parsis quickly adopted surnames when English rule began, hence the English forms. Dastur is likely an example of a Post Holder surname, as Dasturji means priest.

A special note must be given for surnames ending in "wala". This ending is more common among the Parsis in Gujarat, the origin of their culture in India. "Wala" is simply a way of specifying the occupation of the surname holder. Unlike the English surname forms given earlier, these are in the native Gujarati or even Persian. Examples are Campwala, Khariwala, Daroowala, Todiwala, Limbuwala, Botliwala, Bottlewala, Unwala, Pitavala and so on. Mistry, Modi, and Mehta are special ones worth mentioning: they are cross-overs, common in the general Indian population as well.
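A rough Python sketch of scanning a match list for these surname hints. The word lists contain only the examples named in this section and are far from exhaustive; the category labels are our own, and a "-wala"/"-vala" ending is treated only as a possible hint, since the text describes it as more common among Parsis rather than exclusive to them.

```python
import re

# Illustrative lists built only from the examples in this section.
ENGLISH_OCCUPATIONAL = {"engineer", "doctor", "contractor", "captain", "driver"}
CROSS_OVERS = {"mistry", "modi", "mehta"}  # also common in the general Indian population
WALA_SUFFIX = re.compile(r"(wala|vala)$")  # e.g. Daroowala, Pitavala


def surname_hint(surname):
    """Return a rough label for how strongly a surname suggests a Parsi match."""
    name = surname.strip().lower()
    if name in ENGLISH_OCCUPATIONAL:
        return "likely Parsi (English occupational)"
    if WALA_SUFFIX.search(name):
        return "possibly Parsi (-wala)"
    if name in CROSS_OVERS:
        return "ambiguous (cross-over)"
    return "no hint"
```

Applied to every surname in a match list, this gives a quick first pass; any hint still needs confirming with the match's tree or other evidence.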

Given names are often (ancient) Persian and have thus come into use in some Islamic communities, especially those established early on in India. Feroz, Kersas, Farhad, Cyrus, Roshan, and Darius are just a few examples.

References

For further information on DNA studies of the Parsis and related regions

Parsi direct

López, Saioa, et al, "The Genetic Legacy of Zoroastrianism in Iran and India: Insights into Population Structure, Gene Flow, and Selection", American Journal of Human Genetics, Volume 101, Issue 3, 7 September 2017, Pages 353-368 (available on ScienceDirect)
Chaubey, G., et al. “Like sugar in milk”: reconstructing the genetic history of the Parsi population. Genome Biol 18, 110 (Jun 2017). (DOI), (BiomedCentral)
Patell, V.M, et al, "The First complete Zoroastrian-Parsi Mitochondria Reference Genome: Implications of mitochondrial signatures in an endogamous, non-smoking population", pre-print
The Avestagenome Project®, 5 Jun 2020 (DOI citation, PDF); has been around for 20+ years but no real studies
Parsi mtDNA study. Really a comment, stating the obvious about what we have already seen of the mtDNA mix with South Asians. Based on only a single Parsi contributor to an ad-hoc collection of data.

South Asian / Indian Subcontinent

Qamar, R., et al, "Y-Chromosomal DNA Variation in Pakistan", Am J Hum Genet. 2002 May; 70(5): 1107–1124 (PDF)
Wikipedia Y DNA Haplogroups of South Asia (combined table from many papers)
Khurana, P. et al, Y Chromosome Haplogroup Distribution in Indo-European Speaking Tribes of Gujarat, Western India, PLOS ONE 9(3): e90414. https://doi.org/10.1371/journal.pone.0090414
Sahoo, S. et al A prehistory of Indian Y chromosomes: Evaluating demic diffusion scenarios, PNAS January 24, 2006 103 (4) 843-848; https://doi.org/10.1073/pnas.0507714103
Singh, M., et al A comprehensive portrait of Y-STR diversity of Indian populations and comparison with 129 worldwide populations, Sci Rep 8, 15421 (2018).
Mahal, D. and Matsoukas, I, The Geographic Origins of Ethnic Groups in the Indian Subcontinent: Exploring Ancient Footprints with Y-DNA Haplogroups, Front Genet. 2018; 9: 4. doi: 10.3389/fgene.2018.00004
Quintana-Murci, Lluís et al, Where West Meets East: The Complex mtDNA Landscape of the Southwest and Central Asian Corridor Volume 74, Issue 5, May 2004, Pages 827-845, The American Journal of Human Genetics (AJHG), https://doi.org/10.1086/383236
Ajmal, Zack; Harappadna.org: crowd-sourced submission of Indian Subcontinent DNA samples (over 300 samples; active mostly in 2011)

Persia / Arabia

Mineta, Katsuhiko, et al, Population structure of indigenous inhabitants of Arabia January 11, 2021, PLOS Genetics
Regueiro, M et al, "Iran: Tricontinental Nexus for Y-chromosome Driven Migration", Hum Hered 2006;61:132–143 (PDF)
Sahakyan, Hovhannes, et al, Origin and diffusion of human Y chromosome haplogroup J1-M267, 23 Mar 2021, Scientific Reports of Nature

Resources

Endogamy: One Family, One People; book by Israel Pickholtz covering highly endogamous populations (Amazon)
Surname Origins from the Guild of One-Name Studies website (shows classifications of surname origins; not particular to Parsi names themselves)
