MUTATION RATES
Anatole Klyosov - summary of methods to confirm family trees through mutation rate analysis of clusters
To calculate mutation rates, you first need to identify a cluster: a branch of
a haplogroup tree whose modal haplotype is taken to be that of the common ancestor.
Copying of DNA from father to son can result in mutations in SNPs (which occur
once and remain) and in STRs (short tandem repeat markers, which are either
shortened or lengthened). A set of STR values is known as a haplotype.
The STR values of a common ancestor are known as the modal haplotype of a
haplogroup.
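As a rough illustration of these terms, the sketch below takes the modal haplotype of a cluster to be the most common repeat count at each marker, and counts how many repeat steps each haplotype sits away from it (the kind of count referred to as genetic distance). The haplotype values are made up for illustration, and the simple step count is an assumption; testing companies may count some markers differently.

```python
from collections import Counter

def modal_haplotype(haplotypes):
    """Most common repeat count at each marker position across the cluster."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*haplotypes)]

def step_distance(haplotype, modal):
    """Total number of repeat steps between a haplotype and the modal haplotype."""
    return sum(abs(a - b) for a, b in zip(haplotype, modal))

# Hypothetical 6-marker haplotypes (DYS 19, 388, 390, 391, 392, 393).
# The values are illustrative only, not real project data.
cluster = [
    [14, 12, 24, 11, 13, 13],
    [14, 12, 24, 11, 13, 13],
    [14, 12, 25, 11, 13, 13],
    [14, 12, 24, 10, 13, 13],
    [15, 12, 24, 11, 13, 13],
]

modal = modal_haplotype(cluster)
distances = [step_distance(h, modal) for h in cluster]
print("modal haplotype:", modal)
print("steps from modal:", distances)
```

The sum of the per-haplotype step counts is the total number of mutations in the cluster, which is the quantity the mutation-count calculations work from.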
Example of mutation rates: in the 6-marker subset of the 12-marker panel, a
mutation occurs in a haplotype about once in 2840 years. In 67 markers, one
mutation takes place about every 170 years.
If you are at a genetic distance of 2 from the modal haplotype of a haplogroup
over 67 markers, chances are that it took about 340 years (2 x 170) for those
two mutations to take place.
Markers | DYS | Mutations | Ave. mutation rate per haplotype | Ave. mutation rate per marker |
6 of 12 | 19, 388, 390, 391, 392, 393 | 1 in 2840 yrs | .0088 | .00147 (in FTDNA order) |
12 | 12 marker panel | 1 in 1140 yrs | .022 | .00183 (in FTDNA order) |
25 | 25 marker panel | 1 in 540 yrs | .046 | .00184 (in FTDNA order) |
37 | 37 marker panel | 1 in 280 yrs | .090 | .00243 (in FTDNA order) |
67 | 67 marker panel | 1 in 170 yrs | .145 | .00216 (in FTDNA order) |
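The columns of the table above are related by simple arithmetic. The sketch below assumes that the per-haplotype rate is expressed per 25-year generation; that reading is an inference from the numbers, since dividing 25 by each rate reproduces the "1 in N yrs" column to the nearest 10 years. The function names are made up for this illustration.

```python
YEARS_PER_GENERATION = 25  # the document's stated assumption

# markers -> mutations per haplotype per generation, copied from the table
RATES_PER_HAPLOTYPE = {6: 0.0088, 12: 0.022, 25: 0.046, 37: 0.090, 67: 0.145}

def years_per_mutation(rate_per_haplotype):
    """Average number of years for one mutation to appear in a haplotype."""
    return YEARS_PER_GENERATION / rate_per_haplotype

def rate_per_marker(markers, rate_per_haplotype):
    """Average mutation rate of a single marker in the panel."""
    return rate_per_haplotype / markers

def years_for_distance(genetic_distance, rate_per_haplotype):
    """Average number of years for the given number of mutations to accumulate."""
    return genetic_distance * years_per_mutation(rate_per_haplotype)

for markers, rate in RATES_PER_HAPLOTYPE.items():
    print(
        f"{markers:>2} markers: 1 mutation in about {years_per_mutation(rate):.0f} yrs, "
        f"{rate_per_marker(markers, rate):.5f} per marker"
    )

print(f"distance 2 at 67 markers: about {years_for_distance(2, 0.145):.0f} yrs")
```

With the unrounded rate, a genetic distance of 2 at 67 markers comes out near 345 years; the 340-year figure in the text uses the rounded value of 170 years per mutation.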
In order to find the Time to the Most Recent Common Ancestor (TMRCA), more than
one haplotype will be needed: at least
- 40 haplotypes with 6 markers
- ??? haplotypes with 12 markers
- 10 haplotypes with 25 markers
- 4 haplotypes with 67 markers
The fewer markers or haplotypes included, the less reliable the
result. The average number of STR mutations per haplotype is used to calculate
the time span from the MRCA.
Assumption: a generation is 25 years, so there are 4 generations in 100 years.
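2 Methods:
- Logarithmic - based on the number of unmutated haplotypes
- Linear - based on the mutation count
  - n/N/μ = t
  - n = number of mutations
  - N = number of haplotypes
  - μ = average mutation rate (per haplotype per generation, as in the table above)
  - t = time to the common ancestor in generations; multiply by 25 for years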
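The two methods can be sketched in Python as follows. The linear formula is the one given above. The logarithmic formula is not spelled out in this summary; the form used here, ln(N/m)/μ with m the number of haplotypes identical to the modal haplotype, is an assumption based on how Klyosov's method is usually described. The sketch also assumes μ is the per-generation rate from the table, so the results are in generations and are multiplied by 25 to get years. The data set is hypothetical, and no correction for reverse mutations is applied.

```python
import math

YEARS_PER_GENERATION = 25  # the document's stated assumption

def linear_generations(n_mutations, n_haplotypes, mu):
    """Linear (mutation count) method: n / N / mu."""
    return n_mutations / n_haplotypes / mu

def logarithmic_generations(n_haplotypes, n_unmutated, mu):
    """Logarithmic (unmutated) method, assumed form: ln(N / m) / mu."""
    return math.log(n_haplotypes / n_unmutated) / mu

# Hypothetical cluster: 20 haplotypes at 25 markers, 12 of them identical
# to the modal haplotype, with 10 mutations counted across the cluster.
mu = 0.046  # per haplotype per generation, 25-marker row of the table
N, m, n = 20, 12, 10

t_lin = linear_generations(n, N, mu)
t_log = logarithmic_generations(N, m, mu)

print(f"linear:      {t_lin:.1f} generations, about {t_lin * YEARS_PER_GENERATION:.0f} years")
print(f"logarithmic: {t_log:.1f} generations, about {t_log * YEARS_PER_GENERATION:.0f} years")
```

Because the logarithmic method looks only at how many haplotypes remain unmutated while the linear method counts every mutation, the two estimates are largely independent checks on the same data set.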
If both methods give approximately the same result, this is a 'clean' result.
If the methods yield different results, the haplotypes in the calculation are
probably from a mixed population:
"The main reasons of such a discrepancy are typically as follows:
- different mutation rates employed by researchers,
- lack of calibration of mutation rates using known genealogies or known
  historical events, or when a time depth for known genealogies was
  insufficient to get all principal loci involved,
- mixed series of haplotypes, which are often derived from different
  clades, and in different proportions between those series, which
  directly affect a number of mutations in the series,
- lack of corrections for reverse mutations (ASD-based calculations
  [see below] do not need such a correction),
- lack of corrections for asymmetry of mutations in the given series of
  haplotypes – in some cases."
I used the phylogenetic programs and features of PHYLIP, together with TreeView,
to find the clusters. When a cluster did not give a clean result, I concluded
that not enough haplotypes were represented. Given the population of the world
and the number of South Irish members in the FTDNA projects that I include in
the case study, the probability of finding family clusters is quite low. For
perspective, the U.S. population is 312,016,766, FTDNA has 343,915 members, and
I have 185 South Irish members with a genetic distance of less than 9.
Population | Calculation | Proportion |
FTDNA members vs U.S. population | 343,915/312,016,766 = | 0.0011 |
South Irish project members vs FTDNA members | 185/343,915 = | 0.0005 |
South Irish FTDNA members vs U.S. population | 185/312,016,766 = | 0.00000059 |
South Irish FTDNA members vs Irish immigrants during famine | 185/2,000,000 = | 0.0000925 |
South Irish FTDNA members vs Irish immigrants in U.S. | 185/36,278,332 = | 0.000005099463 |
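The ratios in the table can be reproduced with a few divisions. The sketch below uses only the counts quoted in the text and the table; the variable names are made up for this illustration. The results are fractions, so each is also printed as a percentage for comparison.

```python
# Counts quoted in the text and the table above
US_POPULATION = 312_016_766
FTDNA_MEMBERS = 343_915
SOUTH_IRISH_MEMBERS = 185
FAMINE_IMMIGRANTS = 2_000_000
IRISH_IN_US = 36_278_332

ratios = {
    "FTDNA members vs U.S. population": FTDNA_MEMBERS / US_POPULATION,
    "South Irish members vs FTDNA members": SOUTH_IRISH_MEMBERS / FTDNA_MEMBERS,
    "South Irish members vs U.S. population": SOUTH_IRISH_MEMBERS / US_POPULATION,
    "South Irish members vs famine immigrants": SOUTH_IRISH_MEMBERS / FAMINE_IMMIGRANTS,
    "South Irish members vs Irish in U.S.": SOUTH_IRISH_MEMBERS / IRISH_IN_US,
}

for label, value in ratios.items():
    # Show each ratio both as a fraction and as a percentage
    print(f"{label}: {value:.3g} ({value:.5%})")
```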