Tajima's D
Tajima's D is a population genetic test statistic created by and named after the Japanese researcher Fumio Tajima.
Tajima's D is computed as the difference between two measures of genetic diversity: the mean number of pairwise differences and the number of segregating sites, each scaled so that they are expected to be the same in a neutrally evolving population of constant size.
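For orientation, the scaled difference described above can be written out in the standard form given by Tajima (1989). The symbols π, S, and n are the mean number of pairwise differences, the number of segregating sites, and the sample size; the intermediate constants a, b, c, and e are conventional names and are not taken from this package's source.

```latex
D = \frac{\pi - S / a_1}{\sqrt{e_1 S + e_2 S (S - 1)}}

a_1 = \sum_{i=1}^{n-1} \frac{1}{i}, \qquad
a_2 = \sum_{i=1}^{n-1} \frac{1}{i^2}

b_1 = \frac{n + 1}{3 (n - 1)}, \qquad
b_2 = \frac{2 (n^2 + n + 3)}{9 n (n - 1)}

c_1 = b_1 - \frac{1}{a_1}, \qquad
c_2 = b_2 - \frac{n + 2}{a_1 n} + \frac{a_2}{a_1^2}

e_1 = \frac{c_1}{a_1}, \qquad
e_2 = \frac{c_2}{a_1^2 + a_2}
```

Under neutrality and constant population size, π and S / a_1 have the same expectation, so D is expected to be close to zero.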
The purpose of the statistic is to distinguish between a DNA sequence evolving randomly ("neutrally") and one evolving under a non-random process. The non-random process might be directional or balancing selection, demographic expansion or contraction, genetic hitchhiking, or even introgression.
NaturalSelection.tajimad — Method.
tajimad(π::AbstractFloat, S::Integer, n::Integer)
Compute Tajima's D from:
π: The mean number of pairwise differences (SNPs), averaged over all (n choose 2) pairwise comparisons of a sample of sequences.
S: The number of segregating sites in a sample of sequences.
n: The number of sequences in your sample.
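As a rough illustration of how these three parameters combine, here is a minimal sketch of the calculation based on Tajima's published formula. It is not NaturalSelection's actual implementation; the name `tajimad_sketch` and the choice to return `NaN` when there are no segregating sites are assumptions made for this example.

```julia
# Hypothetical helper for illustration only; not part of NaturalSelection.
function tajimad_sketch(π::AbstractFloat, S::Integer, n::Integer)
    n >= 2 || throw(ArgumentError("need at least two sequences"))
    # Assumption: with no segregating sites D is undefined, so return NaN.
    S > 0 || return NaN

    # Harmonic-type sums over 1 .. n-1.
    a1 = sum(1 / i for i in 1:(n - 1))
    a2 = sum(1 / i^2 for i in 1:(n - 1))

    # Constants that depend only on the sample size.
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n^2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1^2
    e1 = c1 / a1
    e2 = c2 / (a1^2 + a2)

    # Difference of the two diversity estimates, divided by the
    # square root of its estimated variance.
    return (π - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
end

d = tajimad_sketch(3.88888, 16, 10)  # roughly -1.446
```

A negative value such as this one indicates an excess of low-frequency variants relative to the neutral, constant-size expectation.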
Example
tajimad(3.88888, 16, 10)

NaturalSelection.tajimad — Method.
tajimad(seqs)
Compute Tajima's D from a collection of BioSequences{DNAAlphabet{n}} (n = 2 or 4).
This will estimate π and S from the sequences, take n as the number of sequences, and use those values to compute Tajima's D.
Example
# The dna"..." string literal comes from BioSequences.
using BioSequences, NaturalSelection

sample = [dna"ATAATAAAAAAATAATAAAAAAATAAAAAAAATAAAAAAAA",
          dna"AAAAAAAATAAATAATAAAAAAATAAAAAAAAAAAAAAAAA",
          dna"AAAATAAAAATATAATAAAAAAATATAAAAAAAAAAAAAAA",
          dna"AAAAAAAAAAAATAATAAAAAAATAAATAAATAAAAAAAAA",
          dna"AAAATAAAAAAAATATAAAAAAATAAAAAAAAAAAAAAAAA",
          dna"AAAATAAAAAAAAAATAAAAAAAAAAAAAAAAAAATAAAAA",
          dna"AAAAAATAAAAATAATAAAAAAATAAAAAAAAAAAAAAAAA",
          dna"AAAAAAAAAAAAAAATAAAAAAATAAAAAAAAAAAAAAATA",
          dna"AAAAAAAAAAAAAAAAAAAAAAATAAAAAAAAAAAAAAAAA",
          dna"AAAAAAAAAAAAAAATAAAAAAATAAAAAAAAAAAAAAAAA"]
tajimad(sample)
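To make the estimation step concrete, the sketch below shows one way S and π could be derived from aligned sequences. It works on plain strings rather than BioSequences types, assumes equal-length sequences with no gaps or ambiguity codes, and uses hypothetical helper names (`segregating_sites`, `mean_pairwise_differences`); how NaturalSelection itself handles these cases is not documented here.

```julia
# Hypothetical helpers for illustration only; not part of NaturalSelection.

# Count alignment columns in which more than one distinct base occurs.
function segregating_sites(seqs::AbstractVector{<:AbstractString})
    len = length(first(seqs))
    all(s -> length(s) == len, seqs) ||
        throw(ArgumentError("sequences must be aligned to equal length"))
    return count(i -> length(unique(s[i] for s in seqs)) > 1, 1:len)
end

# Average the number of mismatches over all (n choose 2) sequence pairs.
function mean_pairwise_differences(seqs::AbstractVector{<:AbstractString})
    n = length(seqs)
    n >= 2 || throw(ArgumentError("need at least two sequences"))
    total = 0
    for i in 1:(n - 1), j in (i + 1):n
        total += count(k -> seqs[i][k] != seqs[j][k], eachindex(seqs[i]))
    end
    return total / binomial(n, 2)
end

toy = ["AAAA", "AATA", "TATA"]
S = segregating_sites(toy)              # 2 (columns 1 and 3 vary)
π_hat = mean_pairwise_differences(toy)  # (1 + 2 + 1) / 3 = 4/3
n = length(toy)                         # 3
```

The three values obtained this way correspond to the S, π, and n arguments of the three-argument method described earlier.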