Tajima's D

Tajima's D is a population genetic test statistic created by and named after the Japanese researcher Fumio Tajima.

Tajima's D is computed as the difference between two measures of genetic diversity: the mean number of pairwise differences and the number of segregating sites, each scaled so that they are expected to be the same in a neutrally evolving population of constant size. The difference is then divided by an estimate of its standard deviation.
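For reference, the verbal definition above corresponds to the standard form of the statistic given by Tajima (1989). The symbols π, S, and n match the arguments documented below; the intermediate constants a, b, c, and e are the conventional names from the literature, and it is an assumption that this package computes them in exactly this way.

```math
D = \frac{\pi - S / a_1}{\sqrt{e_1 S + e_2 S (S - 1)}}, \qquad
a_1 = \sum_{i=1}^{n-1} \frac{1}{i}, \qquad
a_2 = \sum_{i=1}^{n-1} \frac{1}{i^2}

b_1 = \frac{n + 1}{3 (n - 1)}, \qquad
b_2 = \frac{2 (n^2 + n + 3)}{9 n (n - 1)}

c_1 = b_1 - \frac{1}{a_1}, \qquad
c_2 = b_2 - \frac{n + 2}{a_1 n} + \frac{a_2}{a_1^2}

e_1 = \frac{c_1}{a_1}, \qquad
e_2 = \frac{c_2}{a_1^2 + a_2}
```

Here S / a_1 is the scaled number of segregating sites, so the numerator is expected to be zero under neutrality and constant population size.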

The purpose of the statistic is to distinguish between a DNA sequence evolving randomly ("neutrally") and one evolving under a non-random process. The non-random process might be directional or balancing selection, demographic expansion or contraction, genetic hitchhiking, or even introgression.

tajimad(π::AbstractFloat, S::Integer, n::Integer)

Compute Tajima's D from:

  • π: The average number of nucleotide differences (SNPs) per pair of sequences, taken over all (n choose 2) pairwise comparisons in the sample.

  • S: The number of segregating sites in a sample of sequences.

  • n: The number of sequences in your sample.

Example

tajimad(3.88888, 16, 10)
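
To make the arithmetic behind this call concrete, here is a minimal sketch of the standard Tajima (1989) formula in plain Julia. The name `tajimad_sketch` is hypothetical and this is not the package's source; it only assumes the textbook definition of the statistic and requires n ≥ 2 and S ≥ 1.

```julia
# Hypothetical re-implementation for illustration only; not the package's code.
function tajimad_sketch(π::AbstractFloat, S::Integer, n::Integer)
    # Sums over 1..(n-1) used to scale S and to build the variance terms
    a1 = sum(1 / i for i in 1:(n - 1))
    a2 = sum(1 / i^2 for i in 1:(n - 1))

    # Constants that depend only on the sample size n
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n^2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1^2
    e1 = c1 / a1
    e2 = c2 / (a1^2 + a2)

    # Difference of the two diversity estimates, divided by the
    # estimated standard deviation of that difference
    return (π - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
end

D = tajimad_sketch(3.88888, 16, 10)
println(D)
```

With the inputs from the example above, this sketch evaluates to approximately -1.446. A negative value indicates that the mean pairwise difference is lower than expected from the number of segregating sites.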
tajimad(seqs)

Compute Tajima's D from a collection of BioSequences{DNAAlphabet{n}} (n = 2 or 4).

This method estimates π, S, and n from the sequences, then uses those values to compute Tajima's D.

Example


sample = [dna"ATAATAAAAAAATAATAAAAAAATAAAAAAAATAAAAAAAA",
          dna"AAAAAAAATAAATAATAAAAAAATAAAAAAAAAAAAAAAAA",
          dna"AAAATAAAAATATAATAAAAAAATATAAAAAAAAAAAAAAA",
          dna"AAAAAAAAAAAATAATAAAAAAATAAATAAATAAAAAAAAA",
          dna"AAAATAAAAAAAATATAAAAAAATAAAAAAAAAAAAAAAAA",
          dna"AAAATAAAAAAAAAATAAAAAAAAAAAAAAAAAAATAAAAA",
          dna"AAAAAATAAAAATAATAAAAAAATAAAAAAAAAAAAAAAAA",
          dna"AAAAAAAAAAAAAAATAAAAAAATAAAAAAAAAAAAAAATA",
          dna"AAAAAAAAAAAAAAAAAAAAAAATAAAAAAAAAAAAAAAAA",
          dna"AAAAAAAAAAAAAAATAAAAAAATAATAAAAAAAAAAAAAA"]

tajimad(sample)
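
As a rough illustration of how π, S, and n could be derived from aligned sequences, the sketch below counts pairwise differences and segregating sites directly. It is an assumption-laden stand-in: `diversity_summary` is a hypothetical helper, and plain ASCII `String`s are used in place of BioSequences so the block runs without any packages.

```julia
# Hypothetical helper for illustration; plain ASCII Strings stand in for
# BioSequences, and this is not how the package necessarily does it.
function diversity_summary(seqs::Vector{String})
    n = length(seqs)
    len = length(first(seqs))
    @assert all(length(s) == len for s in seqs) "sequences must be aligned to equal length"

    # Total number of differing positions, summed over all n-choose-2 pairs
    total = 0
    for i in 1:(n - 1), j in (i + 1):n
        total += count(k -> seqs[i][k] != seqs[j][k], 1:len)
    end
    meandiff = total / binomial(n, 2)

    # A site is segregating if its alignment column holds more than one base
    S = count(k -> length(unique(s[k] for s in seqs)) > 1, 1:len)

    return (π = meandiff, S = S, n = n)
end

r = diversity_summary(["AAAA", "AATA", "ATTA"])
println(r)
```

For this three-sequence toy input, the pairwise differences are 1, 2, and 1, so π is 4/3, and two columns vary, so S is 2. The resulting values have the same meaning as the arguments of `tajimad(π, S, n)` described earlier.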