dN / dS
NaturalSelection.jl provides several different methods of inferring the action of natural selection from coding sequences.
Evolutionary pressures on proteins are often quantified by the ratio of substitution rates at non-synonymous and synonymous sites, i.e. dN/dS.
The dN/dS ratio was originally developed for application to distantly diverged sequences, the differences among which represent substitutions that have fixed along independent lineages.
Nevertheless, dN/dS is often applied to sequences sampled from a single population, the differences among which represent segregating polymorphisms. Take care when doing so: it has been demonstrated that dN/dS is not always suitable for this purpose (Kryazhimskiy & Plotkin, 2008).
The NG86 method
NaturalSelection.dNdS_NG86 — Function.
dNdS_NG86(x, y, addone::Bool = true, code::Int = 1)
Compute dN and dS, using the Nei and Gojobori (1986) method.
The genetic code used is selected according to the numbering of ncbi_trans_table. Code 1 is the standard genetic code.
This function requires two iterables, x and y. If they yield Codon{DNA} or Codon{RNA} values, x and y are assumed to be sequences of aligned codons. If they yield DNA or RNA values, x and y are assumed to conform to the behaviour of DNA or RNA sequences as defined in the BioSequences package. In this case, new x and y iterables with an element type of Codon{DNA} or Codon{RNA} are constructed from them.
NG86 is a counting method for computing dN/dS. It is typically safer to use on sequence data where codon usage (especially at the third position) is uniform, the sequences are not very divergent, and transition and transversion rates are similar.
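As a rough illustration of the last step of a counting method such as NG86, the sketch below turns summed difference and site counts into proportions and applies the Jukes-Cantor correction described by Nei and Gojobori (1986). The names jc_correct and dnds_from_counts and the example counts are hypothetical. This is not the package's internal code, and it does not model the addone option.

```julia
# Jukes-Cantor correction for multiple substitutions at the same site.
# p is the observed proportion of differing sites; the correction is
# undefined for p >= 3/4, in which case NaN is returned.
jc_correct(p::Real) = p < 0.75 ? -0.75 * log(1 - (4 / 3) * p) : NaN

# Hypothetical helper: sd and nd are the summed synonymous and
# non-synonymous differences, s and n the summed synonymous and
# non-synonymous sites over all aligned codons.
function dnds_from_counts(sd::Real, nd::Real, s::Real, n::Real)
    dS = jc_correct(sd / s)
    dN = jc_correct(nd / n)
    return (dN = dN, dS = dS, ratio = dN / dS)
end

# Made-up counts, for illustration only.
res = dnds_from_counts(10.0, 12.0, 60.0, 240.0)
println(res)
```

With these made-up counts the proportion of synonymous differences is larger than the non-synonymous one, so the resulting ratio falls below 1.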
NaturalSelection.S_N_NG86 — Function.
S_N_NG86(codon::C, code::GeneticCode) where {C <: CDN}
Enumerate the number of synonymous (S) and non-synonymous (N) sites in a codon, using the method used by Nei and Gojobori (1986).
Returns a tuple where S is the first element and N is the second (S, N).
Each site in a codon may be both partially synonymous and non-synonymous.
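The idea of partially synonymous sites can be sketched in plain Julia as follows. This is a simplified illustration on strings, not the package's implementation: translate and s_n_sites are hypothetical names, only the standard genetic code is covered, and changes to a stop codon are assumed to count as non-synonymous.

```julia
# Standard genetic code (NCBI table 1), with codons ordered by T, C, A, G
# at each position.
const BASES = "TCAG"
const AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Translate a three-letter DNA codon such as "ATG" ('*' marks a stop).
function translate(codon::AbstractString)
    i = findfirst(==(codon[1]), BASES)
    j = findfirst(==(codon[2]), BASES)
    k = findfirst(==(codon[3]), BASES)
    return AMINO[16 * (i - 1) + 4 * (j - 1) + k]
end

# At each position, the fraction of the three possible single-base changes
# that leave the amino acid unchanged is added to S; N is the remainder.
# Assumption: a change to a stop codon counts as non-synonymous.
function s_n_sites(codon::AbstractString)
    aa = translate(codon)
    S = 0.0
    for pos in 1:3
        syn = 0
        for b in BASES
            b == codon[pos] && continue
            mutant = codon[1:pos-1] * b * codon[pos+1:end]
            syn += translate(mutant) == aa
        end
        S += syn / 3
    end
    return (S, 3 - S)
end

println(s_n_sites("GGG"))  # third position is fully synonymous
println(s_n_sites("TTT"))  # third position is one-third synonymous
```

For GGG, every change at the third position still encodes glycine, so that position contributes one full synonymous site. For TTT, only one of the three third-position changes is synonymous, so it contributes a third of a site.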
NaturalSelection.DS_DN_NG86 — Function.
DS_DN_NG86(x::C, y::C, code::GeneticCode) where C <: CDN
Compute the number of synonymous (DS) and non-synonymous (DN) mutations between two codons, using the all-paths method used by Nei and Gojobori (1986).
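A minimal sketch of the all-paths idea, again in plain Julia on strings: when two codons differ at more than one position, every order in which the differences could have arisen is considered, and the synonymous and non-synonymous steps are averaged over those orders. The names translate and ds_dn are hypothetical, only the standard genetic code is covered, and paths through stop codons are not excluded here, which a full implementation would need to decide on.

```julia
# Standard genetic code (NCBI table 1), codons ordered by T, C, A, G.
const BASES = "TCAG"
const AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Translate a three-letter DNA codon such as "ATG" ('*' marks a stop).
function translate(codon::AbstractString)
    i = findfirst(==(codon[1]), BASES)
    j = findfirst(==(codon[2]), BASES)
    k = findfirst(==(codon[3]), BASES)
    return AMINO[16 * (i - 1) + 4 * (j - 1) + k]
end

# Average the synonymous (DS) and non-synonymous (DN) steps over all
# orders of single-base changes leading from codon x to codon y.
# Choosing the first changed position uniformly and recursing on the rest
# gives every ordering the same weight.
function ds_dn(x::AbstractString, y::AbstractString)
    diffs = [i for i in 1:3 if x[i] != y[i]]
    isempty(diffs) && return (0.0, 0.0)
    ds = 0.0
    dn = 0.0
    for pos in diffs
        step = x[1:pos-1] * y[pos] * x[pos+1:end]
        syn = translate(step) == translate(x)
        rest_s, rest_n = ds_dn(step, y)
        ds += syn + rest_s
        dn += !syn + rest_n
    end
    return (ds / length(diffs), dn / length(diffs))
end

println(ds_dn("TTT", "GTA"))
```

For TTT and GTA there are two paths: TTT, GTT, GTA has one non-synonymous and one synonymous step, while TTT, TTA, GTA has two non-synonymous steps, so the averages are DS = 0.5 and DN = 1.5.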
The McDonald–Kreitman Test
This test detects and measures adaptive evolution within a species: it determines whether adaptive evolution has occurred and estimates the proportion of substitutions that resulted from positive selection.
To do this, the McDonald–Kreitman test compares the amount of variation within a species (polymorphism) to the divergence between species (substitutions) at two types of sites, neutral and nonneutral.
A substitution is a site at which one nucleotide is fixed within one species and a different nucleotide is fixed within a second species, at the same position of homologous DNA sequences.
The two types of sites can be either synonymous or nonsynonymous within a protein-coding region.
The null hypothesis of the McDonald–Kreitman test is that the ratio of nonsynonymous to synonymous variation within a species equals the ratio of nonsynonymous to synonymous variation between species (i.e. Dn/Ds = Pn/Ps).
When positive or negative selection (natural selection) influences nonsynonymous variation, the ratios will no longer be equal. The ratio of nonsynonymous to synonymous variation between species is lower than the ratio within species (i.e. Dn/Ds < Pn/Ps) when negative selection is at work and deleterious mutations strongly affect polymorphism. The ratio of nonsynonymous to synonymous variation within species is lower than the ratio between species (i.e. Dn/Ds > Pn/Ps) when positive selection is at work.
Under neutrality the expectation is (Pn / Ps) == (Dn / Ds).
The statistic α represents the proportion of substitutions driven by positive selection. α can take any value between -Inf and 1. Negative values of α are produced by sampling error, assumption violations, or the segregation of slightly deleterious amino acid mutations. The null hypothesis here is that α = 0.
The neutrality index (NI) quantifies the direction and degree of departure from neutrality (where the Pn/Ps and Dn/Ds ratios are equal). A neutrality index greater than 1 indicates negative selection; a neutrality index lower than 1 indicates that positive selection is at work in the population.
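Both statistics can be written directly in terms of the four counts. The sketch below uses the commonly quoted definitions, NI = (Pn/Ps) / (Dn/Ds) and α = 1 - (Ds·Pn) / (Dn·Ps), so that α = 1 - NI. The function names and the counts are made up for illustration, and whether mkt computes its values in exactly this way is an assumption.

```julia
# Neutrality index: ratio of the within-species to the between-species
# nonsynonymous/synonymous ratios.
neutrality_index(Ps, Pn, Ds, Dn) = (Pn / Ps) / (Dn / Ds)

# Proportion of substitutions attributed to positive selection.
alpha(Ps, Pn, Ds, Dn) = 1 - (Ds * Pn) / (Dn * Ps)

# Made-up counts, for illustration only.
Ps, Pn, Ds, Dn = 40, 10, 20, 20

println("NI = ", neutrality_index(Ps, Pn, Ds, Dn))
println("α  = ", alpha(Ps, Pn, Ds, Dn))
```

With these counts Dn/Ds (1.0) exceeds Pn/Ps (0.25), so NI is below 1 and α is positive, the pattern described above for positive selection.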
This test is provided by the function mkt.
NaturalSelection.mkt — Function.
mkt(x, y, ref)
Compute McDonald–Kreitman test statistics for two sets of codon sequences.
This function returns the values of Ps, Pn, Ds, Dn, α, and the neutrality index (NI).
References
For more about the McDonald–Kreitman test, see the following references:
https://en.wikipedia.org/wiki/McDonald%E2%80%93Kreitman_test
McDonald, J. H. & Kreitman, M. (1991). Adaptive protein evolution at the Adh locus in Drosophila. Nature 351, 652–654.