Identifying and counting sequence sites

GeneticVariation.jl extends the site-counting methods in BioSequences.jl, using the same bit-parallel techniques to rapidly count the different types of mutation between two large biological sequences. Many population genetic analyses of variation, such as computing evolutionary distances, require these counts.
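
To give a feel for what "bit-parallel" means here, the following is a minimal Python sketch, not the package's implementation. It assumes a hypothetical 2-bit encoding of unambiguous DNA (the names `ENCODING`, `pack` and `count_mismatches` are invented for illustration) and counts differing sites with one XOR and one population count, rather than comparing the sequences symbol by symbol.

```python
# Hypothetical 2-bit encoding chosen for illustration only; the encoding
# actually used by BioSequences.jl may differ.
ENCODING = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack(seq: str) -> int:
    """Pack an unambiguous DNA string into one integer, 2 bits per base."""
    packed = 0
    for i, base in enumerate(seq):
        packed |= ENCODING[base] << (2 * i)
    return packed


def count_mismatches(a: str, b: str) -> int:
    """Count differing sites between two aligned, unambiguous DNA strings."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    n = len(a)
    # XOR leaves a nonzero 2-bit field wherever the two bases differ.
    diff = pack(a) ^ pack(b)
    # Mask selecting the low bit of every 2-bit field: 0b0101...01.
    low_bits = int("01" * n, 2) if n else 0
    # Fold each field onto its low bit: set if either bit of the field differs.
    per_site = (diff | (diff >> 1)) & low_bits
    # Population count: one set bit per mismatching site.
    return bin(per_site).count("1")


print(count_mismatches("ACGT", "ACGA"))
```

A compiled implementation would typically apply the same idea to fixed-width machine words (for example 64 bits at a time); Python's arbitrary-precision integers stand in for that here.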

Types of site added

GeneticVariation.Conserved (type)

A Conserved site is a site where two aligned nucleotides are definitely conserved. Definitely conserved means that both symbols at the site are unambiguous (not ambiguity symbols) and that they are the same symbol.

GeneticVariation.Mutated (type)

A Mutated site is a site where two aligned nucleotides are definitely mutated. Definitely mutated means that both symbols at the site are unambiguous (not ambiguity symbols) and that they are not the same symbol.
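
The two definitions can be made concrete with a small Python sketch. This is an illustration of the definitions only and does not use or reproduce the package's API: the names `classify_site` and `count_sites` are hypothetical, the input is assumed to be uppercase DNA, and anything other than A, C, G or T (ambiguity codes such as N, or gaps) is assumed to make a site neither definitely conserved nor definitely mutated.

```python
# Assumption for this sketch: only these four symbols count as unambiguous.
UNAMBIGUOUS = frozenset("ACGT")


def classify_site(x: str, y: str) -> str:
    """Classify one aligned pair of symbols."""
    if x not in UNAMBIGUOUS or y not in UNAMBIGUOUS:
        # An ambiguity symbol or gap: the site cannot be called either way.
        return "uncertain"
    return "conserved" if x == y else "mutated"


def count_sites(a: str, b: str) -> dict:
    """Tally site classes over two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"conserved": 0, "mutated": 0, "uncertain": 0}
    for x, y in zip(a, b):
        counts[classify_site(x, y)] += 1
    return counts


print(count_sites("ACGTN", "ACCT-"))
```

Note that a pair such as N/N is not counted as conserved: the symbols match, but neither is an unambiguous nucleotide.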

GeneticVariation.Segregating (type)

Segregating sites are positions that differ (are polymorphic) between related genes in a sequence alignment, that is, positions that are not conserved. Segregating sites include conservative, semi-conservative and non-conservative mutations.
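
As a rough illustration of this definition, the Python sketch below finds the segregating columns of a small alignment. It is not the package's implementation: `segregating_sites` is a hypothetical helper, and it simply compares raw symbols, so how gaps and ambiguity symbols should be treated is left open here.

```python
from typing import List


def segregating_sites(alignment: List[str]) -> List[int]:
    """Return 0-based column indices where the aligned sequences differ."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    # A column segregates if it contains more than one distinct symbol.
    return [
        i for i, column in enumerate(zip(*alignment))
        if len(set(column)) > 1
    ]


alignment = ["ACGT", "ACGA", "TCGT"]
print(segregating_sites(alignment))
```

The number of segregating sites is then just the length of the returned list.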

See the site-counting section of the BioSequences.jl documentation for how to use the count and count_pairwise methods to count different types of site.