Genetic Diversity

Computing allele frequencies

When first examining the diversity present in a population, it is common to want to know how many copies of each unique allele the population contains, i.e. its allele frequencies.

Formally defined, allele frequency is a measure of the relative frequency of an allele at a genetic locus in a population.
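As a minimal sketch of this definition, the snippet below counts each distinct allele in a sample and divides by the sample size. The helper name `allele_frequencies` and the allele labels are hypothetical and chosen for illustration only; this is not the GeneticVariation.jl API.

```julia
# Hypothetical helper, not part of GeneticVariation.jl: relative frequency of
# each distinct allele observed at one locus.
function allele_frequencies(alleles)
    counts = Dict{eltype(alleles),Int}()
    for a in alleles
        counts[a] = get(counts, a, 0) + 1   # tally each allele
    end
    n = length(alleles)
    return Dict(a => c / n for (a, c) in counts)
end

# Ten sampled copies of one locus carrying three distinct alleles.
observed = ["A1", "A1", "A1", "A1", "A1", "A2", "A2", "A2", "A3", "A3"]
freqs = allele_frequencies(observed)
```

Here `freqs["A1"]` is 5/10, `freqs["A2"]` is 3/10 and `freqs["A3"]` is 2/10, and the frequencies sum to one.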

In population genetics, allele frequencies show the genetic diversity of a species' population or, equivalently, the richness of its gene pool.

Population genetics studies the different "forces" that might lead to changes in the distribution and frequencies of alleles - in other words, to evolution.

Besides selection, these forces include genetic drift, mutation and migration.

Computing allele frequencies is therefore an essential task for many who work with genetic variation, and so methods for computing such frequencies are included in GeneticVariation.jl.

Allele frequencies can be computed for genes, microsatellites, RFLP patterns, and SNPs.

gene_frequencies(seqcounts::Composition{T}) where T <: Sequence

Compute gene frequencies from a BioSequences.Composition variable that contains unique sequence counts.

gene_frequencies(iterable)

Compute the gene frequencies for any iterable with an eltype which is a concrete subtype of the abstract Sequence type.


Computing measures of genetic diversity

There are various methods of quantifying the amount of genetic variation in biological data with GeneticVariation.jl:

avg_mut(sequences)

The average number of mutations found in (n choose 2) pairwise comparisons of sequences (i, j) in a sample of sequences.

sequences should be any indexable container of DNA sequence types.
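The quantity described above can be sketched in plain Julia as follows. The helpers `count_differences` and `mean_pairwise_differences` are hypothetical names, plain strings stand in for DNA sequence types, and a "mutation" is assumed to mean a simple mismatch at a position; the package's own counting rules may differ.

```julia
# Hypothetical helpers for illustration; not the package's implementation.

# Number of positions at which two equal-length sequences differ.
function count_differences(a, b)
    length(a) == length(b) || throw(ArgumentError("sequences must have equal length"))
    return count(x != y for (x, y) in zip(a, b))
end

# Mean of count_differences over all (n choose 2) unordered pairs (i, j), i < j.
function mean_pairwise_differences(seqs)
    n = length(seqs)
    n >= 2 || throw(ArgumentError("need at least two sequences"))
    total = 0
    for i in 1:n-1, j in i+1:n
        total += count_differences(seqs[i], seqs[j])
    end
    return total / binomial(n, 2)
end

toy = ["AAAACTTT", "AAAAATTT", "AAAACTTA"]
avg = mean_pairwise_differences(toy)
```

With three sequences there are three pairs, differing at 1, 1 and 2 positions, so `avg` is 4/3.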


Nucleotide diversity

Nucleotide diversity is a concept in molecular genetics which is used to measure the degree of polymorphism within a population.

Several methods can be used to compute measures of nucleotide diversity. They are listed below, along with how to compute them using GeneticVariation.jl.

NL79(m::M, f::V) where {M<:AbstractMatrix{Float64},V<:AbstractVector{Float64}}

Compute nucleotide diversity using a matrix of the number of mutations between sequence pairs, and a vector of the frequencies of each sequence in the population.
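The Nei and Li estimator is commonly written as the sum of x_i * x_j * pi_ij over all pairs of distinct sequences, where x_i is the frequency of sequence i and pi_ij is the difference between sequences i and j. The sketch below evaluates that sum directly. The helper name `weighted_pairwise_sum` and the toy values are hypothetical, and the sketch does not assume whether the package expects raw counts or per-site values in the matrix: if the matrix holds raw counts, the result is in differences per sequence and would need dividing by the sequence length to give a per-site value.

```julia
# Hypothetical helper illustrating the weighted sum; not the package's own code.
# d[i, j]: differences between distinct sequences i and j.
# x[i]:    frequency of distinct sequence i in the population.
function weighted_pairwise_sum(d::AbstractMatrix{Float64}, x::AbstractVector{Float64})
    k = length(x)
    size(d) == (k, k) || throw(DimensionMismatch("d must be k-by-k for k frequencies"))
    return sum(x[i] * x[j] * d[i, j] for i in 1:k, j in 1:k)
end

# Toy data: three distinct sequences with made-up pairwise differences.
d = [0.0 2.0 3.0;
     2.0 0.0 3.0;
     3.0 3.0 0.0]
x = [0.5, 0.25, 0.25]
result = weighted_pairwise_sum(d, x)
```

For these toy values the sum is 2 * (0.5*0.25*2 + 0.5*0.25*3 + 0.25*0.25*3) = 1.625.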

NL79(sequences)

Compute nucleotide diversity, as described by Nei and Li in 1979.

This measure is defined as the average number of nucleotide differences per site between two DNA sequences in all possible pairs in the sample population, and is often denoted by the Greek letter π (pi).

sequences should be any iterable that yields biosequence types.

Examples

julia> using BioSequences, GeneticVariation

julia> testSeqs = [dna"AAAACTTTTACCCCCGGGGG",
                   dna"AAAACTTTTACCCCCGGGGG",
                   dna"AAAACTTTTACCCCCGGGGG",
                   dna"AAAACTTTTACCCCCGGGGG",
                   dna"AAAAATTTTACCCCCGTGGG",
                   dna"AAAAATTTTACCCCCGTGGG",
                   dna"AAAACTTTTTCCCCCGTAGG",
                   dna"AAAACTTTTTCCCCCGTAGG",
                   dna"AAAAATTTTTCCCCCGGAGG",
                   dna"AAAAATTTTTCCCCCGGAGG"]
10-element Array{BioSequences.BioSequence{BioSequences.DNAAlphabet{4}},1}:
 AAAACTTTTACCCCCGGGGG
 AAAACTTTTACCCCCGGGGG
 AAAACTTTTACCCCCGGGGG
 AAAACTTTTACCCCCGGGGG
 AAAAATTTTACCCCCGTGGG
 AAAAATTTTACCCCCGTGGG
 AAAACTTTTTCCCCCGTAGG
 AAAACTTTTTCCCCCGTAGG
 AAAAATTTTTCCCCCGGAGG
 AAAAATTTTTCCCCCGGAGG

julia> NL79(testSeqs)
0.096
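
The value 0.096 can be checked by hand with the plain-Julia sketch below, which uses ordinary strings in place of the dna"..." literals. It assumes that every ordered pair of sampled sequences is weighted equally by 1/n^2, with no n/(n-1) sample-size correction; the source does not state this, but the assumption reproduces the documented result. The names `mismatches` and `pi_estimate` are hypothetical.

```julia
# Plain strings stand in for the dna"..." literals of the example above,
# in the same order and with the same copy numbers.
seqs = vcat(fill("AAAACTTTTACCCCCGGGGG", 4),
            fill("AAAAATTTTACCCCCGTGGG", 2),
            fill("AAAACTTTTTCCCCCGTAGG", 2),
            fill("AAAAATTTTTCCCCCGGAGG", 2))

# Number of mismatching positions between two equal-length sequences.
mismatches(a, b) = count(x != y for (x, y) in zip(a, b))

n = length(seqs)            # 10 sequences
len = length(first(seqs))   # 20 sites

# Sum over all ordered pairs (i, j); pairs of identical sequences contribute zero.
total = sum(mismatches(a, b) for a in seqs, b in seqs)

# Each ordered pair carries weight 1/n^2; dividing by len gives a per-site value.
pi_estimate = total / n^2 / len
```

The four distinct sequences differ from one another at 2 or 3 sites, the ordered-pair total is 192, and 192 / 100 / 20 gives approximately 0.096.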