Computing allele frequencies
When first examining the diversity present in a population, it is common to want to know how many copies of each unique allele the population contains, i.e. what its allele frequencies are.
Formally, allele frequency is the relative frequency of an allele at a genetic locus in a population.
In population genetics, allele frequencies reflect the genetic diversity of a species population or, equivalently, the richness of its gene pool.
Population genetics studies the different "forces" that might lead to changes in the distribution and frequencies of alleles - in other words, to evolution.
Besides selection, these forces include genetic drift, mutation and migration.
Computing allele frequencies, then, is an essential task for many who work with genetic variation, so methods for computing such frequencies are included in GeneticVariation.jl.
Allele frequencies can be computed for genes, microsatellites, RFLP patterns, and SNPs.
GeneticVariation.gene_frequencies - Function.
gene_frequencies(seqcounts::Composition{T}) where T <: Sequence
Compute gene frequencies from a BioSequences.Composition variable that contains unique sequence counts.
gene_frequencies(iterable)
Compute the gene frequencies for any iterable with an eltype which is a concrete subtype of the abstract Sequence type.
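To make the definition concrete, the following is a minimal sketch of what an allele frequency calculation amounts to: count each unique allele, then divide by the sample size. It uses only Base Julia and plain strings as stand-ins for sequences. The name allele_frequencies is a hypothetical helper introduced for illustration; it is not part of GeneticVariation.jl and is not how gene_frequencies is implemented.

```julia
# Hypothetical helper (not part of GeneticVariation.jl): relative frequency
# of each unique allele in a sample, using only Base Julia.
function allele_frequencies(alleles)
    counts = Dict{eltype(alleles), Int}()
    for a in alleles
        counts[a] = get(counts, a, 0) + 1   # tally each unique allele
    end
    n = length(alleles)                     # total number of sampled alleles
    return Dict(a => c / n for (a, c) in counts)
end

# Plain strings stand in for sequence types here (an assumption for brevity).
sample = ["ATCG", "ATCG", "ATGG", "ATCG", "TTCG"]
freqs = allele_frequencies(sample)
# "ATCG" occurs 3 times out of 5; "ATGG" and "TTCG" occur once each.
```

The frequencies always sum to one, which is a quick sanity check on any frequency table.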
Computing measures of genetic diversity
There are various methods of quantifying the amount of genetic variation in biological data with GeneticVariation.jl:
GeneticVariation.avg_mut - Function.
avg_mut(sequences)
The average number of mutations found in (n choose 2) pairwise comparisons of sequences (i, j) in a sample of sequences.
sequences should be any indexable container of DNA sequence types.
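The quantity described above can be sketched in Base Julia as follows. The sketch assumes that a "mutation" between two sequences is a position at which they differ, and that all sequences have equal length. The names count_differences and average_pairwise_differences are hypothetical helpers for illustration, not the package's implementation of avg_mut.

```julia
# Hypothetical helpers for illustration; not the package's implementation.

# Count the positions at which two equal-length sequences differ.
count_differences(a, b) = count(t -> t[1] != t[2], zip(a, b))

# Average number of differences over all (n choose 2) unordered pairs (i, j), i < j.
function average_pairwise_differences(seqs)
    n = length(seqs)
    total = 0
    for i in 1:n-1, j in i+1:n
        total += count_differences(seqs[i], seqs[j])
    end
    return total / binomial(n, 2)
end

# Plain strings stand in for DNA sequence types (an assumption for brevity).
seqs = ["AAAACT", "AAAACT", "AAAATT", "AATACT"]
avg = average_pairwise_differences(seqs)
# The six pairs differ at 0, 1, 1, 1, 1 and 2 positions, so avg is 6 / 6.
```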
Nucleotide diversity
Nucleotide diversity is a concept in molecular genetics which is used to measure the degree of polymorphism within a population.
There are different methods for computing measures of nucleotide diversity. They are listed below, along with how to compute them using GeneticVariation.jl.
GeneticVariation.NL79 - Function.
NL79(m::M, f::V) where {M<:AbstractMatrix{Float64},V<:AbstractVector{Float64}}
Compute nucleotide diversity using a matrix of the number of mutations between sequence pairs, and a vector of the frequencies of each sequence in the population.
NL79(sequences)
Compute nucleotide diversity, as described by Nei and Li in 1979.
This measure is defined as the average number of nucleotide differences per site between two DNA sequences in all possible pairs in the sample population, and is often denoted by the Greek letter pi (π).
sequences should be any iterable that yields biosequence types.
Examples
julia> using BioSequences, GeneticVariation

julia> testSeqs = [dna"AAAACTTTTACCCCCGGGGG",
                   dna"AAAACTTTTACCCCCGGGGG",
                   dna"AAAACTTTTACCCCCGGGGG",
                   dna"AAAACTTTTACCCCCGGGGG",
                   dna"AAAAATTTTACCCCCGTGGG",
                   dna"AAAAATTTTACCCCCGTGGG",
                   dna"AAAACTTTTTCCCCCGTAGG",
                   dna"AAAACTTTTTCCCCCGTAGG",
                   dna"AAAAATTTTTCCCCCGGAGG",
                   dna"AAAAATTTTTCCCCCGGAGG"]
10-element Array{BioSequences.BioSequence{BioSequences.DNAAlphabet{4}},1}:
AAAACTTTTACCCCCGGGGG
AAAACTTTTACCCCCGGGGG
AAAACTTTTACCCCCGGGGG
AAAACTTTTACCCCCGGGGG
AAAAATTTTACCCCCGTGGG
AAAAATTTTACCCCCGTGGG
AAAACTTTTTCCCCCGTAGG
AAAACTTTTTCCCCCGTAGG
AAAAATTTTTCCCCCGGAGG
AAAAATTTTTCCCCCGGAGG
julia> NL79(testSeqs)
0.096
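As a cross-check on the value above, the Nei and Li measure can be worked through by hand in Base Julia. The sketch below sums x_i * x_j * pi_ij over all ordered pairs of unique sequences, where x_i is the frequency of the i-th unique sequence and pi_ij is the number of differing sites between sequences i and j divided by the sequence length. It assumes equal-length sequences, uses plain strings in place of DNA sequence types, and applies no sample-size correction. The name nucleotide_diversity is a hypothetical helper, not the package's implementation of NL79.

```julia
# Hypothetical helper for illustration only; not the package's implementation.
# pi = sum over ordered pairs (i, j) of x_i * x_j * pi_ij
function nucleotide_diversity(seqs)
    n = length(seqs)
    uniq = unique(seqs)
    # x_i: frequency of each unique sequence in the sample
    freqs = [count(s -> s == u, seqs) / n for u in uniq]
    total = 0.0
    for i in eachindex(uniq), j in eachindex(uniq)
        # number of sites at which unique sequences i and j differ
        diffs = count(t -> t[1] != t[2], zip(uniq[i], uniq[j]))
        total += freqs[i] * freqs[j] * diffs / length(uniq[i])
    end
    return total
end

# The same ten sequences as in the example above, written as plain strings.
seqs = ["AAAACTTTTACCCCCGGGGG", "AAAACTTTTACCCCCGGGGG",
        "AAAACTTTTACCCCCGGGGG", "AAAACTTTTACCCCCGGGGG",
        "AAAAATTTTACCCCCGTGGG", "AAAAATTTTACCCCCGTGGG",
        "AAAACTTTTTCCCCCGTAGG", "AAAACTTTTTCCCCCGTAGG",
        "AAAAATTTTTCCCCCGGAGG", "AAAAATTTTTCCCCCGGAGG"]

pi_hat = nucleotide_diversity(seqs)
# Agrees with the 0.096 reported above, up to floating-point rounding.
matches = isapprox(pi_hat, 0.096; atol = 1e-12)
```

There are four unique sequences with frequencies 0.4, 0.2, 0.2 and 0.2, and each pair differs at either 2 or 3 of the 20 sites, which is where the value 0.096 comes from.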