BioMarkovChains.BioMarkovChain Type
struct BioMarkovChain{A<:Alphabet} <: AbstractBioMarkovChain
A BioMarkovChain represents a Markov chain used in biological sequence analysis. It contains a transition probability matrix (tpm), an initial probability distribution (inits), and the order of the Markov chain (n).
Fields
tpm::Matrix{Float64}
: The transition probability matrix.
inits::Vector{Float64}
: The initial distribution of probabilities.
n::Int
: The order of the Markov chain.
Constructors
BioMarkovChain{A}(tpm::Matrix{Float64}, inits::Vector{Float64}, n::N=1) where {A}
: Constructs a BioMarkovChain object with the provided transition probability matrix, initial distribution, and order.
BioMarkovChain{A}(seq::SeqOrView{A}, n::Int64=1) where {A}
: Constructs a BioMarkovChain object based on the DNA sequence and transition order.
Example
seq = LongDNA{4}("ACTACATCTA")
model = BioMarkovChain(seq, 2)
BioMarkovChain of DNA alphabet and order 2:
- Transition Probability Matrix -> Matrix{Float64}(4 × 4):
0.4444 0.1111 0.0 0.4444
0.4444 0.4444 0.0 0.1111
0.0 0.0 0.0 0.0
0.1111 0.4444 0.0 0.4444
- Initial Probabilities -> Vector{Float64}(4 × 1):
0.3333 0.3333 0.0 0.3333
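The matrix printed in this example is consistent with squaring the first-order transition probability matrix of the same sequence. The sketch below checks that relation using only Base Julia and the LinearAlgebra standard library. The variable names are illustrative, and reading "order 2" as a matrix power is an assumption inferred from the printed numbers, not a statement about the package internals.

```julia
using LinearAlgebra

# First-order TPM of "ACTACATCTA", rows and columns ordered A, C, G, T
# (worked out by hand from the dinucleotide counts; G never occurs).
tpm1 = [0   2/3 0 1/3;
        1/3 0   0 2/3;
        0   0   0 0;
        2/3 1/3 0 0]

# Assumed meaning of "order 2": the first-order matrix raised to the 2nd power.
tpm2 = tpm1^2

# Rounded to four digits for comparison with the matrix printed in the example.
round.(tpm2, digits = 4)
```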
BioMarkovChains.initials Method
initials(sequence::SeqOrView{A}) where A
Calculate the estimated initial probabilities for a Markov chain based on a given sequence.
This function takes a sequence of states and calculates the estimated initial probabilities of each state in the sequence for a Markov chain. The initial probabilities are estimated by counting the occurrences of each state at the beginning of the sequence and normalizing the counts to sum up to 1.
When dinucleotide counts are used, the initial probability of a state is estimated as the fraction of dinucleotides that begin with that state.
Arguments
sequence::SeqOrView{A}
: The sequence of states representing the Markov chain.
Returns
A Vector{Float64} of estimated initial probabilities for each state in the sequence.
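As a rough illustration of the counting described above, the following sketch estimates initial probabilities from a plain string. `estimate_initials` is a hypothetical helper, not the package implementation. It assumes that each state is counted once for every dinucleotide it starts, which matches the Initial Probabilities shown for the BioMarkovChain example.

```julia
# Hypothetical helper, not the package implementation.
function estimate_initials(s::AbstractString; states = "ACGT")
    index = Dict(c => i for (i, c) in enumerate(states))
    counts = zeros(Int, length(states))
    # Count each state as the first symbol of a dinucleotide,
    # so the last symbol of the sequence is not counted.
    for c in s[1:end-1]
        counts[index[c]] += 1
    end
    # Normalize the counts so that they sum to 1.
    return counts ./ sum(counts)
end

# One value per state, in the order A, C, G, T.
p = estimate_initials("ACTACATCTA")
```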
BioMarkovChains.log_odds_ratio_matrix Method
log_odds_ratio_matrix(model1::BioMarkovChain, model2::BioMarkovChain)
Calculates the log-odds ratio between the transition probability matrices of two BioMarkovChain models.
Arguments
model1::BioMarkovChain
: The first BioMarkovChain model.
model2::BioMarkovChain
: The second BioMarkovChain model.
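One common way to form such a matrix is to take the logarithm of the element-wise ratio of the two transition probability matrices. The sketch below does this on two made-up 2-state matrices. The values and the choice of base 2 are assumptions for illustration only.

```julia
# Two hypothetical 2-state transition matrices; each row sums to 1.
tpm1 = [0.8 0.2;
        0.4 0.6]
tpm2 = [0.5 0.5;
        0.5 0.5]

# Element-wise log-odds ratio in base 2 (the base is an assumption here).
lor = log2.(tpm1 ./ tpm2)
```

Positive entries mark transitions that are more likely under the first matrix, negative entries those more likely under the second. Entries that are zero in either matrix produce infinite or undefined values and need separate handling.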
BioMarkovChains.log_odds_ratio_score Method
log_odds_ratio_score(sequence::SeqOrView{A}; modela::BioMarkovChain, modelb::BioMarkovChain, b::Number = 2)
Compute the log odds ratio score of a given sequence under two BioMarkovChain models.
Arguments
sequence::SeqOrView{A}
: A sequence of elements of type A.
modela::BioMarkovChain
: A BioMarkovChain model.
modelb::BioMarkovChain
: A BioMarkovChain model.
b::Number = 2
: The base of the logarithm used to compute the log odds ratio.
Returns
The log odds ratio score between the sequence and the models.
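A log odds ratio score of this kind is often computed by summing, over every pair of consecutive symbols, the logarithm of the ratio of the two models' transition probabilities. The sketch below follows that recipe on a plain string. `log_odds_score` and the toy matrices are hypothetical, and the sketch leaves out any term for the initial probabilities, which the package may or may not include.

```julia
# Hypothetical helper: sums log-odds over consecutive pairs of a plain string.
function log_odds_score(s::AbstractString, tpm_a, tpm_b; states = "ACGT", b = 2)
    index = Dict(c => i for (i, c) in enumerate(states))
    score = 0.0
    for (x, y) in zip(s, s[2:end])
        i, j = index[x], index[y]
        # log(b, v) is the logarithm of v in base b.
        score += log(b, tpm_a[i, j] / tpm_b[i, j])
    end
    return score
end

# Toy models (assumed values): model A favours staying in the same state,
# model B treats all transitions as equally likely.
tpm_a = fill(0.1, 4, 4)
for i in 1:4
    tpm_a[i, i] = 0.7
end
tpm_b = fill(0.25, 4, 4)

score_same = log_odds_score("AAAA", tpm_a, tpm_b)   # positive: fits model A better
score_diff = log_odds_score("ACGT", tpm_a, tpm_b)   # negative: fits model B better
```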
BioMarkovChains.markovprobability Method
markovprobability(sequence::LongNucOrView{4}, model::BioMarkovChain)
Compute the probability of a given sequence using the transition probability matrix and the initial probability distribution of a BioMarkovChain.
Arguments
sequence::LongNucOrView{4}
: The input sequence of nucleotides.
Keywords
model::BioMarkovChain=ECOLICDS
: A given BioMarkovChain model.
logscale::Bool=false
: If true, the function will return the log2 of the probability.
b::Number=2
: The base of the logarithm used to compute the log odds ratio.
Returns
probability::Float64
: The probability of the input sequence given the model.
Example
seq = LongDNA{4}("CGCGCGCGCGCGCGCGCGCGCGCGCG")
markovprobability(seq, model=CPGPOS, logscale=true)
-45.073409957110556
markovprobability(seq, model=CPGNEG, logscale=true)
-74.18912168395339
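For a first-order chain, the probability of a sequence is the initial probability of its first symbol multiplied by the transition probability of each consecutive pair. The sketch below spells this out on a plain string with a made-up uniform model. `chain_probability` is a hypothetical helper, and it does not reproduce the CPGPOS or CPGNEG values above, which depend on the packaged models.

```julia
# Hypothetical helper: probability of a plain string under a first-order chain.
function chain_probability(s::AbstractString, tpm, inits; states = "ACGT", logscale = false)
    index = Dict(c => i for (i, c) in enumerate(states))
    # Start with the initial probability of the first symbol...
    logp = log2(inits[index[first(s)]])
    # ...then add one transition term per consecutive pair.
    for (x, y) in zip(s, s[2:end])
        logp += log2(tpm[index[x], index[y]])
    end
    # Work in log2 space to avoid underflow on long sequences.
    return logscale ? logp : exp2(logp)
end

tpm = fill(0.25, 4, 4)   # toy model: all transitions equally likely
inits = fill(0.25, 4)

p = chain_probability("ACGT", tpm, inits)                      # equals 0.25^4
logp = chain_probability("ACGT", tpm, inits; logscale = true)  # log2 of the same value
```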
BioMarkovChains.perronfrobenius Method
perronfrobenius(sequence::SeqOrView{A}, n::Int64=1) where A
Compute the Perron-Frobenius matrix, a column-stochastic version of the transition probability matrix (TPM), for a given nucleotide sequence.
The Perron-Frobenius matrix captures the asymptotic probabilities of transitioning between nucleotides in the sequence over a specified number of steps n. It provides insight into the long-term behavior of a Markov chain or a dynamical system associated with the sequence.
Arguments
sequence::SeqOrView{A}
: A nucleotide sequence represented as a NucleicSeqOrView{A} object.
n::Int64=1
: The number of steps to consider for the transition probability matrix. Default is 1.
Returns
A copy of the Perron-Frobenius matrix. Each column of this matrix corresponds to the probabilities of transitioning from the current nucleotide state to all possible nucleotide states after n steps.
Example
sequence = LongSequence{DNAAlphabet{4}}("ACGTCGTCCACTACGACATCAGC") # Replace with an actual nucleotide sequence
n = 2
pf = perronfrobenius(sequence, n)
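To make the "column-stochastic" description concrete, the sketch below builds such a matrix from a made-up row-stochastic TPM by taking n steps and then transposing. The 2-state matrix is hypothetical, and the construction is an assumption based on the description above rather than the package source.

```julia
using LinearAlgebra

# Hypothetical row-stochastic TPM for a 2-state chain (each row sums to 1).
tpm = [0.9 0.1;
       0.5 0.5]
n = 2

# Assumed construction: take n steps, then transpose so that columns sum to 1.
pf = copy(transpose(tpm^n))

# Every column sums to 1, up to floating-point rounding.
column_sums = sum(pf, dims = 1)
```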
BioMarkovChains.transition_count_matrix Method
transition_count_matrix(sequence::LongSequence{DNAAlphabet{4}})
Compute the transition count matrix (TCM) of a given DNA sequence.
Arguments
sequence::LongSequence{DNAAlphabet{4}}
: A LongSequence{DNAAlphabet{4}} object representing the DNA sequence.
Returns
A Matrix object representing the transition count matrix of the sequence.
Example
seq = LongDNA{4}("AGCTAGCTAGCT")
tcm = transition_count_matrix(seq)
4×4 Matrix{Int64}:
0 0 3 0
0 0 0 3
0 3 0 0
2 0 0 0
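The counts in this example can be reproduced by walking over consecutive pairs of symbols. The sketch below does so on a plain string using only Base Julia. `count_transitions` is a hypothetical helper, and the A, C, G, T ordering of rows and columns is inferred from the output above.

```julia
# Hypothetical helper operating on a plain string rather than a LongSequence.
function count_transitions(s::AbstractString; states = "ACGT")
    index = Dict(c => i for (i, c) in enumerate(states))
    counts = zeros(Int, length(states), length(states))
    # Row = current symbol, column = next symbol.
    for (a, b) in zip(s, s[2:end])
        counts[index[a], index[b]] += 1
    end
    return counts
end

tcm = count_transitions("AGCTAGCTAGCT")
```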
BioMarkovChains.transition_probability_matrix Method
transition_probability_matrix(sequence::LongSequence{DNAAlphabet{4}}, n::Int64=1)
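Compute the transition probability matrix (TPM) of a given DNA sequence. Each row of the TPM is the corresponding row of the transition count matrix divided by its row total.
The transition matrices of DNA and amino acids are arranged row-wise, with rows and columns in the sorted order of the alphabet. For DNA the order is A, C, G, T. Amino-acid matrices follow the same convention over the amino-acid alphabet.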
Arguments
sequence::LongNucOrView{4}
: A LongNucOrView{4} object representing the DNA sequence.
n::Int64=1
: The order of the Markov model.
Keywords
extended_alphabet::Bool=false
: If true, the extended DNA alphabet is used for the search.
Returns
A Matrix object representing the transition probability matrix of the sequence.
Example
seq = dna"AGCTAGCTAGCT"
tpm = transition_probability_matrix(seq)
4×4 Matrix{Float64}:
0.0 0.0 1.0 0.0
0.0 0.0 0.0 1.0
0.0 1.0 0.0 0.0
1.0 0.0 0.0 0.0
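The output above can be obtained from the transition counts of the same sequence by dividing each row by its total. The sketch below shows that step in Base Julia. How the package treats rows with no observed transitions is an assumption here: they are left at zero.

```julia
# Transition counts for "AGCTAGCTAGCT", rows and columns ordered A, C, G, T.
counts = [0 0 3 0;
          0 0 0 3;
          0 3 0 0;
          2 0 0 0]

# Divide each row by its total; max.(row_totals, 1) keeps all-zero rows at zero
# instead of dividing by zero (an assumption about how empty rows are handled).
row_totals = sum(counts, dims = 2)
tpm = counts ./ max.(row_totals, 1)
```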