BioMarkovChains.jl

julia

BMC

Alias for the type BioMarkovChain.


BioMarkovChains.BioMarkovChain Type

julia

struct BioMarkovChain{A<:Alphabet} <: AbstractBioMarkovChain

A BioMarkovChain represents a Markov chain used in biological sequence analysis. It stores a transition probability matrix (tpm), an initial probability distribution (inits), and the order of the Markov chain (n).

Fields

tpm::Matrix{Float64}: The transition probability matrix.
inits::Vector{Float64}: The initial distribution of probabilities.
n::Int: The order of the Markov chain.

Constructors

BioMarkovChain{A}(tpm::Matrix{Float64}, inits::Vector{Float64}, n::N=1) where {A}: Constructs a BioMarkovChain object with the provided transition probability matrix, initial distribution, and order.
BioMarkovChain{A}(seq::SeqOrView{A}, n::Int64=1) where {A}: Constructs a BioMarkovChain object from a biological sequence and the order of the chain.

Example

julia

seq = LongDNA{4}("ACTACATCTA")

model = BioMarkovChain(seq, 2)

BioMarkovChain of DNA alphabet and order 2:
  - Transition Probability Matrix -> Matrix{Float64}(4 × 4):
   0.4444  0.1111  0.0     0.4444
   0.4444  0.4444  0.0     0.1111
   0.0     0.0     0.0     0.0
   0.1111  0.4444  0.0     0.4444
  - Initial Probabilities -> Vector{Float64}(4 × 1):
   0.3333  0.3333  0.0     0.3333


BioMarkovChains.initials Method

julia

initials(sequence::SeqOrView{A}) where A

Calculate the estimated initial probabilities for a Markov chain based on a given sequence.

This function takes a sequence of states and calculates the estimated initial probabilities of each state in the sequence for a Markov chain. The initial probabilities are estimated by counting the occurrences of each state at the beginning of the sequence and normalizing the counts to sum up to 1.

\begin{aligned} π_{i} &= P(X_{1} = i), \quad i \in T \\ \sum_{i=1}^{N} π_{i} &= 1 \end{aligned}

Using the dinucleotide counts $c_{i}$, the initial probabilities are estimated as:

\hat{π}_{i} = \frac{c_{i}}{\sum_{k} c_{k}}

Arguments

sequence::SeqOrView{A}: The sequence of states representing the Markov chain.

Returns

A Vector{Float64} of estimated initial probabilities for each state in the sequence.
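The estimator above can be sketched in plain Julia without the package. This is a minimal illustration, not the library's implementation: `estimate_initials` and its `states` keyword are hypothetical names, the sequence is a plain string rather than a BioSequences type, and $c_{i}$ is assumed to be the number of dinucleotides that start with state $i$.

```julia
# Hypothetical helper, not part of BioMarkovChains.jl.
# Assumption: c_i counts how often state i opens a dinucleotide.
function estimate_initials(seq::AbstractString; states::AbstractString = "ACGT")
    index = Dict(s => k for (k, s) in enumerate(states))
    counts = zeros(Int, length(states))
    # Every position except the last one starts a dinucleotide.
    for i in 1:(length(seq) - 1)
        counts[index[seq[i]]] += 1
    end
    # Normalize the counts so that the estimates sum to 1.
    return counts ./ sum(counts)
end

p = estimate_initials("ACTACATCTA")
```

Under these assumptions the estimates for A, C and T are equal and G gets zero, since G never occurs in the sequence.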


BioMarkovChains.log_odds_ratio_matrix Method

julia

log_odds_ratio_matrix(model1::BioMarkovChain, model2::BioMarkovChain)

Calculates the log-odds ratio between the transition probability matrices of two BioMarkovChain models.

β = \log \frac{P (x | m_{1})}{P (x | m_{2})}

where $m_{1}$ and $m_{2}$ are the transition probability matrices of the two models.

Arguments

model1::BioMarkovChain: The first BioMarkovChain model.
model2::BioMarkovChain: The second BioMarkovChain model.
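As a rough sketch of the computation, the log-odds ratio can be taken element-wise over two transition probability matrices. The helper name `log_odds_matrix_sketch` and the base keyword `b` are assumptions made for illustration; the package function takes model objects, not raw matrices.

```julia
# Hypothetical helper, not the package's implementation.
# Element-wise log ratio of two transition probability matrices of equal size.
function log_odds_matrix_sketch(tpm1::AbstractMatrix, tpm2::AbstractMatrix; b::Number = 2)
    size(tpm1) == size(tpm2) || throw(DimensionMismatch("matrices must have the same size"))
    return log.(b, tpm1 ./ tpm2)
end

m1 = [0.5 0.5; 0.25 0.75]
m2 = [0.25 0.75; 0.5 0.5]
lor = log_odds_matrix_sketch(m1, m2)
```

Positive entries mark transitions that are more likely under the first matrix, negative entries those more likely under the second. Entries where either probability is zero yield infinite or NaN values in this sketch.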


BioMarkovChains.log_odds_ratio_score Method

julia

log_odds_ratio_score(sequence::SeqOrView{A}; modela::BioMarkovChain, modelb::BioMarkovChain, b::Number = 2)

Compute the log-odds ratio score of a given sequence under two BioMarkovChain models.

S(x) = \sum_{i=1}^{L} β_{x_{i-1} x_{i}} = \sum_{i=1}^{L} \log \frac{a^{m_{1}}_{x_{i-1} x_{i}}}{a^{m_{2}}_{x_{i-1} x_{i}}}

Arguments

sequence::SeqOrView{A}: A sequence of elements of type A.
modela::BioMarkovChain: A BioMarkovChain model.
modelb::BioMarkovChain: A BioMarkovChain model.
b::Number = 2: The base of the logarithm used to compute the log odds ratio.

Returns

The log odds ratio score between the sequence and the models.

Example
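A minimal sketch of the scoring formula in plain Julia, independent of the package: the score sums the log ratio of the two models' transition probabilities over consecutive pairs of the sequence. The names `state_index` and `log_odds_score_sketch` are hypothetical, the sequence is a plain string, and the matrices are assumed to be ordered A, C, G, T.

```julia
# Hypothetical helper: position of a nucleotide in the assumed A, C, G, T ordering.
state_index(c::Char) = findfirst(==(c), "ACGT")

# Sum of log_b(a_model_a / a_model_b) over every consecutive pair in the sequence.
function log_odds_score_sketch(seq::AbstractString, tpm_a::AbstractMatrix,
                               tpm_b::AbstractMatrix; b::Number = 2)
    score = 0.0
    for i in 2:length(seq)
        r, c = state_index(seq[i - 1]), state_index(seq[i])
        score += log(b, tpm_a[r, c] / tpm_b[r, c])
    end
    return score
end

tpm_a = repeat([0.5 0.25 0.125 0.125], 4, 1)  # every row favors a move to A
tpm_b = fill(0.25, 4, 4)                      # uniform background model
s = log_odds_score_sketch("AAAC", tpm_a, tpm_b)
```

A positive score means the sequence is better explained by the first matrix than by the second.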


BioMarkovChains.markovprobability Method

julia

markovprobability(sequence::LongNucOrView{4}; model::BioMarkovChain=ECOLICDS, logscale::Bool=false, b::Number=2)

Compute the probability of a given sequence using the transition probability matrix and the initial probability distribution of a BioMarkovChain.

P(X_{1} = i_{1}, \dots, X_{T} = i_{T}) = π_{i_{1}} \prod_{t=1}^{T-1} a_{i_{t}, i_{t+1}}

Arguments

sequence::LongNucOrView{4}: The input sequence of nucleotides.

Keywords

model::BioMarkovChain=ECOLICDS: A given BioMarkovChain model.
logscale::Bool=false: If true, the function returns the logarithm (base b) of the probability.
b::Number=2: The base of the logarithm used when logscale is true.

Returns

probability::Float64: The probability of the input sequence given the model.
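The product formula can be sketched in plain Julia as follows. This is an illustration only: `sequence_probability_sketch` is a hypothetical name, the sequence is a plain string, and the initial vector and matrix are assumed to be ordered A, C, G, T.

```julia
# Hypothetical helper, not the package's implementation.
# Probability of a sequence: initial probability of the first state
# times the transition probability of every consecutive pair.
function sequence_probability_sketch(seq::AbstractString, inits::AbstractVector,
                                     tpm::AbstractMatrix;
                                     logscale::Bool = false, b::Number = 2)
    idx = [findfirst(==(c), "ACGT") for c in seq]
    p = inits[idx[1]]
    for t in 1:(length(idx) - 1)
        p *= tpm[idx[t], idx[t + 1]]
    end
    return logscale ? log(b, p) : p
end

inits = fill(0.25, 4)
tpm = fill(0.25, 4, 4)
p = sequence_probability_sketch("ACGT", inits, tpm)
lp = sequence_probability_sketch("ACGT", inits, tpm; logscale = true)
```

For long sequences the plain product underflows quickly, which is one reason to work on the log scale.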

Example

julia

seq = LongDNA{4}("CGCGCGCGCGCGCGCGCGCGCGCGCG")
   
markovprobability(seq, model=CPGPOS, logscale=true)
    -45.073409957110556

markovprobability(seq, model=CPGNEG, logscale=true)
    -74.18912168395339


BioMarkovChains.perronfrobenius Method

julia

perronfrobenius(sequence::SeqOrView{A}, n::Int64=1) where A

Compute the Perron-Frobenius matrix, a column-stochastic version of the transition probability matrix (TPM), for a given nucleotide sequence.

The Perron-Frobenius matrix captures the probabilities of transitioning between nucleotides in the sequence over a specified number of steps n. It gives insight into the long-term behavior of the Markov chain or dynamical system associated with the sequence.

Arguments

sequence::SeqOrView{A}: A nucleotide sequence represented as a SeqOrView{A} object.
n::Int64=1: The number of steps to consider for the transition probability matrix. Default is 1.

Returns

A copy of the Perron-Frobenius matrix. Each column of this matrix corresponds to the probabilities of transitioning from the current nucleotide state to all possible nucleotide states after n steps.
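A minimal sketch of the idea, assuming the column-stochastic matrix is obtained by raising a row-stochastic TPM to the n-th power and transposing it. The helper `perron_frobenius_sketch` is hypothetical and works on a raw matrix, not on a sequence.

```julia
using LinearAlgebra

# Hypothetical helper: transpose of the n-step row-stochastic matrix,
# materialized as a regular Matrix with copy.
perron_frobenius_sketch(tpm::AbstractMatrix, n::Integer = 1) = copy(transpose(tpm^n))

tpm = [0.9 0.1; 0.5 0.5]   # rows sum to 1
pf = perron_frobenius_sketch(tpm, 2)
```

In the result, each column sums to 1, so column j holds the distribution over states reached n steps after starting in state j.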

Example

julia

sequence = LongSequence{DNAAlphabet{4}}("ACGTCGTCCACTACGACATCAGC")  # Replace with an actual nucleotide sequence
n = 2
pf = perronfrobenius(sequence, n)


BioMarkovChains.transition_count_matrix Method

julia

transition_count_matrix(sequence::LongSequence{DNAAlphabet{4}})

Compute the transition count matrix (TCM) of a given DNA sequence.

Arguments

sequence::LongSequence{DNAAlphabet{4}}: a LongSequence{DNAAlphabet{4}} object representing the DNA sequence.

Returns

A Matrix object representing the transition count matrix of the sequence.
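The counting step can be sketched in plain Julia without the package: every pair of adjacent symbols increments one cell of the matrix. The name `transition_counts_sketch` and the `states` keyword are hypothetical, and the sequence is a plain string with rows and columns ordered A, C, G, T.

```julia
# Hypothetical helper, not the package's implementation.
function transition_counts_sketch(seq::AbstractString; states::AbstractString = "ACGT")
    index = Dict(s => k for (k, s) in enumerate(states))
    counts = zeros(Int, length(states), length(states))
    # Row = current symbol, column = next symbol.
    for i in 1:(length(seq) - 1)
        counts[index[seq[i]], index[seq[i + 1]]] += 1
    end
    return counts
end

tcm = transition_counts_sketch("AGCTAGCTAGCT")
```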

Example

julia

seq = LongDNA{4}("AGCTAGCTAGCT")

tcm = transition_count_matrix(seq)

4×4 Matrix{Int64}:
 0  0  3  0
 0  0  0  3
 0  3  0  0
 2  0  0  0


BioMarkovChains.transition_probability_matrix Method

julia

transition_probability_matrix(sequence::LongSequence{DNAAlphabet{4}}, n::Int64=1)

Compute the transition probability matrix (TPM) of a given DNA sequence. Formally, it constructs $\hat{M}$ where:

m_{i j} = P (X_{t} = j ∣ X_{t - 1} = i) = \frac{P (X_{t - 1} = i, X_{t} = j)}{P (X_{t - 1} = i)}

The transition matrices for DNA and amino acids are arranged row-wise, with the states in sorted order.

First, the DNA matrix:

M_{DNA} = \begin{bmatrix} m_{AA} & m_{AC} & m_{AG} & m_{AT} \\ m_{CA} & m_{CC} & m_{CG} & m_{CT} \\ m_{GA} & m_{GC} & m_{GG} & m_{GT} \\ m_{TA} & m_{TC} & m_{TG} & m_{TT} \end{bmatrix}

And then the amino acid matrix:

M_{AA} = \begin{bmatrix} m_{AA} & m_{AC} & m_{AD} & \dots & m_{AW} \\ m_{CA} & m_{CC} & m_{CD} & \dots & m_{CW} \\ m_{DA} & m_{DC} & m_{DD} & \dots & m_{DW} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ m_{WA} & m_{WC} & m_{WD} & \dots & m_{WW} \end{bmatrix}

Arguments

sequence::LongNucOrView{4}: a LongNucOrView{4} object representing the DNA sequence.
n::Int64=1: The order of the Markov model, that is, the power $n$ in $\hat{M}^{n}$.

Keywords

extended_alphabet::Bool=false: If true, the extended DNA alphabet is used when searching for transitions.

Returns

A Matrix object representing the transition probability matrix of the sequence.
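A minimal sketch of the estimation in plain Julia, assuming the TPM is the row-normalized dinucleotide count matrix and that order n corresponds to the matrix power $\hat{M}^{n}$. The names `transition_counts_sketch` and `transition_probabilities_sketch` are hypothetical, and leaving rows of unobserved states at zero is an assumption chosen to avoid dividing by zero.

```julia
using LinearAlgebra

# Hypothetical helper: dinucleotide counts with rows and columns ordered A, C, G, T.
function transition_counts_sketch(seq::AbstractString; states::AbstractString = "ACGT")
    index = Dict(s => k for (k, s) in enumerate(states))
    counts = zeros(Int, length(states), length(states))
    for i in 1:(length(seq) - 1)
        counts[index[seq[i]], index[seq[i + 1]]] += 1
    end
    return counts
end

# Hypothetical helper: row-normalize the counts, then raise to the n-th power.
function transition_probabilities_sketch(seq::AbstractString, n::Integer = 1)
    counts = transition_counts_sketch(seq)
    tpm = zeros(Float64, size(counts))
    for i in axes(counts, 1)
        total = sum(counts[i, :])
        # Rows of states that never occur stay at zero.
        total > 0 && (tpm[i, :] .= counts[i, :] ./ total)
    end
    return tpm^n
end

tpm1 = transition_probabilities_sketch("ACTACATCTA")
tpm2 = transition_probabilities_sketch("ACTACATCTA", 2)
```

Each row of an observed state in `tpm1` sums to 1, and `tpm2` is the two-step matrix built from the same counts.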

Example

julia

seq = dna"AGCTAGCTAGCT"

tpm = transition_probability_matrix(seq)

4×4 Matrix{Float64}:
 0.0  0.0  1.0  0.0
 0.0  0.0  0.0  1.0
 0.0  1.0  0.0  0.0
 1.0  0.0  0.0  0.0
