BioMarkovChains.TransitionModel
BioMarkovChains.dnaseqprobability
BioMarkovChains.hasprematurestop
BioMarkovChains.iscoding
BioMarkovChains.transition_count_matrix
BioMarkovChains.transition_model
BioMarkovChains.transition_probability_matrix
BioMarkovChains.transitions
BioMarkovChains.TransitionModel — Type
struct TransitionModel
The TransitionModel struct represents a transition model used in sequence analysis. It consists of a transition probability matrix (TransitionProbabilityMatrix) and initial distribution probabilities.
Fields
TransitionProbabilityMatrix::Matrix{Float64}: The transition probability matrix, a matrix of type Float64 whose entries are the probabilities of transitioning from one state to another.
initials::Matrix{Float64}: The initial distribution probabilities, a matrix of type Float64 giving the probability of starting in each state.
n: The order of the transition model, that is, the order of the resulting Markov chain.
Constructors
TransitionModel(tpm::Matrix{Float64}, initials::Matrix{Float64}; n::Int64=1): Constructs a TransitionModel object with the provided transition probability matrix and initial distribution probabilities.
BioMarkovChains.dnaseqprobability — Method
dnaseqprobability(sequence::LongNucOrView{4}, tm::TransitionModel)
Compute the probability of a given sequence using a transition probability matrix and the initial probability distribution.
\[P(X_1 = i_1, \ldots, X_T = i_T) = \pi_{i_1} \prod_{t=1}^{T-1} a_{i_t, i_{t+1}}\]
Arguments
sequence::LongNucOrView{4}: The input sequence of nucleotides.
tm::TransitionModel: The transition model, composed of tpm::Matrix{Float64}, the transition probability matrix, and initials, the initial state probabilities.
Returns
probability::Float64: The probability of the input sequence.
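The formula above can be sketched in plain Julia. This is only an illustration under stated assumptions: it uses a String instead of a BioSequences type, the helper name seqprob_sketch and the lookup table NUC_INDEX are hypothetical, and rows and columns are assumed to follow the order A, C, G, T.

```julia
# Sketch only: plain strings and Base Julia, not the BioMarkovChains API.
# Assumed state order for rows and columns: A, C, G, T.
const NUC_INDEX = Dict('A' => 1, 'C' => 2, 'G' => 3, 'T' => 4)

# Hypothetical helper: P(X) = initials[i_1] * prod over t of tpm[i_t, i_{t+1}]
function seqprob_sketch(seq::AbstractString, tpm::Matrix{Float64}, initials::Vector{Float64})
    idx = [NUC_INDEX[c] for c in seq]
    p = initials[idx[1]]                 # probability of the first state
    for t in 1:length(idx)-1
        p *= tpm[idx[t], idx[t+1]]       # one factor per observed transition
    end
    return p
end

# Exact fractions behind the rounded values of the documented example.
tpm = [0.0  1.0   0.0   0.0;
       0.0  0.5   0.2   0.3;
       0.25 0.125 0.625 0.0;
       0.0  2/3   1/3   0.0]
initials = [2/23, 10/23, 8/23, 3/23]

p = seqprob_sketch("CCTG", tpm, initials)
println(round(p, digits=4))  # 0.0217
```

For CCTG the product is (10/23) × 0.5 × 0.3 × (1/3), which rounds to the 0.0217 reported for this sequence.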
Example
mainseq = LongDNA{4}("CCTCCCGGACCCTGGGCTCGGGAC")
tpm = transition_probability_matrix(mainseq)
4×4 Matrix{Float64}:
0.0 1.0 0.0 0.0
0.0 0.5 0.2 0.3
0.25 0.125 0.625 0.0
0.0 0.667 0.333 0.0
initials = initial_distribution(mainseq)
1×4 Vector{Float64}:
0.0869565
0.434783
0.347826
0.130435
tm = transition_model(tpm, initials)
- Transition Probability Matrix -> Matrix{Float64}(4 × 4):
0.0 1.0 0.0 0.0
0.0 0.5 0.2 0.3
0.25 0.125 0.625 0.0
0.0 0.667 0.333 0.0
- Initial Probabilities -> Vector{Float64}(4 × 1):
0.087
0.435
0.348
0.13
- Markov Chain Order:1
newseq = LongDNA{4}("CCTG")
4nt DNA Sequence:
CCTG
dnaseqprobability(newseq, tm)
0.0217
BioMarkovChains.hasprematurestop — Method
hasprematurestop(sequence::LongNucOrView{4})::Bool
Determine whether the sequence of type LongSequence{DNAAlphabet{4}} contains a premature stop codon.
Returns a boolean indicating whether the sequence has more than one stop codon.
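A minimal sketch of this check, assuming codons are read in frame from the first position and that the stop codons are TAA, TAG and TGA (the standard genetic code). The helper name hasprematurestop_sketch is hypothetical and works on a plain String, so it may differ from the package's actual implementation.

```julia
# Sketch only: operates on a plain string, reading codons in frame from position 1.
const STOP_CODONS_SKETCH = ("TAA", "TAG", "TGA")

# Hypothetical helper mirroring the description above:
# true when more than one in-frame stop codon is found.
function hasprematurestop_sketch(seq::AbstractString)
    nstops = 0
    for i in 1:3:length(seq)-2
        if seq[i:i+2] in STOP_CODONS_SKETCH
            nstops += 1
        end
    end
    return nstops > 1
end

println(hasprematurestop_sketch("ATGGCATAG"))     # false (only the terminal TAG)
println(hasprematurestop_sketch("ATGTAAGCATAG"))  # true  (TAA before the terminal TAG)
```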
BioMarkovChains.iscoding — Function
iscoding(
sequence::LongSequence{DNAAlphabet{4}},
codingmodel::TransitionModel,
noncodingmodel::TransitionModel,
η::Float64 = 1e-5
)
Check whether a given DNA sequence is likely to be coding, based on a log-odds ratio. The log-odds ratio compares the probability of the sequence under a coding model with its probability under a non-coding model. If the log-odds ratio exceeds a given threshold (η), the sequence is considered likely to be coding. Formally, the decision rule is:
\[S(X) = \log \left( \frac{{P_C(X_1=i_1, \ldots, X_T=i_T)}}{{P_N(X_1=i_1, \ldots, X_T=i_T)}} \right) \begin{cases} > \eta & \Rightarrow \text{{coding}} \\ < \eta & \Rightarrow \text{{noncoding}} \end{cases}\]
Arguments
sequence::LongSequence{DNAAlphabet{4}}: The DNA sequence to be evaluated.
codingmodel::TransitionModel: The transition model for coding regions.
noncodingmodel::TransitionModel: The transition model for non-coding regions.
η::Float64 = 1e-5: The threshold value (eta) for the log-odds ratio (default: 1e-5).
Returns
true if the sequence is likely to be coding.
false if the sequence is likely to be non-coding.
Raises
ErrorException: if the length of the sequence is not divisible by 3.
ErrorException: if the sequence contains a premature stop codon.
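The decision rule can be sketched as a difference of log-probabilities, which equals the log of the ratio. Everything here is illustrative: the helpers logprob_sketch and iscoding_sketch are hypothetical, each model is a plain (tpm, initials) tuple rather than a TransitionModel, the matrices are made-up toy numbers, and the length and stop-codon checks listed above are omitted.

```julia
# Sketch only: plain Base Julia, not the BioMarkovChains API.
# Assumed state order for rows and columns: A, C, G, T.
const NUC_POS = Dict('A' => 1, 'C' => 2, 'G' => 3, 'T' => 4)

# Hypothetical helper: log-probability of a sequence under a first-order chain.
function logprob_sketch(seq, tpm, initials)
    idx = [NUC_POS[c] for c in seq]
    lp = log(initials[idx[1]])
    for t in 1:length(idx)-1
        lp += log(tpm[idx[t], idx[t+1]])
    end
    return lp
end

# Decision rule: S(X) = log P_C(X) - log P_N(X) > η means "coding".
function iscoding_sketch(seq, coding, noncoding; η = 1e-5)
    s = logprob_sketch(seq, coding...) - logprob_sketch(seq, noncoding...)
    return s > η
end

# Toy models (made-up numbers): one favours A→T, T→G, G→C, C→A; one is uniform.
biased  = ([0.1 0.1 0.1 0.7;
            0.7 0.1 0.1 0.1;
            0.1 0.7 0.1 0.1;
            0.1 0.1 0.7 0.1], fill(0.25, 4))
uniform = (fill(0.25, 4, 4), fill(0.25, 4))

println(iscoding_sketch("ATGCAT", biased, uniform))  # true
println(iscoding_sketch("AAAAAA", biased, uniform))  # false
```

Working in log space avoids underflow when many small transition probabilities are multiplied over a long sequence.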
Example
sequence = LongDNA{4}("ATGGCATCTAG")
codingmodel = TransitionModel()
noncodingmodel = TransitionModel()
iscoding(sequence, codingmodel, noncodingmodel) # Returns: true
BioMarkovChains.transition_count_matrix — Method
transition_count_matrix(sequence::LongSequence{DNAAlphabet{4}})
Compute the transition count matrix (TCM) of a given DNA sequence.
Arguments
sequence::LongSequence{DNAAlphabet{4}}: a LongSequence{DNAAlphabet{4}} object representing the DNA sequence.
Keywords
extended_alphabet::Bool=false: If true, the extended DNA alphabet is used when searching for transitions.
Returns
A Matrix object representing the transition count matrix of the sequence.
Example
seq = LongDNA{4}("AGCTAGCTAGCT")
tcm = transition_count_matrix(seq)
4x4 Matrix{Int64}:
A C G T
A 0 0 3 0
C 0 0 0 3
G 0 3 0 0
T 2 0 0 0
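The counting step can be sketched in plain Julia as follows. The helper count_matrix_sketch and the lookup table TCM_INDEX are hypothetical names, the input is a String rather than a LongSequence, and the row and column order A, C, G, T is assumed from the example output.

```julia
# Sketch only: counts adjacent pairs in a plain string (order A, C, G, T).
const TCM_INDEX = Dict('A' => 1, 'C' => 2, 'G' => 3, 'T' => 4)

function count_matrix_sketch(seq::AbstractString)
    counts = zeros(Int, 4, 4)
    for t in 1:length(seq)-1
        # Row is the current nucleotide, column is the next one.
        counts[TCM_INDEX[seq[t]], TCM_INDEX[seq[t+1]]] += 1
    end
    return counts
end

tcm = count_matrix_sketch("AGCTAGCTAGCT")
println(tcm)  # [0 0 3 0; 0 0 0 3; 0 3 0 0; 2 0 0 0]
```

The last row has a count of 2 rather than 3 because the sequence ends in T, so the final T has no successor.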
BioMarkovChains.transition_model — Function
transition_model(sequence::LongNucOrView{4}, n::Int64=1)
Constructs a transition model based on the given DNA sequence and transition order.
Arguments
sequence::LongNucOrView{4}: A DNA sequence represented as a LongNucOrView{4} object.
n::Int64 (optional): The transition order (default: 1).
Returns
A TransitionModel object representing the transition model.
transition_model(tpm::Matrix{Float64}, initials::Matrix{Float64}, n::Int64=1)
Builds a transition model from the transition probability matrix and the initial distribution. It can also compute higher orders of the model when n is changed.
Arguments
tpm::Matrix{Float64}: the transition probability matrix.
initials::Vector{Float64}: the initial distributions of the model.
n::Int64 (optional): The transition order (default: 1).
Returns
A TransitionModel object built from the given transition probability matrix and initial distribution.
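As a hedged illustration of what a higher order means here, the sketch below assumes the order-n matrix is the n-th matrix power of the first-order matrix. The matrix A1 was estimated by hand from the sequence "ACTACATCTA" (state order A, C, G, T); the variable names are illustrative and this is not the package's code.

```julia
using LinearAlgebra  # standard library; provides matrix powers

# First-order transition probabilities estimated by hand from "ACTACATCTA".
# G never occurs in the sequence, so its row is all zeros.
A1 = [0.0 2/3 0.0 1/3;
      1/3 0.0 0.0 2/3;
      0.0 0.0 0.0 0.0;
      2/3 1/3 0.0 0.0]

n = 2
An = A1^n                        # matrix power, not elementwise power
println(round.(An, digits=3))    # first row: 0.444 0.111 0.0 0.444
```

Under this assumption the squared matrix reproduces the rounded values printed for transition_model(sequence, 2).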
Example
sequence = LongDNA{4}("ACTACATCTA")
model = transition_model(sequence, 2)
TransitionModel:
- Transition Probability Matrix -> Matrix{Float64}(4 × 4):
0.444 0.111 0.0 0.444
0.444 0.444 0.0 0.111
0.0 0.0 0.0 0.0
0.111 0.444 0.0 0.444
- Initial Probabilities -> Vector{Float64}(4 × 1):
0.333
0.333
0.0
0.333
- Markov Chain Order:2
BioMarkovChains.transition_probability_matrix — Function
transition_probability_matrix(sequence::LongSequence{DNAAlphabet{4}})
Compute the transition probability matrix (TPM) of a given DNA sequence. Formally, it constructs $\hat{A}$ where:
\[a_{ij} = P(X_t = j \mid X_{t-1} = i) = \frac{{P(X_{t-1} = i, X_t = j)}}{{P(X_{t-1} = i)}}\]
Arguments
sequence::LongNucOrView{4}: a LongNucOrView{4} object representing the DNA sequence.
n::Int64=1: The order of the Markov model, that is, the power $\hat{A}^{n}$.
Keywords
extended_alphabet::Bool=false: If true, the extended DNA alphabet is used when searching for transitions.
Returns
A Matrix object representing the transition probability matrix of the sequence.
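The estimate can be sketched as a row normalisation of the transition counts: each count is divided by the total number of transitions leaving that state. The count matrix below is the one shown for "AGCTAGCTAGCT"; leaving all-zero rows as zeros is an assumption of this sketch, and the variable names are illustrative.

```julia
# Transition counts for "AGCTAGCTAGCT" (rows and columns ordered A, C, G, T).
counts = [0 0 3 0;
          0 0 0 3;
          0 3 0 0;
          2 0 0 0]

rowsums = sum(counts, dims=2)    # transitions leaving each state

# Divide each row by its total; rows with no transitions stay at zero
# (an assumption of this sketch).
tpm_hat = [rowsums[i] == 0 ? 0.0 : counts[i, j] / rowsums[i] for i in 1:4, j in 1:4]
println(tpm_hat)  # [0.0 0.0 1.0 0.0; 0.0 0.0 0.0 1.0; 0.0 1.0 0.0 0.0; 1.0 0.0 0.0 0.0]
```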
Example
seq = dna"AGCTAGCTAGCT"
tpm = transition_probability_matrix(seq)
4x4 Matrix{Float64}:
A C G T
A 0.0 0.0 1.0 0.0
C 0.0 0.0 0.0 1.0
G 0.0 1.0 0.0 0.0
T 1.0 0.0 0.0 0.0
BioMarkovChains.transitions — Method
transitions(sequence::LongSequence)
Compute the transition counts of each pair in a given biological sequence.
Arguments
sequence::LongSequence{DNAAlphabet{4}}: a LongSequence{DNAAlphabet{4}} object representing the DNA sequence.
Returns
A Dict{Tuple{DNA, DNA}, Int64} whose keys are tuples representing the dinucleotides and whose values are the number of occurrences of each dinucleotide in the sequence.
Example
seq = dna"AGCTAGCTAGCT"
transitions(seq)
Dict{Tuple{DNA, DNA}, Int64} with 4 entries:
(DNA_C, DNA_T) => 3
(DNA_G, DNA_C) => 3
(DNA_T, DNA_A) => 2
(DNA_A, DNA_G) => 3
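A minimal sketch of the same pair counting with a dictionary, using Char pairs from a plain String in place of DNA symbols. The helper name transitions_sketch is hypothetical.

```julia
# Sketch only: counts each adjacent pair of characters in a plain string.
function transitions_sketch(seq::AbstractString)
    counts = Dict{Tuple{Char,Char},Int}()
    for t in 1:length(seq)-1
        pair = (seq[t], seq[t+1])
        counts[pair] = get(counts, pair, 0) + 1   # start at 0 for unseen pairs
    end
    return counts
end

d = transitions_sketch("AGCTAGCTAGCT")
println(d[('A', 'G')])  # 3
println(d[('T', 'A')])  # 2
```

Only pairs that actually occur get an entry, which is why the example above reports 4 entries rather than 16.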