Towards Markov Chains
DNA as a Markov chain
Several packages in the Julia ecosystem (e.g. MarkovChainsHammer.jl and DiscreteMarkovChains.jl) work with Markov chains whose state space is a set of integers. They can be efficient in many ways, but they are clumsy to use with specialized biological types such as those in the BioJulia ecosystem. Therefore, the GeneFinder package includes dedicated implementations that work with BioSequence types, so that the functionality can be extended efficiently (see the complete API).
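An important step in many gene finding algorithms is to represent a DNA sequence as a Markov chain. In this representation, a DNA sequence over the reduced alphabet $\mathscr{A} = \{A, C, G, T\}$ is modeled as a chain whose states are the four nucleotides and whose transitions are the steps from one nucleotide to the next along the sequence.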
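More formally, a Markov chain is a random process in which each state is a random variable $X_t$, indexed by a discrete time $t$, and the probability of moving to the next state depends only on the current state:

$$P(X_t = j \mid X_{t-1} = i, X_{t-2}, \ldots, X_1) = P(X_t = j \mid X_{t-1} = i)$$

where $i$ and $j$ are states of the chain (here, nucleotides). It follows that the probability of a sequence of length $T$ factorizes as

$$P(X_1 = i_1, \ldots, X_T = i_T) = P(X_1 = i_1) \prod_{t=2}^{T} P(X_t = i_t \mid X_{t-1} = i_{t-1})$$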
Note that the previous equation has two terms: an initial probability $P(X_1 = i_1)$ and the product of all transition probabilities starting at $t = 2$.
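As a minimal sketch of how these two terms combine, the snippet below evaluates the log-probability of a short sequence as the log of the initial probability plus the sum of the log transition probabilities. It uses plain Julia strings and arrays rather than BioSequence types. The function name `sequence_logprob`, the `alphabet` keyword, and the uniform probabilities are assumptions made for illustration, not part of the GeneFinder API.

```julia
# Hypothetical helper (not part of GeneFinder): log-probability of a sequence
# under a first-order Markov chain with the given initial and transition
# probabilities. Rows of `transition` are "from" states, columns are "to" states.
function sequence_logprob(seq::AbstractString,
                          initial::AbstractVector{<:Real},
                          transition::AbstractMatrix{<:Real};
                          alphabet = ['A', 'C', 'G', 'T'])
    idx = Dict(n => k for (k, n) in enumerate(alphabet))
    chars = collect(seq)
    isempty(chars) && return 0.0

    # First term: the initial probability P(X_1 = i_1).
    logp = log(initial[idx[chars[1]]])

    # Second term: the product of transitions from t = 2, summed in log space.
    for t in 2:length(chars)
        logp += log(transition[idx[chars[t - 1]], idx[chars[t]]])
    end
    return logp
end

# Assumed toy parameters: every nucleotide and every transition equally likely.
initial = fill(0.25, 4)
transition = fill(0.25, 4, 4)

logp = sequence_logprob("ACGT", initial, transition)
```

Working in log space avoids numerical underflow, which matters for sequences of realistic length because the product of many probabilities quickly becomes very small.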
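We can estimate the probability of a transition from each nucleotide $i$ to any other nucleotide $j$ from the observed frequencies, $\widehat{a}_{ij} = c_{ij} / c_i$, where $c_{ij}$ is the number of times $i$ is immediately followed by $j$ in the sequence and $c_i = \sum_j c_{ij}$ is the total number of transitions out of $i$.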
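The counting step can be sketched in plain Julia as follows: tally every adjacent pair of nucleotides into a 4×4 count matrix, then divide each row by its sum to obtain the estimated transition frequencies. The names `transition_counts` and `transition_estimates`, the `alphabet` keyword, and the toy sequence are hypothetical and chosen only for illustration; the sketch uses a string instead of a BioSequence type and does not reproduce the GeneFinder API.

```julia
# Hypothetical helper: count transitions between adjacent nucleotides.
# Entry [i, j] is the number of times nucleotide i is followed by nucleotide j.
function transition_counts(seq::AbstractString; alphabet = ['A', 'C', 'G', 'T'])
    idx = Dict(n => k for (k, n) in enumerate(alphabet))
    counts = zeros(Int, length(alphabet), length(alphabet))
    chars = collect(seq)
    for t in 2:length(chars)
        counts[idx[chars[t - 1]], idx[chars[t]]] += 1
    end
    return counts
end

# Hypothetical helper: row-normalize the counts into transition frequencies.
# Rows with no observed transitions are left as zeros.
function transition_estimates(counts::AbstractMatrix{<:Integer})
    rowsums = sum(counts, dims = 2)
    probs = zeros(Float64, size(counts))
    for i in axes(counts, 1)
        rowsums[i] == 0 && continue
        probs[i, :] = counts[i, :] ./ rowsums[i]
    end
    return probs
end

# Arbitrary toy sequence (an assumption, not an example from the text).
counts = transition_counts("AACCAGT")
probs = transition_estimates(counts)
```

In this toy input, A is followed once each by A, C, and G, so the first row of `probs` assigns 1/3 to each of those transitions and 0 to A followed by T. T is never followed by anything, so its row stays at zero.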
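Note that the initial probabilities can also be obtained from the counts $c_{ij}$ of each nucleotide-to-nucleotide transition, by dividing the number of transitions out of each nucleotide by the total number of transitions: $\widehat{\pi}_i = \sum_j c_{ij} / \sum_{k,j} c_{kj}$.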
That way, for the previous example we can calculate the initial probabilities
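A minimal sketch of this calculation in plain Julia is given below: the row sums of a transition count matrix are divided by the total number of transitions. The function name `initial_estimates` is hypothetical, and the count matrix is an arbitrary toy input rather than the example referred to above.

```julia
# Hypothetical helper: estimate initial probabilities from a transition count
# matrix whose entry [i, j] counts transitions from nucleotide i to nucleotide j.
function initial_estimates(counts::AbstractMatrix{<:Integer})
    total = sum(counts)
    total == 0 && return zeros(Float64, size(counts, 1))
    # Transitions out of each nucleotide, divided by all observed transitions.
    return vec(sum(counts, dims = 2)) ./ total
end

# Assumed toy counts, with rows and columns ordered A, C, G, T.
toy_counts = [1 1 1 0;
              1 1 0 0;
              0 0 0 1;
              0 0 0 0]

pi_hat = initial_estimates(toy_counts)
```

Because only transitions are counted, the last nucleotide of a sequence contributes nothing to its own row, so these estimates can differ slightly from plain nucleotide frequencies on short sequences.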
References
Axelson-Fisk, Marina. 2015. Comparative Gene Finding. Vol. 20. Computational Biology. London: Springer London. http://link.springer.com/10.1007/978-1-4471-6693-1.