Translation
Kmers can be translated using the translate function exported by BioSequences:
julia> translate(mer"UGCUUGAUC"r)
AminoAcid 3-mer:
CLI
Since Kmers are immutable, the in-place translate! function is not implemented for them. Also, remember that Kmers are only efficient when short (at most a few hundred symbols). Hence, entire exons or genes should probably never be represented by a Kmer, but rather by a LongSequence or LongSubSeq from BioSequences.jl.
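As a minimal sketch of the recommendation above, a longer coding sequence can be kept as a LongSequence and translated with the same translate function from BioSequences. The sequence below is a made-up illustration, not taken from any real gene.

```julia
using BioSequences

# Hypothetical short coding sequence stored as a LongSequence (not a Kmer).
gene = rna"AUGUGCUUGAUCUAA"

# translate returns a new amino acid sequence; the input is left unchanged.
protein = translate(gene)
println(protein)
```

The same pattern applies to sequences of any length, which is where a LongSequence is the better fit.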
Reverse translation
Kmers.jl implements reverse translation, which maps an amino acid sequence to one or more RNA sequences. While this process doesn't occur naturally (as far as we know), it is still useful for some analyses.
Since genetic codes are degenerate, i.e. multiple codons code for the same amino acid, reverse translating a sequence does not return a nucleic acid sequence, but a vector of CodonSet:
Kmers.reverse_translate - Function
reverse_translate(s::Union{AminoAcid, AASeq}, code=rev_standard_genetic_code)
Reverse-translates sequence or amino acid s under code::ReverseGeneticCode.
If s is an AminoAcid, return a CodonSet. If s is an AASeq, return Vector{CodonSet}.
Examples
julia> reverse_translate(AA_W)
CodonSet with 1 element:
UGG
julia> v = reverse_translate(aa"MMLVQ");
julia> typeof(v)
Vector{CodonSet} (alias for Array{CodonSet, 1})
julia> v[4]
CodonSet with 4 elements:
GUA
GUC
GUG
GUU
See also: reverse_translate!, ReverseGeneticCode
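Because each residue maps to a set of synonymous codons, the result can be used to count how many RNA sequences encode a peptide. The following is a small sketch that relies only on reverse_translate and on length of a CodonSet; the variable names are illustrative.

```julia
using BioSequences, Kmers

codonsets = reverse_translate(aa"MMLVQ")

# One CodonSet per residue; its length is the number of synonymous codons.
choices = map(length, codonsets)

# Number of distinct RNA sequences that encode the peptide under the default code.
n_encodings = prod(choices)
println(choices)
println(n_encodings)
```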
Kmers.CodonSet - Type
CodonSet <: AbstractSet{RNACodon}
A small, immutable set of RNACodon.
Create an empty set using CodonSet(), or from an iterable of RNACodon using CodonSet(itr). Because CodonSet is immutable, use push instead of push!, and use the non-mutating set operations union, setdiff, etc.
Examples
julia> v = (mer"UAG"r, mer"GGA"r, mer"UUU"r);
julia> Set(CodonSet(v)) == Set(v)
true
julia> union(CodonSet(v), CodonSet([mer"GAG"r]))
CodonSet with 4 elements:
GAG
GGA
UAG
UUU
CodonSet is an efficiently implemented AbstractSet{RNACodon} (and remember, RNACodon is an alias for RNAKmer{3, 1}).
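The non-mutating set operations can be combined as in the sketch below. It uses only CodonSet(itr), setdiff, union and in; the choice of the three standard stop codons is purely illustrative.

```julia
using Kmers

# The three stop codons of the standard genetic code, as a CodonSet.
stops = CodonSet((mer"UAA"r, mer"UAG"r, mer"UGA"r))
amber = CodonSet([mer"UAG"r])

# setdiff returns a new set; `stops` itself is never modified.
others = setdiff(stops, amber)
println(length(others))
println(mer"UAG"r in others)

# Adding the codon back with union recovers the original set.
println(union(others, amber) == stops)
```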
To avoid allocating a new Vector, you can use reverse_translate!:
Kmers.reverse_translate! - Function
reverse_translate!(v::Vector{CodonSet}, s::AASeq, code=rev_standard_genetic_code) -> v
Reverse-translates s under the reverse genetic code code, putting the result in v.
See also: reverse_translate
Examples:
julia> v = CodonSet[];
julia> reverse_translate!(v, aa"KWCL")
4-element Vector{CodonSet}:
CodonSet(0x0000000000000005)
CodonSet(0x0400000000000000)
CodonSet(0x0a00000000000000)
CodonSet(0x50000000f0000000)
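A typical use is to reuse one buffer across several peptides. The sketch below assumes, as the example above suggests, that reverse_translate! resizes the vector to the length of the input; the peptides are arbitrary.

```julia
using BioSequences, Kmers

# A single buffer that is overwritten on every call.
buffer = CodonSet[]

for peptide in (aa"MV", aa"KWCL")
    reverse_translate!(buffer, peptide)
    println(length(peptide), " residues -> ", length(buffer), " codon sets")
end
```

After the loop, buffer holds the result for the last peptide only, so copy it if earlier results are still needed.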
Both functions take a genetic code of type ReverseGeneticCode as an optional final positional argument, code. This object determines the mapping from amino acid to CodonSet. By default, the standard genetic code is used, which is the code used by nearly all organisms.
Only the reverse standard genetic code is defined in Kmers.jl. To use another genetic code, build a ReverseGeneticCode object from an existing BioSequences.GeneticCode:
julia> code = BioSequences.pterobrachia_mitochondrial_genetic_code;
julia> rv_code = ReverseGeneticCode(code);
julia> seq = aa"KWLP";
julia> codonsets = reverse_translate(seq, rv_code)
4-element Vector{CodonSet}:
CodonSet(0x0000000000000405)
CodonSet(0x0500000000000000)
CodonSet(0x50000000f0000000)
CodonSet(0x0000000000f00000)
julia> codonsets == reverse_translate(seq) # default standard code
false
Kmers.ReverseGeneticCode - Type
ReverseGeneticCode <: AbstractDict{AminoAcid, CodonSet}
A mapping from an amino acid aa to the CodonSet of all codons that translate to aa. Conceptually, the inverse of a BioSequences.GeneticCode. Used by reverse_translate.
AA_Gap cannot be translated. Ambiguous amino acids translate to the union of what their constituent amino acids translate to. Pyrrolysine and selenocysteine translate to a CodonSet containing UAG and UGA, respectively, even though most forward genetic codes do not translate any codon to them. For these reasons, the mapping through ReverseGeneticCode is not the exact inverse of the mapping through GeneticCode.
Examples
julia> code = ReverseGeneticCode(BioSequences.candidate_division_sr1_genetic_code);
julia> code[AA_E]
CodonSet with 2 elements:
GAA
GAG
julia> code[AA_Gap]
ERROR: Cannot reverse translate element: -
[...]
See also: reverse_translate
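The special cases described above can be checked directly. The sketch below uses the default reverse standard genetic code through reverse_translate; AA_B is the ambiguity symbol for aspartate (AA_D) or asparagine (AA_N).

```julia
using BioSequences, Kmers

# An ambiguous amino acid maps to the union of its constituents' codons.
b_codons = reverse_translate(AA_B)
dn_codons = union(reverse_translate(AA_D), reverse_translate(AA_N))
println(b_codons == dn_codons)

# Pyrrolysine (O) and selenocysteine (U) map to sets containing UAG and UGA.
println(mer"UAG"r in reverse_translate(AA_O))
println(mer"UGA"r in reverse_translate(AA_U))
```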