Translation
Kmers can be translated using the translate function exported by BioSequences:
julia> translate(mer"UGCUUGAUC"r)
AminoAcid 3-mer:
CLI
Since Kmers are immutable, the in-place translate! function is not implemented for Kmers. Also, remember that Kmers are only efficient when short (at most a few hundred symbols). Entire exons or genes should therefore generally not be represented as a Kmer, but rather as a LongSequence or LongSubSeq from BioSequences.jl.
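A minimal sketch of that advice, assuming BioSequences.jl and Kmers.jl are installed: short, fixed-size sequences stay as Kmers, and each translate call returns a new value instead of mutating anything, while a full coding sequence is held in a LongSequence. The sequences other than the one above are made up for illustration.

```julia
using BioSequences, Kmers

# A short RNA Kmer; translate returns a new, immutable amino acid Kmer
codons = mer"UGCUUGAUC"r
peptide = translate(codons)      # AminoAcid 3-mer: CLI

# There is no translate! for Kmers: each call simply produces a fresh value
other = translate(mer"AUGUGG"r)  # AminoAcid 2-mer: MW

# A whole (made-up) coding sequence is better stored as a LongSequence
gene = rna"AUGUGCUUGAUCUAA"
protein = translate(gene)        # LongAA: MCLI*
```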
Reverse translation
Kmers.jl implements reverse translation, which maps an amino acid sequence to one or more RNA sequences. While this process doesn't occur naturally (as far as we know), it is still useful for some analyses.
Since genetic codes are degenerate, i.e. multiple codons code for the same amino acid, reverse translating a sequence does not return a nucleic acid sequence, but a vector of CodonSet:
Kmers.reverse_translate (Function)
reverse_translate(s::Union{AminoAcid, AASeq}, code=rev_standard_genetic_code)
Reverse-translates sequence or amino acid s under code::ReverseGeneticCode. If s is an AminoAcid, return a CodonSet. If s is an AASeq, return a Vector{CodonSet}.
Examples
julia> reverse_translate(AA_W)
CodonSet with 1 element:
UGG
julia> v = reverse_translate(aa"MMLVQ");
julia> typeof(v)
Vector{CodonSet} (alias for Array{CodonSet, 1})
julia> v[4]
CodonSet with 4 elements:
GUA
GUC
GUG
GUU
See also: reverse_translate!, ReverseGeneticCode
Kmers.CodonSet (Type)
CodonSet <: AbstractSet{RNACodon}
A small, immutable set of RNACodon.
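Because each residue is reverse-translated independently, the number of distinct RNA sequences that encode a peptide is the product of the sizes of its codon sets. The sketch below, assuming BioSequences.jl and Kmers.jl are loaded, computes this for the aa"MMLVQ" example, checks that every codon in each set translates back to its residue, and counts the sense codons of the standard code.

```julia
using BioSequences, Kmers

peptide = aa"MMLVQ"
sets = reverse_translate(peptide)   # Vector{CodonSet}, one set per residue

# Encodings under the standard code: M (1) * M (1) * L (6) * V (4) * Q (2) = 48
n_encodings = prod(length, sets)

# Round trip: every codon in a residue's set translates back to that residue
for (aa, codonset) in zip(peptide, sets)
    for codon in codonset           # each element is an RNACodon
        @assert first(translate(codon)) == aa
    end
end

# Summing over the 20 standard amino acids counts all 61 sense codons
n_sense = sum(aa -> length(reverse_translate(aa)), aa"ARNDCQEGHILKMFPSTWYV")
```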
Create an empty set using CodonSet(), or from an iterable of RNACodon using CodonSet(itr). Because CodonSet is immutable, use push instead of push!, and use the non-mutating set operations union, setdiff, etc.
Examples
julia> v = (mer"UAG"r, mer"GGA"r, mer"UUU"r);
julia> Set(CodonSet(v)) == Set(v)
true
julia> union(CodonSet(v), CodonSet([mer"GAG"r]))
CodonSet with 4 elements:
GAG
GGA
UAG
UUU
CodonSet is an efficiently implemented AbstractSet{RNACodon}; recall that RNACodon is an alias for RNAKmer{3, 1}.
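As a small illustration of the non-mutating set operations, assuming BioSequences.jl and Kmers.jl are loaded, the sketch below builds a set of the three standard stop codons and removes the amber codon UAG with setdiff. Every operation returns a new set, and the originals are left unchanged.

```julia
using BioSequences, Kmers

# Build codon sets from iterables of RNACodon (RNA 3-mers)
stops = CodonSet([mer"UAA"r, mer"UAG"r, mer"UGA"r])
amber = CodonSet([mer"UAG"r])

# Non-mutating set operations return new sets
others = setdiff(stops, amber)      # UAA and UGA
all_three = union(others, amber)    # same elements as stops

empty_set = CodonSet()              # an empty CodonSet
```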
To avoid allocating a new Vector, you can use reverse_translate!:
Kmers.reverse_translate! (Function)
reverse_translate!(v::Vector{CodonSet}, s::AASeq, code=rev_standard_genetic_code) -> v
Reverse-translates s under the reverse genetic code code, putting the result in v.
See also: reverse_translate
Examples:
julia> v = CodonSet[];
julia> reverse_translate!(v, aa"KWCL")
4-element Vector{CodonSet}:
CodonSet(0x0000000000000005)
CodonSet(0x0400000000000000)
CodonSet(0x0a00000000000000)
CodonSet(0x50000000f0000000)
Both functions take an optional genetic code of type ReverseGeneticCode as their last positional argument. This object determines the mapping from each amino acid to its CodonSet. By default, the standard genetic code is used, which is the code used by the nuclear genomes of most organisms.
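As the signatures above show, the genetic code is passed positionally. The sketch below, assuming BioSequences.jl and Kmers.jl are loaded, builds the reverse standard code explicitly from BioSequences.standard_genetic_code and reuses a single buffer with reverse_translate! across two arbitrary peptides. It assumes, as "putting the result in v" suggests, that each call overwrites and resizes the buffer rather than appending to it.

```julia
using BioSequences, Kmers

# Reverse standard code, built from the forward BioSequences code
rev_code = ReverseGeneticCode(BioSequences.standard_genetic_code)

buffer = CodonSet[]   # reused across calls instead of allocating a new Vector
for peptide in (aa"KWCL", aa"MMLVQ")
    # Code passed positionally; buffer is assumed to be overwritten each time
    reverse_translate!(buffer, peptide, rev_code)
    # prints "KWCL: 24 encodings", then "MMLVQ: 48 encodings"
    println(peptide, ": ", prod(length, buffer), " encodings")
end

# reverse_translate with the same explicit code returns a fresh Vector
fresh = reverse_translate(aa"KWCL", rev_code)
```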
Only the reverse standard genetic code is defined in Kmers.jl. To use another genetic code, build a ReverseGeneticCode object from an existing BioSequences.GeneticCode:
julia> code = BioSequences.pterobrachia_mitochondrial_genetic_code;
julia> rv_code = ReverseGeneticCode(code);
julia> seq = aa"KWLP";
julia> codonsets = reverse_translate(seq, rv_code)
4-element Vector{CodonSet}:
CodonSet(0x0000000000000405)
CodonSet(0x0500000000000000)
CodonSet(0x50000000f0000000)
CodonSet(0x0000000000f00000)
julia> codonsets == reverse_translate(seq) # default standard code
false
Kmers.ReverseGeneticCode (Type)
ReverseGeneticCode <: AbstractDict{AminoAcid, CodonSet}
A mapping from an amino acid aa to the CodonSet of all codons that translate to aa. Conceptually, the inverse of a BioSequences.GeneticCode. Used by reverse_translate.
AA_Gap cannot be reverse-translated. Ambiguous amino acids reverse-translate to the union of the codon sets of their constituent amino acids. Pyrrolysine and selenocysteine reverse-translate to CodonSets containing UAG and UGA, respectively, even though most forward genetic codes never translate any codon to them. For these reasons, the mapping through ReverseGeneticCode is not exactly the inverse of the mapping through GeneticCode.
Examples
julia> code = ReverseGeneticCode(BioSequences.candidate_division_sr1_genetic_code);
julia> code[AA_E]
CodonSet with 2 elements:
GAA
GAG
julia> code[AA_Gap]
ERROR: Cannot reverse translate element: -
[...]
See also: reverse_translate
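The sketch below, assuming BioSequences.jl and Kmers.jl are loaded, revisits two points from this section. First, it compares the pterobranchia mitochondrial code with the standard code on the KWLP example above, listing the codons that only the mitochondrial code assigns to each residue. Second, it checks the special cases from the ReverseGeneticCode docstring: an ambiguous amino acid (AA_B, aspartate or asparagine), pyrrolysine (AA_O), selenocysteine (AA_U) and AA_Gap.

```julia
using BioSequences, Kmers

# Comparing a non-standard code with the default standard code
ptero = ReverseGeneticCode(BioSequences.pterobrachia_mitochondrial_genetic_code)
seq = aa"KWLP"
custom = reverse_translate(seq, ptero)
standard = reverse_translate(seq)

# K gains AGG and W gains UGA; L and P are unchanged
for (aa, c, s) in zip(seq, custom, standard)
    extra = setdiff(c, s)
    println(aa, ": ", join(string.(collect(extra)), ", "))
end

# Special cases of the reverse standard code
rev_std = ReverseGeneticCode(BioSequences.standard_genetic_code)
b_codons = rev_std[AA_B]                       # B = D or N
d_or_n = union(rev_std[AA_D], rev_std[AA_N])   # expected to equal b_codons
pyl = rev_std[AA_O]                            # pyrrolysine
sec = rev_std[AA_U]                            # selenocysteine

# Looking up AA_Gap throws an error
gap_fails = try
    rev_std[AA_Gap]
    false
catch
    true
end
```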