Translation

Kmers can be translated using the translate function exported by BioSequences:

julia> translate(mer"UGCUUGAUC"r)
AminoAcid 3-mer:
CLI

Since Kmers are immutable, the in-place translate! function is not implemented for Kmers. Also, remember that Kmers are only efficient when short (at most a few hundred symbols). Hence, entire exons or genes should not be represented by a Kmer; use a LongSequence or LongSubSeq from BioSequences.jl instead.
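
As a minimal sketch of the distinction above, the following translates a short Kmer and a longer LongSequence with the same translate function. The sequence gene and the variable names are made up for illustration, and the example assumes both BioSequences and Kmers are installed.

```julia
using BioSequences, Kmers

# Short sequence: a Kmer, translated to an amino acid Kmer.
codons = mer"UGCUUGAUC"r
protein = translate(codons)

# Longer sequence (hypothetical example data): use a LongSequence instead.
gene = rna"AUGGCUUGCUUGAUCUAA"
long_protein = translate(gene)   # an amino acid LongSequence

println(protein)
println(long_protein)
```

Both calls go through the same generic function; only the container type of the result differs.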

Reverse translation

Kmers.jl implements reverse translation, which maps an amino acid sequence to one or more RNA sequences. While this process doesn't occur naturally (as far as we know), it is still useful for some analyses.

Since genetic codes are degenerate, i.e. multiple codons code for the same amino acid, reverse translating a sequence does not return a nucleic acid sequence, but a vector of CodonSet:

Kmers.reverse_translate - Function
reverse_translate(s::Union{AminoAcid, AASeq}, code=rev_standard_genetic_code)

Reverse-translates sequence or amino acid s under code::ReverseGeneticCode. If s is an AminoAcid, return a CodonSet. If s is an AASeq, return Vector{CodonSet}.

Examples

julia> reverse_translate(AA_W)
CodonSet with 1 element:
  UGG

julia> v = reverse_translate(aa"MMLVQ");

julia> typeof(v)
Vector{CodonSet} (alias for Array{CodonSet, 1})

julia> v[4]
CodonSet with 4 elements:
  GUA
  GUC
  GUG
  GUU

See also: reverse_translate!, ReverseGeneticCode

Kmers.CodonSet - Type
CodonSet <: AbstractSet{RNACodon}

A small, immutable set of RNACodon.

Create an empty set using CodonSet(), or from an iterable of RNACodon using CodonSet(itr). Because CodonSet is immutable, use push instead of push!, and use the non-mutating set operations union, setdiff, etc.

Examples

julia> v = (mer"UAG"r, mer"GGA"r, mer"UUU"r);

julia> Set(CodonSet(v)) == Set(v)
true

julia> union(CodonSet(v), CodonSet([mer"GAG"r]))
CodonSet with 4 elements:
  GAG
  GGA
  UAG
  UUU

CodonSet is an efficiently implemented AbstractSet{RNACodon} (and remember, RNACodon is an alias for RNAKmer{3, 1}).
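
Because CodonSet is an AbstractSet, the generic set functions from Base should apply to it. The sketch below queries the codons for leucine and adds one codon with the non-mutating push named in the CodonSet docstring. The call is written as Kmers.push on the assumption that the function is reachable from the Kmers module; the variable names are illustrative only.

```julia
using BioSequences, Kmers

# All codons that encode leucine under the default (standard) code.
leucine = reverse_translate(AA_L)

println(length(leucine))         # number of codons in the set
println(mer"CUG"r in leucine)    # membership test

# CodonSet is immutable: push returns a new set and leaves `leucine` unchanged.
extended = Kmers.push(leucine, mer"AUG"r)
println(length(extended))
```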

To avoid allocating a new Vector, you can use reverse_translate!:

Kmers.reverse_translate! - Function
reverse_translate!(v::Vector{CodonSet}, s::AASeq, code=rev_standard_genetic_code) -> v

Reverse-translates s under the reverse genetic code code, putting the result in v.

See also: reverse_translate

Examples

julia> v = CodonSet[];

julia> reverse_translate!(v, aa"KWCL")
4-element Vector{CodonSet}:
 CodonSet(0x0000000000000005)
 CodonSet(0x0400000000000000)
 CodonSet(0x0a00000000000000)
 CodonSet(0x50000000f0000000)
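
One way to use reverse_translate! is to allocate a single buffer and reuse it across many sequences. The sketch below does this for two peptides and counts how many RNA sequences could encode each one. The peptide list and the variable names are made up for illustration. The sketch relies on reverse_translate! resizing the vector, as the example above suggests by filling an initially empty vector.

```julia
using BioSequences, Kmers

buffer = CodonSet[]                      # allocated once, reused below
peptides = [aa"KWCL", aa"MMLVQ"]         # hypothetical input data

for peptide in peptides
    # Overwrites the previous contents of `buffer` with the new result.
    reverse_translate!(buffer, peptide)

    # The number of encoding RNA sequences is the product of the set sizes.
    n_variants = prod(length, buffer)
    println(peptide, ": ", n_variants, " possible RNA sequences")
end
```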

Both functions take a genetic code of type ReverseGeneticCode as an optional argument. This object determines the mapping from amino acid to CodonSet. By default, the standard genetic code is used, which is the code used by nearly all organisms.

Only the reverse standard genetic code is defined in Kmers.jl. To use another genetic code, build a ReverseGeneticCode object from an existing BioSequences.GeneticCode:

julia> code = BioSequences.pterobrachia_mitochondrial_genetic_code;

julia> rv_code = ReverseGeneticCode(code);

julia> seq = aa"KWLP";

julia> codonsets = reverse_translate(seq, rv_code)
4-element Vector{CodonSet}:
 CodonSet(0x0000000000000405)
 CodonSet(0x0500000000000000)
 CodonSet(0x50000000f0000000)
 CodonSet(0x0000000000f00000)

julia> codonsets == reverse_translate(seq) # default standard code
false

Kmers.ReverseGeneticCode - Type
ReverseGeneticCode <: AbstractDict{AminoAcid, CodonSet}

A mapping from an amino acid aa to the CodonSet of all codons that translate to aa. Conceptually, the inverse of a BioSequences.GeneticCode. Used by reverse_translate.

AA_Gap cannot be translated. Ambiguous amino acids translate to the union of what their constituent amino acids translate to. Pyrrolysine and selenocysteine translate to CodonSets containing UAG and UGA, respectively, even though most forward genetic codes do not translate any codon to them. For these reasons, the mapping through ReverseGeneticCode is not the exact inverse of the mapping through GeneticCode.

Examples

julia> code = ReverseGeneticCode(BioSequences.candidate_division_sr1_genetic_code);

julia> code[AA_E]
CodonSet with 2 elements:
  GAA
  GAG

julia> code[AA_Gap]
ERROR: Cannot reverse translate element: -
[...]

See also: reverse_translate
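
The special cases described above can be checked with a short sketch. It builds a reverse code from BioSequences.standard_genetic_code and compares the ambiguous amino acid AA_B (aspartate or asparagine) against the union of its constituents. The variable names are illustrative, and the example assumes lookup of ambiguous amino acids behaves as described in this docstring.

```julia
using BioSequences, Kmers

rev = ReverseGeneticCode(BioSequences.standard_genetic_code)

# AA_B is the ambiguity code for aspartate (D) or asparagine (N).
b_codons = rev[AA_B]
dn_codons = union(rev[AA_D], rev[AA_N])
println(b_codons == dn_codons)

# Selenocysteine (U) maps to UGA, a stop codon in the forward standard code.
println(mer"UGA"r in rev[AA_U])
```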
