Indexing & modifying kmers
Indexing
As concrete BioSequence subtypes, kmers can be indexed using integers:
julia> seq = Kmer(DNA_T, DNA_T, DNA_A, DNA_G, DNA_C)
DNA 5-mer:
TTAGC
julia> seq[3]
DNA_A
Currently, indexing Kmers with arbitrary ranges is not implemented, because it cannot be done in a type-stable way.
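To make the type-stability point concrete, here is a minimal, package-free sketch. It assumes only that a kmer's length is part of its type, as the K in Kmer{A,K,N} suggests. MiniKmer, symbol_at and slice are hypothetical names invented for this illustration and are not part of Kmers.jl.

```julia
# Hypothetical stand-in for a kmer whose length K is a type parameter.
# This is NOT the Kmers.jl implementation, only an illustration.
struct MiniKmer{K}
    symbols::NTuple{K,Char}
end

# Integer indexing: the result is always a Char, whatever the value of i.
symbol_at(x::MiniKmer, i::Integer) = x.symbols[i]

# Range indexing: the result would be a MiniKmer{length(r)}, but length(r)
# is only known at run time, so the compiler cannot infer a concrete type.
slice(x::MiniKmer, r::UnitRange{Int}) = MiniKmer(x.symbols[r])

x = MiniKmer(('T', 'T', 'A', 'G', 'C'))

# Ask the compiler what it can infer from the argument types alone.
int_type = only(Base.return_types(symbol_at, (MiniKmer{5}, Int)))
range_type = only(Base.return_types(slice, (MiniKmer{5}, UnitRange{Int})))

println(int_type)                    # Char
println(isconcretetype(range_type))  # false
```

In this sketch the integer-indexed result has a concrete inferred type, while the range-indexed result does not. Individual positions can still be gathered one at a time with integer indexing.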
Modifying sequences
Many modifying operations that are possible for some BioSequences, such as LongSequence, are not possible for Kmers. This is primarily because Kmers are immutable structs.
However, some non-mutating transformations are available:
BioSymbols.complement — Method
complement(seq::T) where {T<:Kmer}
Return a kmer's complement kmer.
Examples
julia> complement(Kmer(DNA_T, DNA_T, DNA_A, DNA_G, DNA_C))
DNA 5-mer:
AATCG
Base.reverse — Method
reverse(seq::BioSequence)
Create a reversed copy of a biological sequence.
reverse(seq::Kmer{A,K,N}) where {A,K,N}
Return a kmer that is the reverse of the input kmer.
Examples
julia> reverse(Kmer(DNA_T, DNA_T, DNA_A, DNA_G, DNA_C))
DNA 5-mer:
CGATT
BioSequences.reverse_complement — Method
reverse_complement(seq::Kmer)
Return the kmer that is the reverse complement of the input kmer.
Examples
julia> reverse_complement(Kmer(DNA_T, DNA_T, DNA_A, DNA_G, DNA_C))
DNA 5-mer:
GCTAA
BioSequences.canonical — Function
canonical(seq::NucleotideSeq)
Create the canonical sequence of seq.
BioSequences.canonical(seq::Kmer{A,K,N}) where {A,K,N}
Return the canonical sequence of seq.
A canonical sequence is the numerically lesser of a kmer and its reverse complement. This is useful when hashing or counting sequences in data that is not strand-specific, where observing a short sequence is equivalent to observing its reverse complement.
Examples
julia> canonical(Kmer(DNA_T, DNA_T, DNA_A, DNA_G, DNA_C))
DNA 5-mer:
GCTAA
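The strand-agnostic counting use case described above can be sketched without any packages, using plain strings. The helper names revcomp, canonical_str and count_canonical are hypothetical. String ordering (A < C < G < T) stands in here for the numerical comparison, which is an assumption of this sketch and not a statement about how Kmers.jl compares kmers internally.

```julia
# Package-free sketch on plain strings; helper names are hypothetical and
# this is not how Kmers.jl stores or compares kmers.
const COMPLEMENT = Dict('A' => 'T', 'C' => 'G', 'G' => 'C', 'T' => 'A')

# Reverse the string, then complement every base.
revcomp(s::AbstractString) = join(COMPLEMENT[c] for c in reverse(s))

# Pick the lesser of a sequence and its reverse complement, using string
# order as a stand-in for the numerical comparison.
canonical_str(s::AbstractString) = min(s, revcomp(s))

# Count every k-length window of `read` under its canonical form.
function count_canonical(read::AbstractString, k::Int)
    counts = Dict{String,Int}()
    for i in 1:(length(read) - k + 1)
        key = canonical_str(read[i:i+k-1])
        counts[key] = get(counts, key, 0) + 1
    end
    return counts
end

read = "TTAGC"
forward = count_canonical(read, 3)            # counts from one strand
backward = count_canonical(revcomp(read), 3)  # counts from the other strand

println(canonical_str(read))  # GCTAA
println(forward == backward)  # true
```

Because every window is keyed by its canonical form, a read and its reverse complement produce the same counts in this sketch.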