Indexing & modifying kmers

Indexing

As concrete subtypes of BioSequence, kmers can be indexed with integers:

julia> seq = Kmer(DNA_T, DNA_T, DNA_A, DNA_G, DNA_C)
DNA 5-mer:
TTAGC

julia> seq[3]
DNA_A

Indexing kmers with arbitrary ranges is currently not implemented. The length of a kmer is part of its type, so a slice whose length is only known at run time cannot be returned in a type-stable way.
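To see why range indexing is hard to make type-stable, consider a toy model in plain Julia. This is only a sketch: `ToyMer` and `slice` are hypothetical names invented for illustration and are not part of Kmers.jl or BioSequences.jl. It assumes, as with `Kmer{A,K,N}`, that the length K is a type parameter.

```julia
# Toy stand-in for a kmer (hypothetical, not the Kmers.jl implementation).
# The length K lives in the type, so ToyMer{3} and ToyMer{5} are different types.
struct ToyMer{K}
    data::NTuple{K,Char}
end

# Convenience constructor from a string (assumes one Char per base).
ToyMer(s::AbstractString) = ToyMer{length(s)}(Tuple(s))

# Range slice: the result length is length(r), a run-time value, so the
# compiler cannot infer one concrete return type from the argument types.
slice(m::ToyMer, r::UnitRange{Int}) = ToyMer{length(r)}(m.data[r])

# Val-based slice: start S and length L are type-level constants, so the
# return type ToyMer{L} is known at compile time.
slice(m::ToyMer, ::Val{S}, ::Val{L}) where {S,L} =
    ToyMer{L}(ntuple(i -> m.data[S + i - 1], Val(L)))

m = ToyMer("TTAGC")
a = slice(m, 2:4)             # works, but the return type depends on the value of the range
b = slice(m, Val(2), Val(3))  # same bases, with a statically known type
```

Both calls produce the same value here. The difference is only in what the compiler can infer, which can be inspected with `@code_warntype` in the REPL.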

Modifying sequences

Many modifying operations that are possible for some BioSequences, such as LongSequence, are not possible for kmers, primarily because Kmer is an immutable struct.
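As a rough illustration of what immutability implies, the sketch below uses a toy type in plain Julia. `ToyMer` and `with_base` are hypothetical names, not part of Kmers.jl. An "update" is expressed by building a new value rather than changing the existing one.

```julia
# Toy immutable kmer (hypothetical, not the Kmers.jl implementation).
struct ToyMer{K}
    data::NTuple{K,Char}
end

# Non-mutating "update": return a new ToyMer with position i replaced by c.
# Base.setindex on a tuple returns a modified copy and leaves the input untouched.
with_base(m::ToyMer{K}, i::Integer, c::Char) where {K} =
    ToyMer{K}(Base.setindex(m.data, c, i))

m  = ToyMer{5}(('T', 'T', 'A', 'G', 'C'))
m2 = with_base(m, 3, 'C')   # m is unchanged; m2 is a separate value
```

Assigning to a field of `m` directly would throw an error, because the struct is immutable.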

However, some non-mutating transformations are available:

BioSymbols.complement - Method
complement(seq::T) where {T<:Kmer}

Return a kmer's complement kmer.

Examples

julia> complement(Kmer(DNA_T, DNA_T, DNA_A, DNA_G, DNA_C))
DNA 5-mer:
AATCG

Base.reverse - Method
reverse(seq::BioSequence)

Create reversed copy of a biological sequence.

reverse(seq::Kmer{A,K,N}) where {A,K,N}

Return a kmer that is the reverse of the input kmer.

Examples

julia> reverse(Kmer(DNA_T, DNA_T, DNA_A, DNA_G, DNA_C))
DNA 5-mer:
CGATT

BioSequences.reverse_complement - Method
reverse_complement(seq::Kmer)

Return the kmer that is the reverse complement of the input kmer.

Examples

julia> reverse_complement(Kmer(DNA_T, DNA_T, DNA_A, DNA_G, DNA_C))
DNA 5-mer:
GCTAA

BioSequences.canonical - Function
canonical(seq::NucleotideSeq)

Create the canonical sequence of seq.

BioSequences.canonical(seq::Kmer{A,K,N}) where {A,K,N}

Return the canonical sequence of seq.

The canonical sequence is the numerically lesser of a kmer and its reverse complement. This is useful when hashing or counting sequences in data that is not strand-specific, where observing a short sequence is equivalent to observing its reverse complement.

Examples

julia> canonical(Kmer(DNA_T, DNA_T, DNA_A, DNA_G, DNA_C))
DNA 5-mer:
GCTAA
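The idea behind canonical kmers can be sketched with plain strings, without any package. The helpers `comp`, `revcomp`, `canon`, and `count_canonical` below are hypothetical names for illustration, not Kmers.jl or BioSequences.jl functions. The sketch assumes unambiguous DNA (A, C, G, T only) and assumes that alphabetical string order agrees with the numeric order used for kmers.

```julia
# Complement table for unambiguous DNA (assumption: input contains only A, C, G, T).
const COMP = Dict('A' => 'T', 'C' => 'G', 'G' => 'C', 'T' => 'A')

comp(s::AbstractString)    = map(c -> COMP[c], s)   # base-wise complement
revcomp(s::AbstractString) = reverse(comp(s))       # reverse complement
canon(s::AbstractString)   = min(s, revcomp(s))     # lesser of the two strands

# Count kmers of length k across reads, merging each kmer with its reverse complement.
function count_canonical(reads, k::Integer)
    counts = Dict{String,Int}()
    for read in reads, i in 1:(length(read) - k + 1)
        key = canon(read[i:i+k-1])
        counts[key] = get(counts, key, 0) + 1
    end
    return counts
end

counts = count_canonical(["TTAGC", "GCTAA"], 5)
```

Here the two reads are reverse complements of each other, so both are counted under the single key `GCTAA`, matching the `canonical` example above.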