API: The MerTools submodule
This is a reference for an internal sub-module's API, intended for developers and experienced users. Before using it, check whether the higher-level WorkSpace API already covers what you need.
Types
GenomeGraphs.MerTools.MerCount (Type)
A simple mer count struct.
MerCount is a simple struct that binds a mer value to the number of times it has been observed. This type, (sorted) vectors of it, and some additional utility methods form the basic building blocks of the higher-level mer counting functionality of the MerTools sub-module.
The count is stored as a UInt8 because, once a mer has been observed more than 255 times, its exact count rarely matters.
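To make the struct concrete, here is a minimal sketch of a mer/count pair. `SketchMerCount` and `sat_add` are hypothetical stand-ins, not the MerTools API. In particular, the saturating behaviour (clamping at 255 rather than wrapping around) is an assumption; the docs only say the count is a UInt8.

```julia
# Hypothetical stand-in for MerCount: a mer paired with a UInt8 observation count.
struct SketchMerCount{M}
    mer::M
    count::UInt8
end

# Order counts by their mer, so vectors of them can be sorted and merged.
Base.isless(a::SketchMerCount, b::SketchMerCount) = isless(a.mer, b.mer)

# Assumed saturating increment: clamp at 255 instead of wrapping around.
sat_add(a::UInt8, b::UInt8) = a > typemax(UInt8) - b ? typemax(UInt8) : a + b

x = SketchMerCount(0x1b, 0xfa)                     # mer 0x1b observed 250 times
y = SketchMerCount(x.mer, sat_add(x.count, 0x0a))  # 10 more: 255, not 4
v = sort([SketchMerCount(3, 0x01), SketchMerCount(1, 0x02)])  # ordered by mer
```

Ordering by mer alone is what makes sorted vectors of counts mergeable in linear time.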
GenomeGraphs.MerTools.MerCountHist (Type)
A type for storing a frequency histogram of MerCounts, sometimes also referred to as a kmer spectrum.
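A kmer spectrum answers "how many distinct mers were seen exactly i times?". The sketch below builds one from a vector of UInt8 counts. `sketch_spectrum` is hypothetical, and the layout (one `Int` bin per non-zero UInt8 value) is an assumption, not MerCountHist's actual representation.

```julia
# Hypothetical kmer-spectrum sketch: bin i counts how many distinct mers
# were observed exactly i times (i in 1:255, the non-zero UInt8 range).
function sketch_spectrum(counts::AbstractVector{UInt8})
    hist = zeros(Int, 255)
    for c in counts
        c == 0x00 && continue   # a zero count has no bin
        hist[c] += 1
    end
    return hist
end

counts = UInt8[1, 1, 2, 5, 1, 255]   # counts of six distinct mers
h = sketch_spectrum(counts)          # h[1] == 3, h[2] == 1, h[5] == 1, h[255] == 1
```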
GenomeGraphs.MerTools.DNAMerCount (Type)
Shorthand for MerCount{DNAMer{K}}.
GenomeGraphs.MerTools.RNAMerCount (Type)
Shorthand for MerCount{RNAMer{K}}.
Public / Safe methods
GenomeGraphs.MerTools.mer (Function)
Get the mer from a MerCount.
GenomeGraphs.MerTools.freq (Function)
Get the count from a MerCount.
Get the count from a MerCount, and convert it to type R.
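The two accessors can be sketched as plain field getters. These are hypothetical mirrors of `mer` and `freq` defined on a stand-in struct. The argument order of the converting method (target type first) is an assumption, since the docs do not show its signature.

```julia
struct SketchMerCount{M}
    mer::M
    count::UInt8
end

# Hypothetical accessors mirroring the documented mer and freq methods.
mer(x::SketchMerCount) = x.mer
freq(x::SketchMerCount) = x.count
# Assumed form of the converting method: the target type R comes first.
freq(::Type{R}, x::SketchMerCount) where {R<:Real} = convert(R, x.count)

x = SketchMerCount("ACGT", 0x07)
mer(x)            # "ACGT"
freq(x)           # 0x07 (UInt8)
freq(Int, x)      # 7
```

Converting up front is handy when counts feed arithmetic that could overflow a UInt8, such as summing counts across many mers.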
GenomeGraphs.MerTools.collapse_into_counts (Function)
collapse_into_counts(mers::Vector{M}) where {M<:AbstractMer}
Build a sorted vector of MerCounts from a Vector of a mer type.
This is a basic kernel function used by higher-level, more complex kmer counting procedures.
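The kernel amounts to "sort, then run-length encode". The sketch below shows that idea on a hypothetical stand-in type with plain String mers. Sorting a copy (leaving the input untouched) and saturating at 255 are assumptions about the non-mutating variant.

```julia
struct SketchMerCount{M}
    mer::M
    count::UInt8
end

sat_add(a::UInt8, b::UInt8) = a > typemax(UInt8) - b ? typemax(UInt8) : a + b

# Sort a copy of the mers, then run-length encode equal neighbours into counts.
function sketch_collapse_into_counts(mers::Vector{M}) where {M}
    sorted = sort(mers)
    result = SketchMerCount{M}[]
    i = 1
    while i <= length(sorted)
        m = sorted[i]
        n = 0x00
        while i <= length(sorted) && sorted[i] == m
            n = sat_add(n, 0x01)   # assumed: counts saturate at 255
            i += 1
        end
        push!(result, SketchMerCount{M}(m, n))
    end
    return result
end

input = ["GAT", "ACG", "GAT", "ACG", "ACG", "TTT"]
counts = sketch_collapse_into_counts(input)   # ACG => 3, GAT => 2, TTT => 1
```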
GenomeGraphs.MerTools.collapse_into_counts! (Function)
collapse_into_counts!(result::Vector{MerCount{M}}, mers::Vector{M}) where {M<:AbstractMer}
Build a sorted vector of MerCounts from a Vector of a mer type.
This is a basic kernel function used by higher-level, more complex kmer counting procedures.
This is like collapse_into_counts, except its first argument is a result vector that is cleared and filled with the result.
The input vector mers will be sorted by this method.
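The in-place variant lets a caller reuse one output buffer across many calls. Here is a minimal sketch on a hypothetical stand-in type that follows the documented contract: `result` is cleared and refilled, and `mers` is sorted in place.

```julia
struct SketchMerCount{M}
    mer::M
    count::UInt8
end

# In-place sketch: sorts `mers` (mutating it), empties `result`, then refills it.
function sketch_collapse_into_counts!(result::Vector{SketchMerCount{M}},
                                      mers::Vector{M}) where {M}
    sort!(mers)
    empty!(result)
    for m in mers
        if !isempty(result) && result[end].mer == m
            c = result[end].count
            result[end] = SketchMerCount{M}(m, c == 0xff ? c : c + 0x01)  # assumed saturation
        else
            push!(result, SketchMerCount{M}(m, 0x01))
        end
    end
    return result
end

buf = SketchMerCount{Int}[]              # reusable output buffer
mers = [7, 3, 7, 1]
sketch_collapse_into_counts!(buf, mers)  # mers is now [1, 3, 7, 7]
```

Because `empty!` keeps the vector's capacity, repeated calls on similarly sized inputs avoid reallocating `result`.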
GenomeGraphs.MerTools.merge_into! (Function)
merge_into!(a::Vector{MerCount{M}}, b::Vector{MerCount{M}}) where {M<:AbstractMer}
Merge the MerCounts from vector b into vector a.
This will sort the input vectors a and b.
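Once both vectors are sorted by mer, merging is a single linear pass that sums the counts of mers present in both. The sketch below uses hypothetical stand-in names and assumes each input already holds unique mers (as a collapse step would produce) and that counts saturate at 255.

```julia
struct SketchMerCount{M}
    mer::M
    count::UInt8
end
Base.isless(a::SketchMerCount, b::SketchMerCount) = isless(a.mer, b.mer)
sat_add(a::UInt8, b::UInt8) = a > typemax(UInt8) - b ? typemax(UInt8) : a + b

# Sort both inputs, do a two-pointer merge that sums the counts of equal mers,
# then overwrite `a` with the merged result. Assumes each input holds unique mers.
function sketch_merge_into!(a::Vector{SketchMerCount{M}},
                            b::Vector{SketchMerCount{M}}) where {M}
    sort!(a)
    sort!(b)
    out = SketchMerCount{M}[]
    sizehint!(out, length(a) + length(b))
    i = j = 1
    while i <= length(a) && j <= length(b)
        x, y = a[i], b[j]
        if x.mer == y.mer
            push!(out, SketchMerCount{M}(x.mer, sat_add(x.count, y.count)))
            i += 1; j += 1
        elseif isless(x, y)
            push!(out, x); i += 1
        else
            push!(out, y); j += 1
        end
    end
    append!(out, view(a, i:length(a)))   # leftovers from a
    append!(out, view(b, j:length(b)))   # leftovers from b
    return append!(empty!(a), out)
end

a = [SketchMerCount(1, 0x02), SketchMerCount(4, 0x01)]
b = [SketchMerCount(4, 0x03), SketchMerCount(2, 0x01)]
sketch_merge_into!(a, b)   # a now holds mers 1, 2, 4 with counts 2, 1, 4
```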
GenomeGraphs.MerTools.build_freq_list (Function)
build_freq_list(::Type{M}, sbuf::SequenceBuffer{PairedReads}, range::UnitRange{Int}) where {M<:AbstractMer}
Build a sorted list (vector) of kmer counts (MerFreq), serially and in memory.
This function is a serial, in-memory MerFreq list builder that can build a kmer count from a PairedReads datastore on its own (if you have the memory and time), but it is also intended to be composed into other multi-process or multi-threaded kmer counting strategies.
This method roughly estimates how many kmers the reads specified by range will generate, and pre-allocates an array to hold them. It then collects the kmers, sorts them, and collapses them into a list of counts sorted by kmer.
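The one-shot strategy (size the buffer, fill it, sort, collapse) can be sketched without the real datastore. Everything here is hypothetical: reads are plain ASCII `String`s standing in for `SequenceBuffer{PairedReads}`, and kmers are substrings rather than packed mer types. The size estimate in this toy version is exact (each read of length L yields L - k + 1 kmers).

```julia
struct SketchMerCount{M}
    mer::M
    count::UInt8
end

# Hypothetical one-shot builder over plain String reads, standing in for
# SequenceBuffer{PairedReads}; kmers are substrings rather than packed mers.
function sketch_build_freq_list(reads::Vector{String}, k::Int, range::UnitRange{Int})
    selected = view(reads, range)
    # Each read of length L contributes L - k + 1 kmers (none if L < k).
    n = sum(r -> max(length(r) - k + 1, 0), selected; init = 0)
    mers = Vector{String}(undef, n)          # pre-allocated kmer buffer
    idx = 0
    for r in selected, s in 1:(length(r) - k + 1)
        idx += 1
        mers[idx] = r[s:s+k-1]               # byte indexing: assumes ASCII reads
    end
    sort!(mers)
    # Collapse runs of equal kmers into counts, saturating at 255.
    out = SketchMerCount{String}[]
    for m in mers
        if !isempty(out) && out[end].mer == m
            c = out[end].count
            out[end] = SketchMerCount(m, c == 0xff ? c : c + 0x01)
        else
            push!(out, SketchMerCount(m, 0x01))
        end
    end
    return out
end

freqs = sketch_build_freq_list(["ACGTA", "CGTAC", "GGGGG"], 3, 1:2)  # third read excluded
```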
build_freq_list(::Type{M}, sbuf::SequenceBuffer{PairedReads}, range::UnitRange{Int}, chunk_size::Int) where {M<:AbstractMer}
Build a sorted list (vector) of kmer counts (MerFreq), serially and in memory.
This function is a serial, in-memory MerFreq list builder that can build a kmer count from a PairedReads datastore on its own (if you have the memory and time), but it is also intended to be composed into other multi-process or multi-threaded kmer counting strategies.
This method pre-allocates space for chunk_size kmers, then iterates over the kmers in the reads specified by range until the buffer is full. The buffered mers are then collapsed into a list of counts sorted by kmer, and that list is merged into the output list. This repeats chunk by chunk, building up the output list.
This method is useful when you don't want to (or can't) allocate a buffer large enough to hold all the kmers in the dataset at once.
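The chunked strategy trades some extra merging work for bounded buffer memory. In this hypothetical sketch (String reads instead of a `SequenceBuffer`, stand-in names throughout), at most `chunk_size` kmers are held before each collapse-and-merge. The result does not depend on the chunk size.

```julia
struct SketchMerCount{M}
    mer::M
    count::UInt8
end
Base.isless(a::SketchMerCount, b::SketchMerCount) = isless(a.mer, b.mer)
sat_add(a::UInt8, b::UInt8) = a > typemax(UInt8) - b ? typemax(UInt8) : a + b

# Collapse a buffer of kmers into sorted counts (sorts the buffer in place).
function collapse!(mers::Vector{String})
    sort!(mers)
    out = SketchMerCount{String}[]
    for m in mers
        if !isempty(out) && out[end].mer == m
            out[end] = SketchMerCount(m, sat_add(out[end].count, 0x01))
        else
            push!(out, SketchMerCount(m, 0x01))
        end
    end
    return out
end

# Merge two sorted count lists into a new one, summing counts of equal mers.
function merge_sorted(a::Vector{T}, b::Vector{T}) where {T<:SketchMerCount}
    out = similar(a, 0)
    i = j = 1
    while i <= length(a) && j <= length(b)
        if a[i].mer == b[j].mer
            push!(out, SketchMerCount(a[i].mer, sat_add(a[i].count, b[j].count)))
            i += 1; j += 1
        elseif isless(a[i], b[j])
            push!(out, a[i]); i += 1
        else
            push!(out, b[j]); j += 1
        end
    end
    append!(out, a[i:end]); append!(out, b[j:end])
    return out
end

# Hypothetical chunked builder: at most `chunk_size` kmers are buffered at once.
function sketch_build_freq_list(reads::Vector{String}, k::Int,
                                range::UnitRange{Int}, chunk_size::Int)
    buf = String[]
    sizehint!(buf, chunk_size)
    result = SketchMerCount{String}[]
    for r in view(reads, range), s in 1:(length(r) - k + 1)
        push!(buf, r[s:s+k-1])                 # assumes ASCII reads
        if length(buf) == chunk_size           # buffer full: flush it
            result = merge_sorted(result, collapse!(buf))
            empty!(buf)
        end
    end
    isempty(buf) || (result = merge_sorted(result, collapse!(buf)))
    return result
end

freqs = sketch_build_freq_list(["ACGTA", "CGTAC"], 3, 1:2, 2)
```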
Internal / Unsafe methods
GenomeGraphs.MerTools.unsafe_collapse_into_counts! (Function)
unsafe_collapse_into_counts!(result::Vector{MerCount{M}}, mers::Vector{M}) where {M<:AbstractMer}
This method is marked as unsafe because it assumes that the mers input vector is already sorted.
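To see why sortedness matters, the hypothetical sketch below run-length encodes its input exactly as given. On sorted input it counts correctly. On unsorted input it silently emits the same mer more than once, which breaks the "sorted, unique" invariant that later merges rely on. When in doubt, a caller can check `issorted(mers)` first.

```julia
struct SketchMerCount{M}
    mer::M
    count::UInt8
end

# Unsafe sketch: run-length encodes `mers` as given, without sorting it first.
function sketch_unsafe_collapse_into_counts!(result::Vector{SketchMerCount{M}},
                                             mers::Vector{M}) where {M}
    empty!(result)
    for m in mers
        if !isempty(result) && result[end].mer == m
            c = result[end].count
            result[end] = SketchMerCount{M}(m, c == 0xff ? c : c + 0x01)
        else
            push!(result, SketchMerCount{M}(m, 0x01))
        end
    end
    return result
end

good = sketch_unsafe_collapse_into_counts!(SketchMerCount{Int}[], [1, 1, 2])  # 1 => 2, 2 => 1
bad  = sketch_unsafe_collapse_into_counts!(SketchMerCount{Int}[], [1, 2, 1])  # mer 1 appears twice
```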
GenomeGraphs.MerTools.unsafe_merge_into! (Function)
unsafe_merge_into!(a::Vector{MerCount{M}}, b::Vector{MerCount{M}}) where {M<:AbstractMer}
Merge the MerCounts from vector b into vector a.
This method is marked as unsafe as it assumes both of the input vectors a and b are already sorted.
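The sketch below contrasts the unsafe merge with a safe wrapper. It assumes, without confirmation from these docs, that the safe variant is essentially "sort both, then merge". All names are hypothetical stand-ins. Skipping the sort produces a duplicated, unordered result, while sorting first gives the correct merged counts.

```julia
struct SketchMerCount{M}
    mer::M
    count::UInt8
end
Base.isless(a::SketchMerCount, b::SketchMerCount) = isless(a.mer, b.mer)

# Unsafe sketch: a linear two-pointer merge that trusts both inputs are sorted.
function sketch_unsafe_merge_into!(a::Vector{SketchMerCount{M}},
                                   b::Vector{SketchMerCount{M}}) where {M}
    out = SketchMerCount{M}[]
    i = j = 1
    while i <= length(a) && j <= length(b)
        if a[i].mer == b[j].mer
            # Saturating sum of the two UInt8 counts (assumed behaviour).
            c = a[i].count > 0xff - b[j].count ? 0xff : a[i].count + b[j].count
            push!(out, SketchMerCount{M}(a[i].mer, c))
            i += 1; j += 1
        elseif isless(a[i], b[j])
            push!(out, a[i]); i += 1
        else
            push!(out, b[j]); j += 1
        end
    end
    append!(out, view(a, i:length(a)))
    append!(out, view(b, j:length(b)))
    return append!(empty!(a), out)
end

# Establishing the precondition first is enough to make the merge safe.
sketch_safe_merge_into!(a, b) = sketch_unsafe_merge_into!(sort!(a), sort!(b))

unsorted() = [SketchMerCount(5, 0x01), SketchMerCount(2, 0x01)]
wrong = sketch_unsafe_merge_into!(unsorted(), [SketchMerCount(2, 0x01)])  # mers 2, 5, 2
right = sketch_safe_merge_into!(unsorted(), [SketchMerCount(2, 0x01)])    # 2 => 2, 5 => 1
```

The unsafe form is worth having when the caller already knows both inputs are sorted, for example when both came straight out of a collapse step, since it skips two redundant sorts.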