API Reference
Operations
BioAlignments.Operation — Type.Alignment operation.
BioAlignments.OP_MATCH — Constant.'M': non-specific match
BioAlignments.OP_INSERT — Constant.'I': insertion into reference sequence
BioAlignments.OP_DELETE — Constant.'D': deletion from reference sequence
BioAlignments.OP_SKIP — Constant.'N': (typically long) deletion from the reference, e.g. due to RNA splicing
BioAlignments.OP_SOFT_CLIP — Constant.'S': sequence removed from the beginning or end of the query sequence but stored
BioAlignments.OP_HARD_CLIP — Constant.'H': sequence removed from the beginning or end of the query sequence and not stored
BioAlignments.OP_PAD — Constant.'P': not currently supported, but present for SAM/BAM compatibility
BioAlignments.OP_SEQ_MATCH — Constant.'=': match operation with matching sequence positions
BioAlignments.OP_SEQ_MISMATCH — Constant.'X': match operation with mismatching sequence positions
BioAlignments.OP_BACK — Constant.'B': not currently supported, but present for SAM/BAM compatibility
BioAlignments.OP_START — Constant.'0': indicate the start of an alignment within the reference and query sequence
BioAlignments.ismatchop — Function.ismatchop(op::Operation)Test if op is a match operation (i.e. op ∈ (OP_MATCH, OP_SEQ_MATCH, OP_SEQ_MISMATCH)).
BioAlignments.isinsertop — Function.isinsertop(op::Operation)Test if op is a insertion operation (i.e. op ∈ (OP_INSERT, OP_SOFT_CLIP, OP_HARD_CLIP)).
BioAlignments.isdeleteop — Function.isdeleteop(op::Operation)Test if op is a deletion operation (i.e. op ∈ (OP_DELETE, OP_SKIP)).
Alignments
BioAlignments.AlignmentAnchor — Type.Alignment operation with anchoring positions.
BioAlignments.Alignment — Type.Alignment of two sequences.
BioAlignments.Alignment — Method.Alignment(anchors::Vector{AlignmentAnchor}, check=true)Create an alignment object from a sequence of alignment anchors.
BioAlignments.Alignment — Method.Alignment(cigar::AbstractString, seqpos=1, refpos=1)Make an alignment object from a CIGAR string.
seqpos and refpos specify the starting positions of two sequences.
BioAlignments.seq2ref — Method.seq2ref(aln::Alignment, i::Integer)::Tuple{Int,Operation}Map a position i from sequence to reference.
BioAlignments.ref2seq — Method.ref2seq(aln::Alignment, i::Integer)::Tuple{Int,Operation}Map a position i from reference to sequence.
BioAlignments.cigar — Method.cigar(aln::Alignment)Make a CIGAR string encoding of aln.
This is not entirely lossless as it discards the alignments start positions.
Substitution matrices
Supertype of substitution matrix.
The required method:
Base.getindex(submat, x, y): substitution score/cost fromxtoy
BioAlignments.SubstitutionMatrix — Type.Substitution matrix.
Dichotomous substitution matrix.
BioAlignments.EDNAFULL — Constant.EDNAFULL (or NUC4.4) substitution matrix
BioAlignments.PAM30 — Constant.PAM30 substitution matrix
BioAlignments.PAM70 — Constant.PAM70 substitution matrix
BioAlignments.PAM250 — Constant.PAM250 substitution matrix
BioAlignments.BLOSUM45 — Constant.BLOSUM45 substitution matrix
BioAlignments.BLOSUM50 — Constant.BLOSUM50 substitution matrix
BioAlignments.BLOSUM62 — Constant.BLOSUM62 substitution matrix
BioAlignments.BLOSUM80 — Constant.BLOSUM80 substitution matrix
BioAlignments.BLOSUM90 — Constant.BLOSUM90 substitution matrix
Pairwise alignments
BioAlignments.PairwiseAlignment — Type.Pairwise alignment
Base.count — Method.count(aln::PairwiseAlignment, target::Operation)Count the number of positions where the target operation is applied.
BioAlignments.count_matches — Function.count_matches(aln)Count the number of matching positions.
BioAlignments.count_mismatches — Function.count_mismatches(aln)Count the number of mismatching positions.
BioAlignments.count_insertions — Function.count_insertions(aln)Count the number of inserting positions.
BioAlignments.count_deletions — Function.count_deletions(aln)Count the number of deleting positions.
BioAlignments.count_aligned — Function.count_aligned(aln)Count the number of aligned positions.
BioAlignments.GlobalAlignment — Type.Global-global alignment with end gap penalties.
Global-local alignment.
BioAlignments.OverlapAlignment — Type.Global-global alignment without end gap penalties.
BioAlignments.LocalAlignment — Type.Local-local alignment.
BioAlignments.EditDistance — Type.Edit distance.
BioAlignments.HammingDistance — Type.Hamming distance.
A special case of EditDistance with the costs of insertion and deletion are infinitely large.
Levenshtein distance.
A special case of EditDistance with the costs of mismatch, insertion, and deletion are 1.
BioAlignments.AbstractScoreModel — Type.Supertype of score model.
AffineGapScoreModel(submat, gap_open, gap_extend)
AffineGapScoreModel(submat, gap_open=, gap_extend=)
AffineGapScoreModel(match=, mismatch=, gap_open=, gap_extend=)Affine gap scoring model.
This creates an affine gap scroing model object for alignment from a substitution matrix (submat), a gap opening score (gap_open), and a gap extending score (gap_extend). A consecutive gap of length k has a score of gap_open + gap_extend * k. Note that both of the gap scores should be non-positive. As a shorthand of creating a dichotomous substitution matrix, you can write as, for example, AffineGapScoreModel(match=5, mismatch=-3, gap_open=-2, gap_extend=-1).
Example
using BioSequences
using BioAlignments
# create an affine gap scoring model from a predefined substitution
# matrix and gap opening/extending scores.
affinegap = AffineGapScoreModel(BLOSUM62, gap_open=-10, gap_extend=-1)
# run global alignment between two amino acid sequenecs
pairalign(GlobalAlignment(), aa"IDGAAGQQL", aa"IDGATGQL", affinegap)See also: SubstitutionMatrix, pairalign, CostModel
BioAlignments.AbstractCostModel — Type.Supertype of cost model.
BioAlignments.CostModel — Type.CostModel(submat, insertion, deletion)
CostModel(submat, insertion=, deletion=)
CostModel(match=, mismatch=, insertion=, deletion=)Cost model.
This creates a cost model object for alignment from substitution matrix (submat), an insertion cost (insertion), and a deletion cost (deletion). Note that both of the insertion and deletion costs should be non-negative. As a shorthand of creating a dichotomous substitution matrix, you can write as, for example, CostModel(match=0, mismatch=1, insertion=2, deletion=2).
Example
using BioAlignments
# create a cost model from a substitution matrix and indel costs
cost = CostModel(ones(128, 128) - eye(128), insertion=.5, deletion=.5)
# run global alignment to minimize edit distance
pairalign(EditDistance(), "intension", "execution", cost)See also: SubstitutionMatrix, pairalign, AffineGapScoreModel
Result of pairwise alignment
BioAlignments.pairalign — Function.pairalign(type, seq, ref, model, [options...])Run pairwise alignment between two sequences: seq and ref.
Available types are:
GlobalAlignment()LocalAlignment()SemiGlobalAlignment()OverlapAlignment()EditDistance()LevenshteinDistance()HammingDistance()
GlobalAlignment, LocalAlignment, SemiGlobalAlignment, and OverlapAlignment are problem that maximizes alignment score between two sequences. Therefore, model should be an instance of AbstractScoreModel (e.g. AffineGapScoreModel).
EditDistance, LevenshteinDistance, and HammingDistance are problem that minimizes alignment cost between two sequences. As for EditDistance, model should be an instance of AbstractCostModel (e.g. CostModel). LevenshteinDistance and HammingDistance have predefined a cost model, so users cannot specify a cost model for these alignment types.
When you pass the score_only=true or distance_only=true option to pairalign, the result of pairwise alignment holds alignment score/distance only. This may enable some algorithms to run faster than calculating full alignment result. Other available options are documented for each alignemnt type.
Example
using BioSequences
using BioAlignments
# create affine gap scoring model
affinegap = AffineGapScoreModel(
match=5,
mismatch=-4,
gap_open=-5,
gap_extend=-3
)
# run global alignment between two DNA sequences
pairalign(GlobalAlignment(), dna"AGGTAG", dna"ATTG", affinegap)
# run local alignment between two DNA sequences
pairalign(LocalAlignment(), dna"AGGTAG", dna"ATTG", affinegap)
# you cannot specify a cost model in LevenshteinDistance
pairalign(LevenshteinDistance(), dna"AGGTAG", dna"ATTG")See also: AffineGapScoreModel, CostModel
BioAlignments.score — Function.score(alignment_result)Return score of alignment.
BioCore.distance — Function.distance(alignment_result)Retrun distance of alignment.
BioAlignments.alignment — Function.alignment(alignment_result)Return alignment if any.
See also: hasalignment
BioAlignments.hasalignment — Function.hasalignment(alignment_result)Check if alignment is stored or not.
BioAlignments.seq2ref — Method.seq2ref(aln::PairwiseAlignment, i::Integer)::Tuple{Int,Operation}Map a position i from the first sequence to the second.
BioAlignments.ref2seq — Method.ref2seq(aln::PairwiseAlignment, i::Integer)::Tuple{Int,Operation}Map a position i from the second sequence to the first.
I/O
SAM
BioAlignments.SAM.Reader — Type.SAM.Reader(input::IO)Create a data reader of the SAM file format.
Arguments
input: data source
BioAlignments.SAM.header — Function.header(reader::Reader)::HeaderGet the header of reader.
BioAlignments.SAM.Header — Type.SAM.Header()Create an empty header.
Base.find — Method.find(header::Header, key::AbstractString)::Vector{MetaInfo}Find metainfo objects satisfying SAM.tag(metainfo) == key.
BioAlignments.SAM.Writer — Type.Writer(output::IO, header::Header=Header())Create a data writer of the SAM file format.
Arguments
output: data sinkheader=Header(): SAM header object
BioAlignments.SAM.MetaInfo — Type.MetaInfo(str::AbstractString)Create a SAM metainfo from str.
Examples
julia> SAM.MetaInfo("@CO some comment")
BioAlignments.SAM.MetaInfo:
tag: CO
value: some comment
julia> SAM.MetaInfo("@SQ SN:chr1 LN:12345")
BioAlignments.SAM.MetaInfo:
tag: SQ
value: SN=chr1 LN=12345MetaInfo(tag::AbstractString, value)Create a SAM metainfo with tag and value.
tag is a two-byte ASCII string. If tag is "CO", value must be a string; otherwise, value is an iterable object with key and value pairs.
Examples
julia> SAM.MetaInfo("CO", "some comment")
BioAlignments.SAM.MetaInfo:
tag: CO
value: some comment
julia> string(ans)
"@CO some comment"
julia> SAM.MetaInfo("SQ", ["SN" => "chr1", "LN" => 12345])
BioAlignments.SAM.MetaInfo:
tag: SQ
value: SN=chr1 LN=12345
julia> string(ans)
"@SQ SN:chr1 LN:12345"BioAlignments.SAM.iscomment — Function.iscomment(metainfo::MetaInfo)::BoolTest if metainfo is a comment (i.e. its tag is "CO").
BioAlignments.SAM.tag — Function.tag(metainfo::MetaInfo)::StringGet the tag of metainfo.
BioAlignments.SAM.value — Function.value(metainfo::MetaInfo)::StringGet the value of metainfo as a string.
BioAlignments.SAM.keyvalues — Function.keyvalues(metainfo::MetaInfo)::Vector{Pair{String,String}}Get the values of metainfo as string pairs.
BioAlignments.SAM.Record — Type.SAM.Record()Create an unfilled SAM record.
SAM.Record(data::Vector{UInt8})Create a SAM record from data. This function verifies the format and indexes fields for accessors. Note that the ownership of data is transferred to a new record object.
SAM.Record(str::AbstractString)Create a SAM record from str. This function verifies the format and indexes fields for accessors.
BioAlignments.SAM.flag — Function.flag(record::Record)::UInt16Get the bitwise flag of record.
BioAlignments.SAM.ismapped — Function.ismapped(record::Record)::BoolTest if record is mapped.
BioAlignments.SAM.isprimary — Function.isprimary(record::Record)::BoolTest if record is a primary line of the read.
This is equivalent to flag(record) & 0x900 == 0.
BioAlignments.SAM.refname — Function.refname(record::Record)::StringGet the reference sequence name of record.
BioAlignments.SAM.position — Function.position(record::Record)::IntGet the 1-based leftmost mapping position of record.
BioAlignments.SAM.rightposition — Function.rightposition(record::Record)::IntGet the 1-based rightmost mapping position of record.
BioAlignments.SAM.isnextmapped — Function.isnextmapped(record::Record)::BoolTest if the mate/next read of record is mapped.
BioAlignments.SAM.nextrefname — Function.nextrefname(record::Record)::StringGet the reference name of the mate/next read of record.
BioAlignments.SAM.nextposition — Function.nextposition(record::Record)::IntGet the position of the mate/next read of record.
BioAlignments.SAM.mappingquality — Function.mappingquality(record::Record)::UInt8Get the mapping quality of record.
BioAlignments.SAM.cigar — Function.cigar(record::Record)::StringGet the CIGAR string of record.
BioAlignments.SAM.alignment — Function.alignment(record::Record)::BioAlignments.AlignmentGet the alignment of record.
BioAlignments.SAM.alignlength — Function.alignlength(record::Record)::IntGet the alignment length of record.
BioAlignments.SAM.tempname — Function.tempname(record::Record)::StringGet the query template name of record.
BioAlignments.SAM.templength — Function.templength(record::Record)::IntGet the template length of record.
BioAlignments.SAM.sequence — Function.sequence(record::Record)::BioSequences.DNASequenceGet the segment sequence of record.
sequence(::Type{String}, record::Record)::StringGet the segment sequence of record as String.
BioAlignments.SAM.seqlength — Function.seqlength(record::Record)::IntGet the sequence length of record.
BioAlignments.SAM.quality — Function.quality(record::Record)::Vector{UInt8}Get the Phred-scaled base quality of record.
quality(::Type{String}, record::Record)::StringGet the ASCII-encoded base quality of record.
BioAlignments.SAM.auxdata — Function.auxdata(record::Record)::Dict{String,Any}Get the auxiliary data (optional fields) of record.
BioAlignments.SAM.FLAG_PAIRED — Constant.0x0001: the read is paired in sequencing, no matter whether it is mapped in a pair
BioAlignments.SAM.FLAG_PROPER_PAIR — Constant.0x0002: the read is mapped in a proper pair
BioAlignments.SAM.FLAG_UNMAP — Constant.0x0004: the read itself is unmapped; conflictive with SAM.FLAG_PROPER_PAIR
BioAlignments.SAM.FLAG_MUNMAP — Constant.0x0008: the mate is unmapped
BioAlignments.SAM.FLAG_REVERSE — Constant.0x0010: the read is mapped to the reverse strand
BioAlignments.SAM.FLAG_MREVERSE — Constant.0x0020: the mate is mapped to the reverse strand
BioAlignments.SAM.FLAG_READ1 — Constant.0x0040: this is read1
BioAlignments.SAM.FLAG_READ2 — Constant.0x0080: this is read2
BioAlignments.SAM.FLAG_SECONDARY — Constant.0x0100: not primary alignment
BioAlignments.SAM.FLAG_QCFAIL — Constant.0x0200: QC failure
BioAlignments.SAM.FLAG_DUP — Constant.0x0400: optical or PCR duplicate
BioAlignments.SAM.FLAG_SUPPLEMENTARY — Constant.0x0800: supplementary alignment
BAM
BioAlignments.BAM.Reader — Type.BAM.Reader(input::IO; index=nothing)Create a data reader of the BAM file format.
Arguments
input: data sourceindex=nothing: filepath to a random access index (currently bai is supported)
BioAlignments.BAM.header — Function.header(reader::Reader; fillSQ::Bool=false)::SAM.HeaderGet the header of reader.
If fillSQ is true, this function fills missing "SQ" metainfo in the header.
BioAlignments.BAM.Writer — Type.BAM.Writer(output::BGZFStream, header::SAM.Header)Create a data writer of the BAM file format.
Arguments
output: data sinkheader: SAM header object
BioAlignments.BAM.Record — Type.BAM.Record()Create an unfilled BAM record.
BioAlignments.BAM.flag — Function.flag(record::Record)::UInt16Get the bitwise flag of record.
BioAlignments.BAM.ismapped — Function.ismapped(record::Record)::BoolTest if record is mapped.
BioAlignments.BAM.isprimary — Function.isprimary(record::Record)::BoolTest if record is a primary line of the read.
This is equivalent to flag(record) & 0x900 == 0.
BioAlignments.BAM.ispositivestrand — Function.ispositivestrand(record::Record)::BoolTest if record is aligned to the positive strand.
This is equivalent to flag(record) & 0x10 == 0.
BioAlignments.BAM.refid — Function.refid(record::Record)::IntGet the reference sequence ID of record.
The ID is 1-based (i.e. the first sequence is 1) and is 0 for a record without a mapping position.
See also: BAM.rname
BioAlignments.BAM.refname — Function.refname(record::Record)::StringGet the reference sequence name of record.
See also: BAM.refid
BioAlignments.BAM.position — Function.position(record::Record)::IntGet the 1-based leftmost mapping position of record.
BioAlignments.BAM.rightposition — Function.rightposition(record::Record)::IntGet the 1-based rightmost mapping position of record.
BioAlignments.BAM.isnextmapped — Function.isnextmapped(record::Record)::BoolTest if the mate/next read of record is mapped.
BioAlignments.BAM.nextrefid — Function.nextrefid(record::Record)::IntGet the next/mate reference sequence ID of record.
BioAlignments.BAM.nextrefname — Function.nextrefname(record::Record)::StringGet the reference name of the mate/next read of record.
BioAlignments.BAM.nextposition — Function.nextposition(record::Record)::IntGet the 1-based leftmost mapping position of the next/mate read of record.
BioAlignments.BAM.mappingquality — Function.mappingquality(record::Record)::UInt8Get the mapping quality of record.
BioAlignments.BAM.cigar — Function.cigar(record::Record)::StringGet the CIGAR string of record.
Note that in the BAM specification, the field called cigar typically stores the cigar string of the record. However, this is not always true, sometimes the true cigar is very long, and due to some constraints of the BAM format, the actual cigar string is stored in an extra tag: CG:B,I, and the cigar field stores a pseudo-cigar string.
Calling this method with checkCG set to true (default) this method will always yield the true cigar string, because this is probably what you want the vast majority of the time.
If you have a record that stores the true cigar in a CG:B,I tag, but you still want to access the pseudo-cigar that is stored in the cigar field of the BAM record, then you can set checkCG to false.
See also BAM.cigar_rle.
BioAlignments.BAM.cigar_rle — Function.cigar_rle(record::Record, checkCG::Bool = true)::Tuple{Vector{BioAlignments.Operation},Vector{Int}}Get a run-length encoded tuple (ops, lens) of the CIGAR string in record.
Note that in the BAM specification, the field called cigar typically stores the cigar string of the record. However, this is not always true, sometimes the true cigar is very long, and due to some constraints of the BAM format, the actual cigar string is stored in an extra tag: CG:B,I, and the cigar field stores a pseudo-cigar string.
Calling this method with checkCG set to true (default) this method will always yield the true cigar string, because this is probably what you want the vast majority of the time.
If you have a record that stores the true cigar in a CG:B,I tag, but you still want to access the pseudo-cigar that is stored in the cigar field of the BAM record, then you can set checkCG to false.
See also BAM.cigar.
BioAlignments.BAM.alignment — Function.alignment(record::Record)::BioAlignments.AlignmentGet the alignment of record.
BioAlignments.BAM.alignlength — Function.alignlength(record::Record)::IntGet the alignment length of record.
BioAlignments.BAM.tempname — Function.tempname(record::Record)::StringGet the query template name of record.
BioAlignments.BAM.templength — Function.templength(record::Record)::IntGet the template length of record.
BioAlignments.BAM.sequence — Function.sequence(record::Record)::BioSequences.DNASequenceGet the segment sequence of record.
BioAlignments.BAM.seqlength — Function.seqlength(record::Record)::IntGet the sequence length of record.
BioAlignments.BAM.quality — Function.quality(record::Record)::Vector{UInt8}Get the base quality of record.
BioAlignments.BAM.auxdata — Function.auxdata(record::Record)::BAM.AuxDataGet the auxiliary data of record.
BioAlignments.BAM.BAI — Type.BAI(filename::AbstractString)Load a BAI index from filename.
BAI(input::IO)Load a BAI index from input.