API Reference

API Reference

Operations

Alignment operation.

source

'M': non-specific match

source

'I': insertion into reference sequence

source

'D': deletion from reference sequence

source
BioAlignments.OP_SKIPConstant.

'N': (typically long) deletion from the reference, e.g. due to RNA splicing

source

'S': sequence removed from the beginning or end of the query sequence but stored

source

'H': sequence removed from the beginning or end of the query sequence and not stored

source
BioAlignments.OP_PADConstant.

'P': not currently supported, but present for SAM/BAM compatibility

source

'=': match operation with matching sequence positions

source

'X': match operation with mismatching sequence positions

source
BioAlignments.OP_BACKConstant.

'B': not currently supported, but present for SAM/BAM compatibility

source

'0': indicate the start of an alignment within the reference and query sequence

source
ismatchop(op::Operation)

Test if op is a match operation (i.e. op ∈ (OP_MATCH, OP_SEQ_MATCH, OP_SEQ_MISMATCH)).

source
isinsertop(op::Operation)

Test if op is a insertion operation (i.e. op ∈ (OP_INSERT, OP_SOFT_CLIP, OP_HARD_CLIP)).

source
isdeleteop(op::Operation)

Test if op is a deletion operation (i.e. op ∈ (OP_DELETE, OP_SKIP)).

source

Alignments

Alignment operation with anchoring positions.

source

Alignment of two sequences.

source
Alignment(anchors::Vector{AlignmentAnchor}, check=true)

Create an alignment object from a sequence of alignment anchors.

source
Alignment(cigar::AbstractString, seqpos=1, refpos=1)

Make an alignment object from a CIGAR string.

seqpos and refpos specify the starting positions of two sequences.

source
seq2ref(aln::Alignment, i::Integer)::Tuple{Int,Operation}

Map a position i from sequence to reference.

source
ref2seq(aln::Alignment, i::Integer)::Tuple{Int,Operation}

Map a position i from reference to sequence.

source
cigar(aln::Alignment)

Make a CIGAR string encoding of aln.

This is not entirely lossless as it discards the alignments start positions.

source

Substitution matrices

Supertype of substitution matrix.

The required method:

  • Base.getindex(submat, x, y): substitution score/cost from x to y

source

Substitution matrix.

source

Dichotomous substitution matrix.

source

EDNAFULL (or NUC4.4) substitution matrix

source
BioAlignments.PAM30Constant.

PAM30 substitution matrix

source
BioAlignments.PAM70Constant.

PAM70 substitution matrix

source
BioAlignments.PAM250Constant.

PAM250 substitution matrix

source

BLOSUM45 substitution matrix

source

BLOSUM50 substitution matrix

source

BLOSUM62 substitution matrix

source

BLOSUM80 substitution matrix

source

BLOSUM90 substitution matrix

source

Pairwise alignments

Pairwise alignment

source
Base.countMethod.
count(aln::PairwiseAlignment, target::Operation)

Count the number of positions where the target operation is applied.

source
count_matches(aln)

Count the number of matching positions.

source
count_mismatches(aln)

Count the number of mismatching positions.

source
count_insertions(aln)

Count the number of inserting positions.

source
count_deletions(aln)

Count the number of deleting positions.

source
count_aligned(aln)

Count the number of aligned positions.

source

Global-global alignment with end gap penalties.

source

Global-local alignment.

source

Global-global alignment without end gap penalties.

source

Local-local alignment.

source

Edit distance.

source

Hamming distance.

A special case of EditDistance with the costs of insertion and deletion are infinitely large.

source

Levenshtein distance.

A special case of EditDistance with the costs of mismatch, insertion, and deletion are 1.

source

Supertype of score model.

source
AffineGapScoreModel(submat, gap_open, gap_extend)
AffineGapScoreModel(submat, gap_open=, gap_extend=)
AffineGapScoreModel(match=, mismatch=, gap_open=, gap_extend=)

Affine gap scoring model.

This creates an affine gap scroing model object for alignment from a substitution matrix (submat), a gap opening score (gap_open), and a gap extending score (gap_extend). A consecutive gap of length k has a score of gap_open + gap_extend * k. Note that both of the gap scores should be non-positive. As a shorthand of creating a dichotomous substitution matrix, you can write as, for example, AffineGapScoreModel(match=5, mismatch=-3, gap_open=-2, gap_extend=-1).

Example

using BioSequences
using BioAlignments

# create an affine gap scoring model from a predefined substitution
# matrix and gap opening/extending scores.
affinegap = AffineGapScoreModel(BLOSUM62, gap_open=-10, gap_extend=-1)

# run global alignment between two amino acid sequenecs
pairalign(GlobalAlignment(), aa"IDGAAGQQL", aa"IDGATGQL", affinegap)

See also: SubstitutionMatrix, pairalign, CostModel

source

Supertype of cost model.

source
CostModel(submat, insertion, deletion)
CostModel(submat, insertion=, deletion=)
CostModel(match=, mismatch=, insertion=, deletion=)

Cost model.

This creates a cost model object for alignment from substitution matrix (submat), an insertion cost (insertion), and a deletion cost (deletion). Note that both of the insertion and deletion costs should be non-negative. As a shorthand of creating a dichotomous substitution matrix, you can write as, for example, CostModel(match=0, mismatch=1, insertion=2, deletion=2).

Example

using BioAlignments

# create a cost model from a substitution matrix and indel costs
cost = CostModel(ones(128, 128) - eye(128), insertion=.5, deletion=.5)

# run global alignment to minimize edit distance
pairalign(EditDistance(), "intension", "execution", cost)

See also: SubstitutionMatrix, pairalign, AffineGapScoreModel

source

Result of pairwise alignment

source
pairalign(type, seq, ref, model, [options...])

Run pairwise alignment between two sequences: seq and ref.

Available types are:

  • GlobalAlignment()

  • LocalAlignment()

  • SemiGlobalAlignment()

  • OverlapAlignment()

  • EditDistance()

  • LevenshteinDistance()

  • HammingDistance()

GlobalAlignment, LocalAlignment, SemiGlobalAlignment, and OverlapAlignment are problem that maximizes alignment score between two sequences. Therefore, model should be an instance of AbstractScoreModel (e.g. AffineGapScoreModel).

EditDistance, LevenshteinDistance, and HammingDistance are problem that minimizes alignment cost between two sequences. As for EditDistance, model should be an instance of AbstractCostModel (e.g. CostModel). LevenshteinDistance and HammingDistance have predefined a cost model, so users cannot specify a cost model for these alignment types.

When you pass the score_only=true or distance_only=true option to pairalign, the result of pairwise alignment holds alignment score/distance only. This may enable some algorithms to run faster than calculating full alignment result. Other available options are documented for each alignemnt type.

Example

using BioSequences
using BioAlignments

# create affine gap scoring model
affinegap = AffineGapScoreModel(
    match=5,
    mismatch=-4,
    gap_open=-5,
    gap_extend=-3
)

# run global alignment between two DNA sequences
pairalign(GlobalAlignment(), dna"AGGTAG", dna"ATTG", affinegap)

# run local alignment between two DNA sequences
pairalign(LocalAlignment(), dna"AGGTAG", dna"ATTG", affinegap)

# you cannot specify a cost model in LevenshteinDistance
pairalign(LevenshteinDistance(), dna"AGGTAG", dna"ATTG")

See also: AffineGapScoreModel, CostModel

source
BioAlignments.scoreFunction.
score(alignment_result)

Return score of alignment.

source
BioCore.distanceFunction.
distance(alignment_result)

Retrun distance of alignment.

source
alignment(alignment_result)

Return alignment if any.

See also: hasalignment

source
hasalignment(alignment_result)

Check if alignment is stored or not.

source
seq2ref(aln::PairwiseAlignment, i::Integer)::Tuple{Int,Operation}

Map a position i from the first sequence to the second.

source
ref2seq(aln::PairwiseAlignment, i::Integer)::Tuple{Int,Operation}

Map a position i from the second sequence to the first.

source

I/O

SAM

SAM.Reader(input::IO)

Create a data reader of the SAM file format.

Arguments

  • input: data source

source
header(reader::Reader)::Header

Get the header of reader.

source
SAM.Header()

Create an empty header.

source
Base.findMethod.
find(header::Header, key::AbstractString)::Vector{MetaInfo}

Find metainfo objects satisfying SAM.tag(metainfo) == key.

source
Writer(output::IO, header::Header=Header())

Create a data writer of the SAM file format.

Arguments

  • output: data sink

  • header=Header(): SAM header object

source
MetaInfo(str::AbstractString)

Create a SAM metainfo from str.

Examples

julia> SAM.MetaInfo("@CO	some comment")
BioAlignments.SAM.MetaInfo:
    tag: CO
  value: some comment

julia> SAM.MetaInfo("@SQ	SN:chr1	LN:12345")
BioAlignments.SAM.MetaInfo:
    tag: SQ
  value: SN=chr1 LN=12345
source
MetaInfo(tag::AbstractString, value)

Create a SAM metainfo with tag and value.

tag is a two-byte ASCII string. If tag is "CO", value must be a string; otherwise, value is an iterable object with key and value pairs.

Examples

julia> SAM.MetaInfo("CO", "some comment")
BioAlignments.SAM.MetaInfo:
    tag: CO
  value: some comment

julia> string(ans)
"@CO	some comment"

julia> SAM.MetaInfo("SQ", ["SN" => "chr1", "LN" => 12345])
BioAlignments.SAM.MetaInfo:
    tag: SQ
  value: SN=chr1 LN=12345

julia> string(ans)
"@SQ	SN:chr1	LN:12345"
source
iscomment(metainfo::MetaInfo)::Bool

Test if metainfo is a comment (i.e. its tag is "CO").

source
BioAlignments.SAM.tagFunction.
tag(metainfo::MetaInfo)::String

Get the tag of metainfo.

source
value(metainfo::MetaInfo)::String

Get the value of metainfo as a string.

source
keyvalues(metainfo::MetaInfo)::Vector{Pair{String,String}}

Get the values of metainfo as string pairs.

source
SAM.Record()

Create an unfilled SAM record.

source
SAM.Record(data::Vector{UInt8})

Create a SAM record from data. This function verifies the format and indexes fields for accessors. Note that the ownership of data is transferred to a new record object.

source
SAM.Record(str::AbstractString)

Create a SAM record from str. This function verifies the format and indexes fields for accessors.

source
flag(record::Record)::UInt16

Get the bitwise flag of record.

source
ismapped(record::Record)::Bool

Test if record is mapped.

source
isprimary(record::Record)::Bool

Test if record is a primary line of the read.

This is equivalent to flag(record) & 0x900 == 0.

source
refname(record::Record)::String

Get the reference sequence name of record.

source
position(record::Record)::Int

Get the 1-based leftmost mapping position of record.

source
rightposition(record::Record)::Int

Get the 1-based rightmost mapping position of record.

source
isnextmapped(record::Record)::Bool

Test if the mate/next read of record is mapped.

source
nextrefname(record::Record)::String

Get the reference name of the mate/next read of record.

source
nextposition(record::Record)::Int

Get the position of the mate/next read of record.

source
mappingquality(record::Record)::UInt8

Get the mapping quality of record.

source
cigar(record::Record)::String

Get the CIGAR string of record.

source
alignment(record::Record)::BioAlignments.Alignment

Get the alignment of record.

source
alignlength(record::Record)::Int

Get the alignment length of record.

source
tempname(record::Record)::String

Get the query template name of record.

source
templength(record::Record)::Int

Get the template length of record.

source
sequence(record::Record)::BioSequences.DNASequence

Get the segment sequence of record.

source
sequence(::Type{String}, record::Record)::String

Get the segment sequence of record as String.

source
seqlength(record::Record)::Int

Get the sequence length of record.

source
quality(record::Record)::Vector{UInt8}

Get the Phred-scaled base quality of record.

source
quality(::Type{String}, record::Record)::String

Get the ASCII-encoded base quality of record.

source
auxdata(record::Record)::Dict{String,Any}

Get the auxiliary data (optional fields) of record.

source

0x0001: the read is paired in sequencing, no matter whether it is mapped in a pair

source

0x0002: the read is mapped in a proper pair

source

0x0004: the read itself is unmapped; conflictive with SAM.FLAG_PROPER_PAIR

source

0x0008: the mate is unmapped

source

0x0010: the read is mapped to the reverse strand

source

0x0020: the mate is mapped to the reverse strand

source

0x0040: this is read1

source

0x0080: this is read2

source

0x0100: not primary alignment

source

0x0200: QC failure

source

0x0400: optical or PCR duplicate

source

0x0800: supplementary alignment

source

BAM

BAM.Reader(input::IO; index=nothing)

Create a data reader of the BAM file format.

Arguments

  • input: data source

  • index=nothing: filepath to a random access index (currently bai is supported)

source
header(reader::Reader; fillSQ::Bool=false)::SAM.Header

Get the header of reader.

If fillSQ is true, this function fills missing "SQ" metainfo in the header.

source
BAM.Writer(output::BGZFStream, header::SAM.Header)

Create a data writer of the BAM file format.

Arguments

  • output: data sink

  • header: SAM header object

source
BAM.Record()

Create an unfilled BAM record.

source
flag(record::Record)::UInt16

Get the bitwise flag of record.

source
ismapped(record::Record)::Bool

Test if record is mapped.

source
isprimary(record::Record)::Bool

Test if record is a primary line of the read.

This is equivalent to flag(record) & 0x900 == 0.

source
ispositivestrand(record::Record)::Bool

Test if record is aligned to the positive strand.

This is equivalent to flag(record) & 0x10 == 0.

source
refid(record::Record)::Int

Get the reference sequence ID of record.

The ID is 1-based (i.e. the first sequence is 1) and is 0 for a record without a mapping position.

See also: BAM.rname

source
refname(record::Record)::String

Get the reference sequence name of record.

See also: BAM.refid

source
position(record::Record)::Int

Get the 1-based leftmost mapping position of record.

source
rightposition(record::Record)::Int

Get the 1-based rightmost mapping position of record.

source
isnextmapped(record::Record)::Bool

Test if the mate/next read of record is mapped.

source
nextrefid(record::Record)::Int

Get the next/mate reference sequence ID of record.

source
nextrefname(record::Record)::String

Get the reference name of the mate/next read of record.

source
nextposition(record::Record)::Int

Get the 1-based leftmost mapping position of the next/mate read of record.

source
mappingquality(record::Record)::UInt8

Get the mapping quality of record.

source
cigar(record::Record)::String

Get the CIGAR string of record.

Note that in the BAM specification, the field called cigar typically stores the cigar string of the record. However, this is not always true, sometimes the true cigar is very long, and due to some constraints of the BAM format, the actual cigar string is stored in an extra tag: CG:B,I, and the cigar field stores a pseudo-cigar string.

Calling this method with checkCG set to true (default) this method will always yield the true cigar string, because this is probably what you want the vast majority of the time.

If you have a record that stores the true cigar in a CG:B,I tag, but you still want to access the pseudo-cigar that is stored in the cigar field of the BAM record, then you can set checkCG to false.

See also BAM.cigar_rle.

source
cigar_rle(record::Record, checkCG::Bool = true)::Tuple{Vector{BioAlignments.Operation},Vector{Int}}

Get a run-length encoded tuple (ops, lens) of the CIGAR string in record.

Note that in the BAM specification, the field called cigar typically stores the cigar string of the record. However, this is not always true, sometimes the true cigar is very long, and due to some constraints of the BAM format, the actual cigar string is stored in an extra tag: CG:B,I, and the cigar field stores a pseudo-cigar string.

Calling this method with checkCG set to true (default) this method will always yield the true cigar string, because this is probably what you want the vast majority of the time.

If you have a record that stores the true cigar in a CG:B,I tag, but you still want to access the pseudo-cigar that is stored in the cigar field of the BAM record, then you can set checkCG to false.

See also BAM.cigar.

source
alignment(record::Record)::BioAlignments.Alignment

Get the alignment of record.

source
alignlength(record::Record)::Int

Get the alignment length of record.

source
tempname(record::Record)::String

Get the query template name of record.

source
templength(record::Record)::Int

Get the template length of record.

source
sequence(record::Record)::BioSequences.DNASequence

Get the segment sequence of record.

source
seqlength(record::Record)::Int

Get the sequence length of record.

source
quality(record::Record)::Vector{UInt8}

Get the base quality of record.

source
auxdata(record::Record)::BAM.AuxData

Get the auxiliary data of record.

source
BAI(filename::AbstractString)

Load a BAI index from filename.

source
BAI(input::IO)

Load a BAI index from input.

source