API Reference

Operations

BioAlignments.Operation - Type.

Alignment operation.

BioAlignments.OP_MATCH - Constant.

'M': non-specific match

BioAlignments.OP_INSERT - Constant.

'I': insertion into reference sequence

BioAlignments.OP_DELETE - Constant.

'D': deletion from reference sequence

BioAlignments.OP_SKIP - Constant.

'N': (typically long) deletion from the reference, e.g. due to RNA splicing

BioAlignments.OP_SOFT_CLIP - Constant.

'S': sequence removed from the beginning or end of the query sequence but stored

BioAlignments.OP_HARD_CLIP - Constant.

'H': sequence removed from the beginning or end of the query sequence and not stored

BioAlignments.OP_PAD - Constant.

'P': not currently supported, but present for SAM/BAM compatibility

BioAlignments.OP_SEQ_MATCH - Constant.

'=': match operation with matching sequence positions

BioAlignments.OP_SEQ_MISMATCH - Constant.

'X': match operation with mismatching sequence positions

BioAlignments.OP_BACK - Constant.

'B': not currently supported, but present for SAM/BAM compatibility

BioAlignments.OP_START - Constant.

'0': indicates the start of an alignment within the reference and query sequences

ismatchop(op::Operation)

Test if op is a match operation (i.e. op ∈ (OP_MATCH, OP_SEQ_MATCH, OP_SEQ_MISMATCH)).

isinsertop(op::Operation)

Test if op is an insertion operation (i.e. op ∈ (OP_INSERT, OP_SOFT_CLIP, OP_HARD_CLIP)).

isdeleteop(op::Operation)

Test if op is a deletion operation (i.e. op ∈ (OP_DELETE, OP_SKIP)).
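
The three predicates above sort the CIGAR operations into match, insertion, and deletion classes. The following is a minimal sketch of that classification using plain characters instead of the package's Operation values. The names `is_match_char`, `is_insert_char`, and `is_delete_char` are hypothetical helpers for illustration and are not part of BioAlignments.

```julia
# Hypothetical stand-ins for ismatchop/isinsertop/isdeleteop that operate on
# CIGAR characters rather than BioAlignments' Operation values.
is_match_char(c::Char)  = c in ('M', '=', 'X')   # OP_MATCH, OP_SEQ_MATCH, OP_SEQ_MISMATCH
is_insert_char(c::Char) = c in ('I', 'S', 'H')   # OP_INSERT, OP_SOFT_CLIP, OP_HARD_CLIP
is_delete_char(c::Char) = c in ('D', 'N')        # OP_DELETE, OP_SKIP

ops = collect("MIDNSH=X")
for c in ops
    println(c, ": match=", is_match_char(c),
               " insert=", is_insert_char(c),
               " delete=", is_delete_char(c))
end
```

Each of the eight characters listed falls into exactly one class, which mirrors the membership sets given in the docstrings.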

Alignments

BioAlignments.AlignmentAnchor - Type.

Alignment operation with anchoring positions.

BioAlignments.Alignment - Type.

Alignment of two sequences.

Alignment(anchors::Vector{AlignmentAnchor}, check=true)

Create an alignment object from a sequence of alignment anchors.

Alignment(cigar::AbstractString, seqpos=1, refpos=1)

Make an alignment object from a CIGAR string.

seqpos and refpos specify the starting positions of the two sequences.
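
To illustrate how a CIGAR string and the two starting positions determine the extent of an alignment, here is a small sketch in plain Julia. It does not use the package: `parse_cigar` and `last_positions` are hypothetical helpers, and the sketch assumes that seqpos and refpos are the 1-based positions of the first aligned residues. Which operations consume the sequence or the reference follows the SAM specification.

```julia
# Hypothetical helper: split a CIGAR string such as "4M2I3M" into (length, op) runs.
function parse_cigar(cigar::AbstractString)
    runs = Tuple{Int,Char}[]
    for m in eachmatch(r"(\d+)([MIDNSHP=X])", cigar)
        push!(runs, (parse(Int, m.captures[1]), first(m.captures[2])))
    end
    return runs
end

# Last sequence and reference positions covered by the alignment.
# M/I/S/=/X consume the sequence; M/D/N/=/X consume the reference.
function last_positions(cigar::AbstractString; seqpos::Int=1, refpos::Int=1)
    s, r = seqpos - 1, refpos - 1
    for (len, op) in parse_cigar(cigar)
        op in ('M', 'I', 'S', '=', 'X') && (s += len)
        op in ('M', 'D', 'N', '=', 'X') && (r += len)
    end
    return (s, r)
end

println(parse_cigar("4M2I3M1D2M"))
println(last_positions("4M2I3M1D2M"))
println(last_positions("4M2I3M1D2M"; seqpos=3, refpos=101))
```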

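seq2ref(aln::Alignment, i::Integer)::Tuple{Int,Operation}

Map position i in the sequence to the reference. The result is the reference position together with the alignment operation at that position.

ref2seq(aln::Alignment, i::Integer)::Tuple{Int,Operation}

Map position i in the reference to the sequence. The result is the sequence position together with the alignment operation at that position.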
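
The position mapping can be sketched over a list of (length, operation) runs. This is an illustration only, not the package's implementation: `seq_to_ref` is a hypothetical name, and the choice to report the last reference position before an insertion is an assumption of this sketch.

```julia
# Hypothetical sketch of a sequence-to-reference mapping over CIGAR-like runs.
# Returns (reference position, operation character) for sequence position i.
function seq_to_ref(runs, i::Integer; seqpos::Int=1, refpos::Int=1)
    i >= seqpos || throw(ArgumentError("position $i lies before the alignment"))
    s, r = seqpos - 1, refpos - 1          # last positions consumed so far
    for (len, op) in runs
        if op in ('M', '=', 'X')
            i <= s + len && return (r + (i - s), op)
            s += len
            r += len
        elseif op in ('I', 'S')
            # No reference residue here; assumed convention: report the last
            # reference position before the inserted run.
            i <= s + len && return (r, op)
            s += len
        elseif op in ('D', 'N')
            r += len                        # reference-only run
        end
    end
    throw(ArgumentError("position $i lies beyond the alignment"))
end

runs = [(4, 'M'), (2, 'I'), (3, 'M'), (1, 'D'), (2, 'M')]
for i in (3, 5, 7, 10)
    println(i, " => ", seq_to_ref(runs, i))
end
```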

cigar(aln::Alignment)

Make a CIGAR string encoding of aln.

This is not entirely lossless, as it discards the alignment's start positions.

Substitution matrices

BioAlignments.AbstractSubstitutionMatrix - Type.

Supertype of substitution matrix.

The required method:

  • Base.getindex(submat, x, y): substitution score/cost from x to y
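
As an illustration of the required method, the sketch below defines a stand-alone type that supports `submat[x, y]` lookups. `ToyDichotomousMatrix` is a hypothetical type invented for this example. It does not subtype the package's abstract type, so the snippet runs without BioAlignments installed.

```julia
# Hypothetical two-valued substitution matrix: one score for identical
# symbols, another for differing symbols.
struct ToyDichotomousMatrix
    match::Int
    mismatch::Int
end

# The single required method: the score for substituting x with y.
Base.getindex(m::ToyDichotomousMatrix, x, y) = x == y ? m.match : m.mismatch

submat = ToyDichotomousMatrix(5, -4)
println(submat['A', 'A'])
println(submat['A', 'C'])
```

Because indexing syntax lowers to `getindex`, defining that one method is enough for the bracket lookups above.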

BioAlignments.SubstitutionMatrix - Type.

Substitution matrix.

BioAlignments.DichotomousSubstitutionMatrix - Type.

Dichotomous substitution matrix.

BioAlignments.EDNAFULL - Constant.

EDNAFULL (or NUC4.4) substitution matrix

BioAlignments.PAM30 - Constant.

PAM30 substitution matrix

BioAlignments.PAM70 - Constant.

PAM70 substitution matrix

BioAlignments.PAM250 - Constant.

PAM250 substitution matrix

BioAlignments.BLOSUM45 - Constant.

BLOSUM45 substitution matrix

BioAlignments.BLOSUM50 - Constant.

BLOSUM50 substitution matrix

BioAlignments.BLOSUM62 - Constant.

BLOSUM62 substitution matrix

BioAlignments.BLOSUM80 - Constant.

BLOSUM80 substitution matrix

BioAlignments.BLOSUM90 - Constant.

BLOSUM90 substitution matrix

Pairwise alignments

BioAlignments.PairwiseAlignment - Type.

Pairwise alignment

Base.count - Method.
count(aln::PairwiseAlignment, target::Operation)

Count the number of positions where the target operation is applied.

count_matches(aln)

Count the number of matching positions.

count_mismatches(aln)

Count the number of mismatching positions.

count_insertions(aln)

Count the number of inserted positions.

count_deletions(aln)

Count the number of deleted positions.

count_aligned(aln)

Count the number of aligned positions.
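
The counting functions above can be pictured as tallies over alignment columns. The sketch below does this for two gapped strings of equal length, where '-' marks a gap. `count_columns` is a hypothetical helper, and the gapped pair is written by hand for illustration; it is not output from pairalign.

```julia
# Hypothetical helper: tally column types of a gapped pairwise alignment.
# A gap in the reference row is an insertion; a gap in the sequence row is
# a deletion.
function count_columns(seq::AbstractString, ref::AbstractString)
    length(seq) == length(ref) ||
        throw(ArgumentError("gapped rows must have the same length"))
    counts = Dict(:match => 0, :mismatch => 0, :insertion => 0, :deletion => 0)
    for (x, y) in zip(seq, ref)
        if x == '-'
            counts[:deletion] += 1
        elseif y == '-'
            counts[:insertion] += 1
        elseif x == y
            counts[:match] += 1
        else
            counts[:mismatch] += 1
        end
    end
    return counts
end

# Hand-made gapped rows, for illustration only.
counts = count_columns("IDGAAGQQL", "IDGATGQ-L")
println(counts)
```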

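BioAlignments.GlobalAlignment - Type.

Global-global alignment with end gap penalties.

BioAlignments.SemiGlobalAlignment - Type.

Global-local alignment.

BioAlignments.OverlapAlignment - Type.

Global-global alignment without end gap penalties.

BioAlignments.LocalAlignment - Type.

Local-local alignment.

BioAlignments.EditDistance - Type.

Edit distance.

BioAlignments.HammingDistance - Type.

Hamming distance.

A special case of EditDistance in which the costs of insertion and deletion are infinitely large.

BioAlignments.LevenshteinDistance - Type.

Levenshtein distance.

A special case of EditDistance in which the costs of mismatch, insertion, and deletion are all 1.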
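
The relationship between the three distances can be sketched with a textbook dynamic-programming edit distance. This is a minimal illustration, not the package's algorithm: `edit_distance`, `levenshtein`, and `hamming` are hypothetical names, and costs are held as Float64 so that an infinite indel cost can be expressed with `Inf`.

```julia
# Hypothetical sketch: edit distance with configurable costs.
function edit_distance(seq, ref; mismatch=1.0, insertion=1.0, deletion=1.0)
    s, t = collect(seq), collect(ref)
    m, n = length(s), length(t)
    D = zeros(Float64, m + 1, n + 1)
    for i in 1:m
        D[i + 1, 1] = i * insertion      # sequence residues with no partner
    end
    for j in 1:n
        D[1, j + 1] = j * deletion       # reference residues with no partner
    end
    for j in 1:n, i in 1:m
        sub = s[i] == t[j] ? 0.0 : mismatch
        D[i + 1, j + 1] = min(D[i, j] + sub,
                              D[i, j + 1] + insertion,
                              D[i + 1, j] + deletion)
    end
    return D[m + 1, n + 1]
end

# Levenshtein: every edit costs 1. Hamming: indels are infinitely expensive.
levenshtein(seq, ref) = edit_distance(seq, ref)
hamming(seq, ref) = edit_distance(seq, ref; insertion=Inf, deletion=Inf)

println(levenshtein("kitten", "sitting"))
println(hamming("karolin", "kathrin"))
println(hamming("abc", "ab"))
```

With infinite indel costs, inputs of unequal length have no finite-cost alignment, so the sketch returns `Inf` for them.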

BioAlignments.AbstractScoreModel - Type.

Supertype of score model.

AffineGapScoreModel(submat, gap_open, gap_extend)
AffineGapScoreModel(submat, gap_open=, gap_extend=)
AffineGapScoreModel(match=, mismatch=, gap_open=, gap_extend=)

Affine gap scoring model.

This creates an affine gap scoring model object for alignment from a substitution matrix (submat), a gap opening score (gap_open), and a gap extending score (gap_extend). A consecutive gap of length k has a score of gap_open + gap_extend * k. Note that both gap scores should be non-positive. As a shorthand for creating a dichotomous substitution matrix, you can write, for example, AffineGapScoreModel(match=5, mismatch=-3, gap_open=-2, gap_extend=-1).

Example

using BioSequences
using BioAlignments

# create an affine gap scoring model from a predefined substitution
# matrix and gap opening/extending scores.
affinegap = AffineGapScoreModel(BLOSUM62, gap_open=-10, gap_extend=-1)

# run global alignment between two amino acid sequences
pairalign(GlobalAlignment(), aa"IDGAAGQQL", aa"IDGATGQL", affinegap)

See also: SubstitutionMatrix, pairalign, CostModel
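
As a worked instance of the gap formula gap_open + gap_extend * k, the sketch below evaluates it for a few gap lengths using the gap scores from the example. `gap_score` is a hypothetical helper written only for this illustration.

```julia
# Hypothetical helper: score of one consecutive gap of length k under the
# affine model described above.
gap_score(k::Integer; gap_open::Int=-10, gap_extend::Int=-1) =
    gap_open + gap_extend * k

for k in 1:3
    println("gap of length ", k, ": ", gap_score(k))
end

# One gap of length 3 versus three separate gaps of length 1.
println(gap_score(3), " vs ", 3 * gap_score(1))
```

A single long gap scores higher than several short gaps covering the same number of positions, because the opening score is charged once per gap.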

BioAlignments.AbstractCostModel - Type.

Supertype of cost model.

CostModel(submat, insertion, deletion)
CostModel(submat, insertion=, deletion=)
CostModel(match=, mismatch=, insertion=, deletion=)

Cost model.

This creates a cost model object for alignment from a substitution matrix (submat), an insertion cost (insertion), and a deletion cost (deletion). Note that both the insertion and deletion costs should be non-negative. As a shorthand for creating a dichotomous substitution matrix, you can write, for example, CostModel(match=0, mismatch=1, insertion=2, deletion=2).

Example

using BioAlignments

# create a cost model from a substitution matrix and indel costs
cost = CostModel(ones(128, 128) - eye(128), insertion=.5, deletion=.5)

# run global alignment to minimize edit distance
pairalign(EditDistance(), "intension", "execution", cost)

See also: SubstitutionMatrix, pairalign, AffineGapScoreModel

BioAlignments.PairwiseAlignmentResult - Type.

Result of pairwise alignment

pairalign(type, seq, ref, model, [options...])

Run pairwise alignment between two sequences: seq and ref.

Available types are:

  • GlobalAlignment()
  • LocalAlignment()
  • SemiGlobalAlignment()
  • OverlapAlignment()
  • EditDistance()
  • LevenshteinDistance()
  • HammingDistance()

GlobalAlignment, LocalAlignment, SemiGlobalAlignment, and OverlapAlignment are problems that maximize the alignment score between two sequences. Therefore, model should be an instance of AbstractScoreModel (e.g. AffineGapScoreModel).

EditDistance, LevenshteinDistance, and HammingDistance are problems that minimize the alignment cost between two sequences. For EditDistance, model should be an instance of AbstractCostModel (e.g. CostModel). LevenshteinDistance and HammingDistance have a predefined cost model, so users cannot specify a cost model for these alignment types.

When you pass the score_only=true or distance_only=true option to pairalign, the result of pairwise alignment holds only the alignment score or distance. This may enable some algorithms to run faster than computing the full alignment result. Other available options are documented for each alignment type.

Example

using BioSequences
using BioAlignments

# create affine gap scoring model
affinegap = AffineGapScoreModel(
    match=5,
    mismatch=-4,
    gap_open=-5,
    gap_extend=-3
)

# run global alignment between two DNA sequences
pairalign(GlobalAlignment(), dna"AGGTAG", dna"ATTG", affinegap)

# run local alignment between two DNA sequences
pairalign(LocalAlignment(), dna"AGGTAG", dna"ATTG", affinegap)

# you cannot specify a cost model in LevenshteinDistance
pairalign(LevenshteinDistance(), dna"AGGTAG", dna"ATTG")

See also: AffineGapScoreModel, CostModel
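
To make the score-maximizing formulation concrete, the sketch below computes only the score of a global alignment under an affine gap model, using the textbook three-matrix (Gotoh-style) recurrence. It is an illustration under stated assumptions and not the package's implementation: `global_affine_score` is a hypothetical name, the scoring is a simple match/mismatch scheme, and a gap of length k is charged gap_open + gap_extend * k as described for AffineGapScoreModel.

```julia
# Hypothetical score-only global alignment with affine gaps.
function global_affine_score(a::AbstractString, b::AbstractString;
                             match::Int=5, mismatch::Int=-4,
                             gap_open::Int=-5, gap_extend::Int=-3)
    s, t = collect(a), collect(b)
    m, n = length(s), length(t)
    NEG = typemin(Int) ÷ 2                 # effectively minus infinity
    H = fill(NEG, m + 1, n + 1)            # best score for each prefix pair
    E = fill(NEG, m + 1, n + 1)            # best score ending with a gap in a
    F = fill(NEG, m + 1, n + 1)            # best score ending with a gap in b
    H[1, 1] = 0
    for i in 1:m
        F[i + 1, 1] = gap_open + gap_extend * i
        H[i + 1, 1] = F[i + 1, 1]
    end
    for j in 1:n
        E[1, j + 1] = gap_open + gap_extend * j
        H[1, j + 1] = E[1, j + 1]
    end
    for j in 1:n, i in 1:m
        # Either open a new gap (open + first extension) or extend one.
        E[i + 1, j + 1] = max(H[i + 1, j] + gap_open + gap_extend,
                              E[i + 1, j] + gap_extend)
        F[i + 1, j + 1] = max(H[i, j + 1] + gap_open + gap_extend,
                              F[i, j + 1] + gap_extend)
        sub = s[i] == t[j] ? match : mismatch
        H[i + 1, j + 1] = max(H[i, j] + sub, E[i + 1, j + 1], F[i + 1, j + 1])
    end
    return H[m + 1, n + 1]
end

println(global_affine_score("ACGT", "ACGT"))
println(global_affine_score("ACGT", "AGT"))
```

The function keeps no traceback, so it returns a score without an alignment, which is the kind of result a score-only run holds.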

BioAlignments.score - Function.
score(alignment_result)

Return score of alignment.

BioCore.distance - Function.
distance(alignment_result)

Return distance of alignment.

alignment(alignment_result)

Return alignment if any.

See also: hasalignment

hasalignment(alignment_result)

Check if alignment is stored or not.

seq2ref(aln::PairwiseAlignment, i::Integer)::Tuple{Int,Operation}

Map a position i from the first sequence to the second.

ref2seq(aln::PairwiseAlignment, i::Integer)::Tuple{Int,Operation}

Map a position i from the second sequence to the first.

I/O

SAM

SAM.Reader
SAM.header

SAM.Header
Base.find(header::SAM.Header, key::AbstractString)

SAM.Writer

SAM.MetaInfo
SAM.iscomment
SAM.tag
SAM.value
SAM.keyvalues

SAM.Record
SAM.flag
SAM.ismapped
SAM.isprimary
SAM.refname
SAM.position
SAM.rightposition
SAM.isnextmapped
SAM.nextrefname
SAM.nextposition
SAM.mappingquality
SAM.cigar
SAM.alignment
SAM.alignlength
SAM.tempname
SAM.templength
SAM.sequence
SAM.seqlength
SAM.quality
SAM.auxdata

SAM.FLAG_PAIRED
SAM.FLAG_PROPER_PAIR
SAM.FLAG_UNMAP
SAM.FLAG_MUNMAP
SAM.FLAG_REVERSE
SAM.FLAG_MREVERSE
SAM.FLAG_READ1
SAM.FLAG_READ2
SAM.FLAG_SECONDARY
SAM.FLAG_QCFAIL
SAM.FLAG_DUP
SAM.FLAG_SUPPLEMENTARY

BAM

BAM.Reader
BAM.header

BAM.Writer

BAM.Record
BAM.flag
BAM.ismapped
BAM.isprimary
BAM.ispositivestrand
BAM.refid
BAM.refname
BAM.position
BAM.rightposition
BAM.isnextmapped
BAM.nextrefid
BAM.nextrefname
BAM.nextposition
BAM.mappingquality
BAM.cigar
BAM.cigar_rle
BAM.alignment
BAM.alignlength
BAM.tempname
BAM.templength
BAM.sequence
BAM.seqlength
BAM.quality
BAM.auxdata

BAM.BAI