API Reference
Operations
BioAlignments.Operation — Type
Alignment operation.
BioAlignments.OP_MATCH — Constant
'M': non-specific match
BioAlignments.OP_INSERT — Constant
'I': insertion into reference sequence
BioAlignments.OP_DELETE — Constant
'D': deletion from reference sequence
BioAlignments.OP_SKIP — Constant
'N': (typically long) deletion from the reference, e.g. due to RNA splicing
BioAlignments.OP_SOFT_CLIP — Constant
'S': sequence removed from the beginning or end of the query sequence but stored
BioAlignments.OP_HARD_CLIP — Constant
'H': sequence removed from the beginning or end of the query sequence and not stored
BioAlignments.OP_PAD — Constant
'P': silent deletion from padded reference (not present in query or reference)
BioAlignments.OP_SEQ_MATCH — Constant
'=': match operation with matching sequence positions
BioAlignments.OP_SEQ_MISMATCH — Constant
'X': match operation with mismatching sequence positions
BioAlignments.OP_BACK — Constant
'B': not currently supported, but present for SAM/BAM compatibility
BioAlignments.OP_START — Constant
'0': indicate the start of an alignment within the reference and query sequence
BioAlignments.ismatchop — Function
ismatchop(op::Operation)Test if op is a match operation (i.e. op ∈ (OP_MATCH, OP_SEQ_MATCH, OP_SEQ_MISMATCH)).
BioAlignments.isinsertop — Function
isinsertop(op::Operation)Test if op is a insertion operation (i.e. op ∈ (OP_INSERT, OP_SOFT_CLIP)).
BioAlignments.isdeleteop — Function
isdeleteop(op::Operation)Test if op is a deletion operation (i.e. op ∈ (OP_DELETE, OP_SKIP)).
BioAlignments.ismetaop — Method
ismetaop(op::Operation)Test if op is a meta operation and does not consume reference or sequence bases (i.e. op ∈ (OP_PAD, OP_HARD_CLIP)).
Alignments
BioAlignments.AlignmentAnchor — Type
Alignment operation with anchoring positions.
BioAlignments.Alignment — Type
Defines how to align a given sequence onto a reference sequence. The alignment is represented as a sequence of elementary operations (match, insertion, deletion etc) anchored to specific positions of the input and reference sequence.
BioAlignments.Alignment — Method
Alignment(anchors::Vector{AlignmentAnchor}, check=true)Create an alignment object from a sequence of alignment anchors.
BioAlignments.Alignment — Method
Alignment(cigar::AbstractString, seqpos=1, refpos=1)Make an alignment object from a CIGAR string.
seqpos and refpos specify the starting positions of two sequences.
BioAlignments.seq2ref — Method
seq2ref(aln::Union{Alignment, AlignedSequence, PairwiseAlignment}, i::Integer)::Tuple{Int,Operation}Map a position i from sequence to reference.
BioAlignments.ref2seq — Method
ref2seq(aln::Union{Alignment, AlignedSequence, PairwiseAlignment}, i::Integer)::Tuple{Int,Operation}Map a position i from reference to sequence.
BioAlignments.ref2aln — Method
ref2aln(aln::Union{Alignment, AlignedSequence, PairwiseAlignment}, i::Integer)::Tuple{Int,Operation}Map a position i from the reference sequence to the alignment sequence.
BioAlignments.seq2aln — Method
seq2aln(aln::Union{Alignment, AlignedSequence, PairwiseAlignment}, i::Integer)::Tuple{Int,Operation}Map a position i from the input sequence to the alignment sequence.
BioAlignments.aln2ref — Method
aln2ref(aln::Union{Alignment, AlignedSequence, PairwiseAlignment}, i::Integer)::Tuple{Int,Operation}Map a position i from the alignment sequence to the reference sequence.
BioAlignments.aln2seq — Method
aln2seq(aln::Union{Alignment, AlignedSequence, PairwiseAlignment}, i::Integer)::Tuple{Int,Operation}Map a position i from the alignment sequence to the input sequence.
BioAlignments.cigar — Method
cigar(aln::Alignment)Make a CIGAR string encoding of aln.
This is not entirely lossless as it discards the alignments start positions.
Substitution matrices
BioAlignments.AbstractSubstitutionMatrix — Type
Supertype of substitution matrix.
The required method:
Base.getindex(submat, x, y): substitution score/cost fromxtoy
BioAlignments.SubstitutionMatrix — Type
Substitution matrix.
BioAlignments.DichotomousSubstitutionMatrix — Type
Dichotomous substitution matrix.
BioAlignments.EDNAFULL — Constant
EDNAFULL (or NUC4.4) substitution matrix
BioAlignments.PAM30 — Constant
PAM30 substitution matrix
BioAlignments.PAM70 — Constant
PAM70 substitution matrix
BioAlignments.PAM250 — Constant
PAM250 substitution matrix
BioAlignments.BLOSUM45 — Constant
BLOSUM45 substitution matrix
BioAlignments.BLOSUM50 — Constant
BLOSUM50 substitution matrix
BioAlignments.BLOSUM62 — Constant
BLOSUM62 substitution matrix
BioAlignments.BLOSUM80 — Constant
BLOSUM80 substitution matrix
BioAlignments.BLOSUM90 — Constant
BLOSUM90 substitution matrix
BioAlignments.GRANTHAM1974 — Constant
A substitution matrix for the basic 20 amino acids based on three chemicl properties: composition, polarity, and molecular volume. Taken from R. Grantham's 1974 paper.
Pairwise alignments
BioAlignments.PairwiseAlignment — Type
Pairwise alignment
Base.count — Method
count(aln::PairwiseAlignment, target::Operation)Count the number of positions where the target operation is applied.
BioAlignments.count_matches — Function
count_matches(aln)Count the number of matching positions.
BioAlignments.count_mismatches — Function
count_mismatches(aln)Count the number of mismatching positions.
BioAlignments.count_insertions — Function
count_insertions(aln)Count the number of inserting positions.
BioAlignments.count_deletions — Function
count_deletions(aln)Count the number of deleting positions.
BioAlignments.count_aligned — Function
count_aligned(aln)Count the number of aligned positions.
BioAlignments.GlobalAlignment — Type
Global-global alignment with end gap penalties.
BioAlignments.SemiGlobalAlignment — Type
Global-local alignment.
BioAlignments.OverlapAlignment — Type
Global-global alignment without end gap penalties.
BioAlignments.LocalAlignment — Type
Local-local alignment.
BioAlignments.EditDistance — Type
Edit distance.
BioAlignments.HammingDistance — Type
Hamming distance.
A special case of EditDistance with the costs of insertion and deletion are infinitely large.
BioAlignments.LevenshteinDistance — Type
Levenshtein distance.
A special case of EditDistance with the costs of mismatch, insertion, and deletion are 1.
BioAlignments.AbstractScoreModel — Type
Supertype of score model.
BioAlignments.AffineGapScoreModel — Type
AffineGapScoreModel(submat, gap_open, gap_extend)
AffineGapScoreModel(submat, gap_open=, gap_extend=)
AffineGapScoreModel(match=, mismatch=, gap_open=, gap_extend=)Affine gap scoring model.
This creates an affine gap scroing model object for alignment from a substitution matrix (submat), a gap opening score (gap_open), and a gap extending score (gap_extend). A consecutive gap of length k has a score of gap_open + gap_extend * k. Note that both of the gap scores should be non-positive. As a shorthand of creating a dichotomous substitution matrix, you can write as, for example, AffineGapScoreModel(match=5, mismatch=-3, gap_open=-2, gap_extend=-1).
Example
using BioSequences
using BioAlignments
# create an affine gap scoring model from a predefined substitution
# matrix and gap opening/extending scores.
affinegap = AffineGapScoreModel(BLOSUM62, gap_open=-10, gap_extend=-1)
# run global alignment between two amino acid sequenecs
pairalign(GlobalAlignment(), aa"IDGAAGQQL", aa"IDGATGQL", affinegap)See also: SubstitutionMatrix, pairalign, CostModel
BioAlignments.AbstractCostModel — Type
Supertype of cost model.
BioAlignments.CostModel — Type
CostModel(submat, insertion, deletion)
CostModel(submat, insertion=, deletion=)
CostModel(match=, mismatch=, insertion=, deletion=)Cost model.
This creates a cost model object for alignment from substitution matrix (submat), an insertion cost (insertion), and a deletion cost (deletion). Note that both of the insertion and deletion costs should be non-negative. As a shorthand of creating a dichotomous substitution matrix, you can write as, for example, CostModel(match=0, mismatch=1, insertion=2, deletion=2).
Example
using BioAlignments
# create a cost model from a substitution matrix and indel costs
cost = CostModel(ones(128, 128) - eye(128), insertion=.5, deletion=.5)
# run global alignment to minimize edit distance
pairalign(EditDistance(), "intension", "execution", cost)See also: SubstitutionMatrix, pairalign, AffineGapScoreModel
BioAlignments.PairwiseAlignmentResult — Type
Result of pairwise alignment
BioAlignments.pairalign — Function
pairalign(type, seq, ref, model, [options...])Run pairwise alignment between two sequences: seq and ref.
Available types are:
GlobalAlignment()LocalAlignment()SemiGlobalAlignment()OverlapAlignment()EditDistance()LevenshteinDistance()HammingDistance()
GlobalAlignment, LocalAlignment, SemiGlobalAlignment, and OverlapAlignment are problem that maximizes alignment score between two sequences. Therefore, model should be an instance of AbstractScoreModel (e.g. AffineGapScoreModel).
EditDistance, LevenshteinDistance, and HammingDistance are problem that minimizes alignment cost between two sequences. As for EditDistance, model should be an instance of AbstractCostModel (e.g. CostModel). LevenshteinDistance and HammingDistance have predefined a cost model, so users cannot specify a cost model for these alignment types.
When you pass the score_only=true or distance_only=true option to pairalign, the result of pairwise alignment holds alignment score/distance only. This may enable some algorithms to run faster than calculating full alignment result. Other available options are documented for each alignemnt type.
Example
using BioSequences
using BioAlignments
# create affine gap scoring model
affinegap = AffineGapScoreModel(
match=5,
mismatch=-4,
gap_open=-5,
gap_extend=-3
)
# run global alignment between two DNA sequences
pairalign(GlobalAlignment(), dna"AGGTAG", dna"ATTG", affinegap)
# run local alignment between two DNA sequences
pairalign(LocalAlignment(), dna"AGGTAG", dna"ATTG", affinegap)
# you cannot specify a cost model in LevenshteinDistance
pairalign(LevenshteinDistance(), dna"AGGTAG", dna"ATTG")See also: AffineGapScoreModel, CostModel
BioAlignments.score — Function
score(alignment_result)Return score of alignment.
BioGenerics.distance — Function
distance(alignment_result)Retrun distance of alignment.
BioAlignments.alignment — Function
alignment(aligned_sequence)Gets the Alignment of aligned_sequence.
alignment(pairwise_alignment)Gets the underlying Alignment from pairwise_alignment.
alignment(alignment_result)Return the alignment if any as a PairwiseAlignment. To get the Alignment, nest the function, e.g. alignment(alignment(alignment_result)). This function returns a PairwiseAlignment instead of an Alignment for backwards-compatibility reasons.
See also: hasalignment
BioAlignments.hasalignment — Function
hasalignment(alignment_result)Check if alignment is stored or not.
BioAlignments.sequence — Method
sequence(alignment_result)Return the query sequence of alignment_result, if it exists.
BioAlignments.sequence — Method
sequence(pairwise_alignment)Gets the query sequence of pairwise_alignment.
BioAlignments.sequence — Method
sequence(aligned_sequence)Return the sequence of aligned_sequence.