API Reference
Operations
BioAlignments.Operation
— TypeAlignment operation.
BioAlignments.OP_MATCH
— Constant'M'
: non-specific match
BioAlignments.OP_INSERT
— Constant'I'
: insertion into reference sequence
BioAlignments.OP_DELETE
— Constant'D'
: deletion from reference sequence
BioAlignments.OP_SKIP
— Constant'N'
: (typically long) deletion from the reference, e.g. due to RNA splicing
BioAlignments.OP_SOFT_CLIP
— Constant'S'
: sequence removed from the beginning or end of the query sequence but stored
BioAlignments.OP_HARD_CLIP
— Constant'H'
: sequence removed from the beginning or end of the query sequence and not stored
BioAlignments.OP_PAD
— Constant'P'
: silent deletion from padded reference (not present in query or reference)
BioAlignments.OP_SEQ_MATCH
— Constant'='
: match operation with matching sequence positions
BioAlignments.OP_SEQ_MISMATCH
— Constant'X'
: match operation with mismatching sequence positions
BioAlignments.OP_BACK
— Constant'B'
: not currently supported, but present for SAM/BAM compatibility
BioAlignments.OP_START
— Constant'0'
: indicate the start of an alignment within the reference and query sequence
BioAlignments.ismatchop
— Functionismatchop(op::Operation)
Test if op
is a match operation (i.e. op ∈ (OP_MATCH, OP_SEQ_MATCH, OP_SEQ_MISMATCH)
).
BioAlignments.isinsertop
— Functionisinsertop(op::Operation)
Test if op
is a insertion operation (i.e. op ∈ (OP_INSERT, OP_SOFT_CLIP)
).
BioAlignments.isdeleteop
— Functionisdeleteop(op::Operation)
Test if op
is a deletion operation (i.e. op ∈ (OP_DELETE, OP_SKIP)
).
Alignments
BioAlignments.AlignmentAnchor
— TypeAlignment operation with anchoring positions.
BioAlignments.Alignment
— TypeDefines how to align a given sequence onto a reference sequence. The alignment is represented as a sequence of elementary operations (match, insertion, deletion etc) anchored to specific positions of the input and reference sequence.
BioAlignments.Alignment
— MethodAlignment(anchors::Vector{AlignmentAnchor}, check=true)
Create an alignment object from a sequence of alignment anchors.
BioAlignments.Alignment
— MethodAlignment(cigar::AbstractString, seqpos=1, refpos=1)
Make an alignment object from a CIGAR string.
seqpos
and refpos
specify the starting positions of two sequences.
BioAlignments.seq2ref
— Methodseq2ref(aln::Union{Alignment, AlignedSequence, PairwiseAlignment}, i::Integer)::Tuple{Int,Operation}
Map a position i
from sequence to reference.
BioAlignments.ref2seq
— Methodref2seq(aln::Union{Alignment, AlignedSequence, PairwiseAlignment}, i::Integer)::Tuple{Int,Operation}
Map a position i
from reference to sequence.
BioAlignments.cigar
— Methodcigar(aln::Alignment)
Make a CIGAR string encoding of aln
.
This is not entirely lossless as it discards the alignments start positions.
Substitution matrices
BioAlignments.AbstractSubstitutionMatrix
— TypeSupertype of substitution matrix.
The required method:
Base.getindex(submat, x, y)
: substitution score/cost fromx
toy
BioAlignments.SubstitutionMatrix
— TypeSubstitution matrix.
BioAlignments.DichotomousSubstitutionMatrix
— TypeDichotomous substitution matrix.
BioAlignments.EDNAFULL
— ConstantEDNAFULL (or NUC4.4) substitution matrix
BioAlignments.PAM30
— ConstantPAM30 substitution matrix
BioAlignments.PAM70
— ConstantPAM70 substitution matrix
BioAlignments.PAM250
— ConstantPAM250 substitution matrix
BioAlignments.BLOSUM45
— ConstantBLOSUM45 substitution matrix
BioAlignments.BLOSUM50
— ConstantBLOSUM50 substitution matrix
BioAlignments.BLOSUM62
— ConstantBLOSUM62 substitution matrix
BioAlignments.BLOSUM80
— ConstantBLOSUM80 substitution matrix
BioAlignments.BLOSUM90
— ConstantBLOSUM90 substitution matrix
BioAlignments.GRANTHAM1974
— ConstantA substitution matrix for the basic 20 amino acids based on three chemicl properties: composition, polarity, and molecular volume. Taken from R. Grantham's 1974 paper.
Pairwise alignments
BioAlignments.PairwiseAlignment
— TypePairwise alignment
Base.count
— Methodcount(aln::PairwiseAlignment, target::Operation)
Count the number of positions where the target
operation is applied.
BioAlignments.count_matches
— Functioncount_matches(aln)
Count the number of matching positions.
BioAlignments.count_mismatches
— Functioncount_mismatches(aln)
Count the number of mismatching positions.
BioAlignments.count_insertions
— Functioncount_insertions(aln)
Count the number of inserting positions.
BioAlignments.count_deletions
— Functioncount_deletions(aln)
Count the number of deleting positions.
BioAlignments.count_aligned
— Functioncount_aligned(aln)
Count the number of aligned positions.
BioAlignments.GlobalAlignment
— TypeGlobal-global alignment with end gap penalties.
BioAlignments.SemiGlobalAlignment
— TypeGlobal-local alignment.
BioAlignments.OverlapAlignment
— TypeGlobal-global alignment without end gap penalties.
BioAlignments.LocalAlignment
— TypeLocal-local alignment.
BioAlignments.EditDistance
— TypeEdit distance.
BioAlignments.HammingDistance
— TypeHamming distance.
A special case of EditDistance
with the costs of insertion and deletion are infinitely large.
BioAlignments.LevenshteinDistance
— TypeLevenshtein distance.
A special case of EditDistance
with the costs of mismatch, insertion, and deletion are 1.
BioAlignments.AbstractScoreModel
— TypeSupertype of score model.
BioAlignments.AffineGapScoreModel
— TypeAffineGapScoreModel(submat, gap_open, gap_extend)
AffineGapScoreModel(submat, gap_open=, gap_extend=)
AffineGapScoreModel(match=, mismatch=, gap_open=, gap_extend=)
Affine gap scoring model.
This creates an affine gap scroing model object for alignment from a substitution matrix (submat
), a gap opening score (gap_open
), and a gap extending score (gap_extend
). A consecutive gap of length k
has a score of gap_open + gap_extend * k
. Note that both of the gap scores should be non-positive. As a shorthand of creating a dichotomous substitution matrix, you can write as, for example, AffineGapScoreModel(match=5, mismatch=-3, gap_open=-2, gap_extend=-1)
.
Example
using BioSequences
using BioAlignments
# create an affine gap scoring model from a predefined substitution
# matrix and gap opening/extending scores.
affinegap = AffineGapScoreModel(BLOSUM62, gap_open=-10, gap_extend=-1)
# run global alignment between two amino acid sequenecs
pairalign(GlobalAlignment(), aa"IDGAAGQQL", aa"IDGATGQL", affinegap)
See also: SubstitutionMatrix
, pairalign
, CostModel
BioAlignments.AbstractCostModel
— TypeSupertype of cost model.
BioAlignments.CostModel
— TypeCostModel(submat, insertion, deletion)
CostModel(submat, insertion=, deletion=)
CostModel(match=, mismatch=, insertion=, deletion=)
Cost model.
This creates a cost model object for alignment from substitution matrix (submat
), an insertion cost (insertion
), and a deletion cost (deletion
). Note that both of the insertion and deletion costs should be non-negative. As a shorthand of creating a dichotomous substitution matrix, you can write as, for example, CostModel(match=0, mismatch=1, insertion=2, deletion=2)
.
Example
using BioAlignments
# create a cost model from a substitution matrix and indel costs
cost = CostModel(ones(128, 128) - eye(128), insertion=.5, deletion=.5)
# run global alignment to minimize edit distance
pairalign(EditDistance(), "intension", "execution", cost)
See also: SubstitutionMatrix
, pairalign
, AffineGapScoreModel
BioAlignments.PairwiseAlignmentResult
— TypeResult of pairwise alignment
BioAlignments.pairalign
— Functionpairalign(type, seq, ref, model, [options...])
Run pairwise alignment between two sequences: seq
and ref
.
Available type
s are:
GlobalAlignment()
LocalAlignment()
SemiGlobalAlignment()
OverlapAlignment()
EditDistance()
LevenshteinDistance()
HammingDistance()
GlobalAlignment
, LocalAlignment
, SemiGlobalAlignment
, and OverlapAlignment
are problem that maximizes alignment score between two sequences. Therefore, model
should be an instance of AbstractScoreModel
(e.g. AffineGapScoreModel
).
EditDistance
, LevenshteinDistance
, and HammingDistance
are problem that minimizes alignment cost between two sequences. As for EditDistance
, model
should be an instance of AbstractCostModel
(e.g. CostModel
). LevenshteinDistance
and HammingDistance
have predefined a cost model, so users cannot specify a cost model for these alignment types.
When you pass the score_only=true
or distance_only=true
option to pairalign
, the result of pairwise alignment holds alignment score/distance only. This may enable some algorithms to run faster than calculating full alignment result. Other available options
are documented for each alignemnt type.
Example
using BioSequences
using BioAlignments
# create affine gap scoring model
affinegap = AffineGapScoreModel(
match=5,
mismatch=-4,
gap_open=-5,
gap_extend=-3
)
# run global alignment between two DNA sequences
pairalign(GlobalAlignment(), dna"AGGTAG", dna"ATTG", affinegap)
# run local alignment between two DNA sequences
pairalign(LocalAlignment(), dna"AGGTAG", dna"ATTG", affinegap)
# you cannot specify a cost model in LevenshteinDistance
pairalign(LevenshteinDistance(), dna"AGGTAG", dna"ATTG")
See also: AffineGapScoreModel
, CostModel
BioAlignments.score
— Functionscore(alignment_result)
Return score of alignment.
BioGenerics.distance
— Functiondistance(alignment_result)
Retrun distance of alignment.
BioAlignments.alignment
— Functionalignment(aligned_sequence)
Gets the Alignment
of aligned_sequence
.
alignment(pairwise_alignment)
Gets the underlying Alignment
from pairwise_alignment
.
alignment(alignment_result)
Return the alignment if any as a PairwiseAlignment
. To get the Alignment
, nest the function, e.g. alignment(alignment(alignment_result))
. This function returns a PairwiseAlignment
instead of an Alignment
for backwards-compatibility reasons.
See also: hasalignment
BioAlignments.hasalignment
— Functionhasalignment(alignment_result)
Check if alignment is stored or not.
Missing docstring for seq2ref(::PairwiseAlignment, ::Integer)
. Check Documenter's build log for details.
Missing docstring for ref2seq(::PairwiseAlignment, ::Integer)
. Check Documenter's build log for details.