API Reference
Operations
BioAlignments.Operation
— Type. Alignment operation.
BioAlignments.OP_MATCH
— Constant. 'M': non-specific match
BioAlignments.OP_INSERT
— Constant. 'I': insertion into reference sequence
BioAlignments.OP_DELETE
— Constant. 'D': deletion from reference sequence
BioAlignments.OP_SKIP
— Constant. 'N': (typically long) deletion from the reference, e.g. due to RNA splicing
BioAlignments.OP_SOFT_CLIP
— Constant. 'S': sequence removed from the beginning or end of the query sequence but stored
BioAlignments.OP_HARD_CLIP
— Constant. 'H': sequence removed from the beginning or end of the query sequence and not stored
BioAlignments.OP_PAD
— Constant. 'P': not currently supported, but present for SAM/BAM compatibility
BioAlignments.OP_SEQ_MATCH
— Constant. '=': match operation with matching sequence positions
BioAlignments.OP_SEQ_MISMATCH
— Constant. 'X': match operation with mismatching sequence positions
BioAlignments.OP_BACK
— Constant. 'B': not currently supported, but present for SAM/BAM compatibility
BioAlignments.OP_START
— Constant. '0': indicates the start of an alignment within the reference and query sequences
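Each constant corresponds to a single CIGAR character. A minimal sketch of moving between the two representations, assuming `convert(Operation, ::Char)` and `convert(Char, ::Operation)` are available in your BioAlignments version:

```julia
using BioAlignments

# CIGAR characters -> Operation values
ops = [convert(Operation, c) for c in "MIDNSHP=X"]

# Operation values -> CIGAR characters (round trip)
chars = String([convert(Char, op) for op in ops])
println(chars)   # MIDNSHP=X
```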
BioAlignments.ismatchop
— Function. ismatchop(op::Operation)
Test if op is a match operation (i.e. op ∈ (OP_MATCH, OP_SEQ_MATCH, OP_SEQ_MISMATCH)).
BioAlignments.isinsertop
— Function. isinsertop(op::Operation)
Test if op is an insertion operation (i.e. op ∈ (OP_INSERT, OP_SOFT_CLIP, OP_HARD_CLIP)).
BioAlignments.isdeleteop
— Function. isdeleteop(op::Operation)
Test if op is a deletion operation (i.e. op ∈ (OP_DELETE, OP_SKIP)).
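Going by the lists above, the three predicates split the operations into match, insertion, and deletion groups, and OP_PAD, OP_BACK, and OP_START satisfy none of them. A sketch with a hypothetical helper `classify` (not part of BioAlignments):

```julia
using BioAlignments

# Hypothetical helper: map an operation to the predicate group it belongs to.
function classify(op::Operation)
    ismatchop(op)  && return :match
    isinsertop(op) && return :insert
    isdeleteop(op) && return :delete
    return :other   # OP_PAD, OP_BACK, OP_START
end

for op in (OP_MATCH, OP_SEQ_MISMATCH, OP_SOFT_CLIP, OP_SKIP, OP_PAD)
    println(convert(Char, op), " => ", classify(op))
end
```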
Alignments
BioAlignments.AlignmentAnchor
— Type. Alignment operation with anchoring positions.
BioAlignments.Alignment
— Type. Alignment of two sequences.
BioAlignments.Alignment
— Method. Alignment(anchors::Vector{AlignmentAnchor}, check=true)
Create an alignment object from a sequence of alignment anchors.
BioAlignments.Alignment
— Method. Alignment(cigar::AbstractString, seqpos=1, refpos=1)
Make an alignment object from a CIGAR string. seqpos and refpos specify the starting positions of the two sequences.
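Both constructors can describe the same alignment. This sketch assumes the anchor convention AlignmentAnchor(seqpos, refpos, op), where each anchor records the last query and reference positions consumed by op and the OP_START anchor sits at (0, 0). Comparing CIGAR strings avoids depending on how anchors are stored internally.

```julia
using BioAlignments

# Explicit anchors: start, 3 matches, 2 query-only bases, 4 matches.
anchors = [
    AlignmentAnchor(0, 0, OP_START),
    AlignmentAnchor(3, 3, OP_MATCH),
    AlignmentAnchor(5, 3, OP_INSERT),
    AlignmentAnchor(9, 7, OP_MATCH),
]
aln1 = Alignment(anchors)

# The same alignment from a CIGAR string, both sequences starting at position 1.
aln2 = Alignment("3M2I4M", 1, 1)

println(cigar(aln1), " ", cigar(aln2))   # 3M2I4M 3M2I4M
```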
BioAlignments.seq2ref
— Method. seq2ref(aln::Alignment, i::Integer)::Tuple{Int,Operation}
Map a position i from sequence to reference.
BioAlignments.ref2seq
— Method. ref2seq(aln::Alignment, i::Integer)::Tuple{Int,Operation}
Map a position i from reference to sequence.
BioAlignments.cigar
— Method. cigar(aln::Alignment)
Make a CIGAR string encoding of aln.
This is not entirely lossless, as it discards the alignment's start positions.
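Inside match blocks, mapping is offset arithmetic. In 3M2I4M, query position 6 is the first base after the 2-base insertion, so it sits at reference position 6 - 2 = 4. A sketch of the round trip (the comments follow from that arithmetic):

```julia
using BioAlignments

# Query 1-3 match reference 1-3, query 4-5 are inserted, query 6-9 match reference 4-7.
aln = Alignment("3M2I4M", 1, 1)

# Query position 6 maps to reference position 4.
refpos, op = seq2ref(aln, 6)
println(refpos, " ", convert(Char, op))    # 4 M

# The inverse mapping takes reference position 4 back to query position 6.
seqpos, op2 = ref2seq(aln, 4)
println(seqpos, " ", convert(Char, op2))   # 6 M

# cigar() recovers operations and lengths, but not the start positions.
println(cigar(aln))                        # 3M2I4M
```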
Substitution matrices
BioAlignments.AbstractSubstitutionMatrix
— Type. Supertype of substitution matrix.
The required method:
Base.getindex(submat, x, y): substitution score/cost from x to y
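Any subtype that implements getindex(submat, x, y) satisfies the interface above. A sketch assuming the supertype is parameterized by the score type (AbstractSubstitutionMatrix{Int}). TransitionAwareMatrix and ispur are hypothetical names, not part of BioAlignments. The sketch only exercises indexing.

```julia
using BioSequences
using BioAlignments

# Hypothetical matrix: identical bases score 2, transitions
# (purine<->purine, pyrimidine<->pyrimidine) 0, transversions -1.
struct TransitionAwareMatrix <: BioAlignments.AbstractSubstitutionMatrix{Int} end

# Hypothetical helper: purines are A and G.
ispur(x::DNA) = x == DNA_A || x == DNA_G

function Base.getindex(::TransitionAwareMatrix, x::DNA, y::DNA)
    x == y && return 2
    return ispur(x) == ispur(y) ? 0 : -1
end

m = TransitionAwareMatrix()
println(m[DNA_A, DNA_A], " ", m[DNA_A, DNA_G], " ", m[DNA_A, DNA_C])   # 2 0 -1
```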
BioAlignments.SubstitutionMatrix
— Type. Substitution matrix.
BioAlignments.DichotomousSubstitutionMatrix
— Type. Dichotomous substitution matrix.
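A dichotomous matrix only distinguishes equal from unequal symbols, which is what the match=/mismatch= shorthand of the score and cost models creates. A sketch assuming the positional constructor DichotomousSubstitutionMatrix(match, mismatch):

```julia
using BioAlignments

# `match` when x == y, `mismatch` otherwise.
submat = DichotomousSubstitutionMatrix(1, -1)
println(submat['A', 'A'])   # 1
println(submat['A', 'C'])   # -1
```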
BioAlignments.EDNAFULL
— Constant. EDNAFULL (or NUC4.4) substitution matrix
BioAlignments.PAM30
— Constant. PAM30 substitution matrix
BioAlignments.PAM70
— Constant. PAM70 substitution matrix
BioAlignments.PAM250
— Constant. PAM250 substitution matrix
BioAlignments.BLOSUM45
— Constant. BLOSUM45 substitution matrix
BioAlignments.BLOSUM50
— Constant. BLOSUM50 substitution matrix
BioAlignments.BLOSUM62
— Constant. BLOSUM62 substitution matrix
BioAlignments.BLOSUM80
— Constant. BLOSUM80 substitution matrix
BioAlignments.BLOSUM90
— Constant. BLOSUM90 substitution matrix
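The predefined matrices are indexed with pairs of BioSequences symbols. The values in the comments are the standard BLOSUM62 and EDNAFULL entries; EDNAFULL, for example, scores a nucleotide match +5 and a mismatch -4.

```julia
using BioSequences
using BioAlignments

# Amino acid scores from BLOSUM62.
println(BLOSUM62[AA_A, AA_A])    # 4
println(BLOSUM62[AA_W, AA_W])    # 11
println(BLOSUM62[AA_A, AA_R])    # -1

# Nucleotide scores from EDNAFULL.
println(EDNAFULL[DNA_A, DNA_A])  # 5
println(EDNAFULL[DNA_A, DNA_C])  # -4
```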
Pairwise alignments
BioAlignments.PairwiseAlignment
— Type. Pairwise alignment
Base.count
— Method. count(aln::PairwiseAlignment, target::Operation)
Count the number of positions where the target operation is applied.
BioAlignments.count_matches
— Function. count_matches(aln)
Count the number of matching positions.
BioAlignments.count_mismatches
— Function. count_mismatches(aln)
Count the number of mismatching positions.
BioAlignments.count_insertions
— Function. count_insertions(aln)
Count the number of inserted positions.
BioAlignments.count_deletions
— Function. count_deletions(aln)
Count the number of deleted positions.
BioAlignments.count_aligned
— Function. count_aligned(aln)
Count the number of aligned positions.
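A sketch of the counting functions on a small global alignment. With this scoring, every optimal alignment of ACGTT against ACGT has four matches and one inserted base, so the counts do not depend on which T is gapped. It assumes pairalign records matches as OP_SEQ_MATCH.

```julia
using BioSequences
using BioAlignments

model = AffineGapScoreModel(match=5, mismatch=-4, gap_open=-5, gap_extend=-3)
res = pairalign(GlobalAlignment(), dna"ACGTT", dna"ACGT", model)
aln = alignment(res)

# 4 matches, 0 mismatches, 1 inserted query base, 0 deletions.
println(count_matches(aln), " ", count_mismatches(aln), " ",
        count_insertions(aln), " ", count_deletions(aln))   # 4 0 1 0

# The generic form takes the operation explicitly.
println(count(aln, OP_SEQ_MATCH))   # 4
```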
BioAlignments.GlobalAlignment
— Type. Global-global alignment with end gap penalties.
BioAlignments.SemiGlobalAlignment
— Type. Global-local alignment.
BioAlignments.OverlapAlignment
— Type. Global-global alignment without end gap penalties.
BioAlignments.LocalAlignment
— Type. Local-local alignment.
BioAlignments.EditDistance
— Type. Edit distance.
BioAlignments.HammingDistance
— Type. Hamming distance.
A special case of EditDistance with infinitely large insertion and deletion costs.
BioAlignments.LevenshteinDistance
— Type. Levenshtein distance.
A special case of EditDistance with mismatch, insertion, and deletion costs all equal to 1.
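The two special cases differ in which edits they permit. A sketch comparing them on the same pair of equal-length DNA sequences. Hamming distance can only count the three differing positions. Levenshtein distance can instead delete the C and append an A, for a total of 2.

```julia
using BioSequences
using BioAlignments

seq, ref = dna"ACGTT", dna"AGTTA"

# Hamming: substitutions only; positions 2, 3 and 5 differ.
ham = pairalign(HammingDistance(), seq, ref)
println(distance(ham))   # 3

# Levenshtein: unit costs, so one insertion plus one deletion is cheaper.
lev = pairalign(LevenshteinDistance(), seq, ref)
println(distance(lev))   # 2
```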
BioAlignments.AbstractScoreModel
— Type. Supertype of score model.
BioAlignments.AffineGapScoreModel
— Type. AffineGapScoreModel(submat, gap_open, gap_extend)
AffineGapScoreModel(submat, gap_open=, gap_extend=)
AffineGapScoreModel(match=, mismatch=, gap_open=, gap_extend=)
Affine gap scoring model.
This creates an affine gap scoring model object for alignment from a substitution matrix (submat), a gap opening score (gap_open), and a gap extending score (gap_extend). A consecutive gap of length k has a score of gap_open + gap_extend * k. Note that both gap scores should be non-positive. As a shorthand for creating a dichotomous substitution matrix, you can write, for example, AffineGapScoreModel(match=5, mismatch=-3, gap_open=-2, gap_extend=-1).
Example
using BioSequences
using BioAlignments
# create an affine gap scoring model from a predefined substitution
# matrix and gap opening/extending scores.
affinegap = AffineGapScoreModel(BLOSUM62, gap_open=-10, gap_extend=-1)
# run global alignment between two amino acid sequences
pairalign(GlobalAlignment(), aa"IDGAAGQQL", aa"IDGATGQL", affinegap)
See also: SubstitutionMatrix, pairalign, CostModel
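The gap formula can be checked directly against pairalign. A sketch using the match=/mismatch= shorthand. gapscore is a hypothetical helper that restates gap_open + gap_extend * k.

```julia
using BioSequences
using BioAlignments

gap_open, gap_extend = -5, -3
model = AffineGapScoreModel(match=5, mismatch=-4,
                            gap_open=gap_open, gap_extend=gap_extend)

# Hypothetical helper: a gap of length k scores gap_open + gap_extend * k.
gapscore(k) = gap_open + gap_extend * k
println(gapscore(1), " ", gapscore(3))   # -8 -14

# ACGTT vs ACGT: four matches (4 * 5) plus one gap of length 1.
res = pairalign(GlobalAlignment(), dna"ACGTT", dna"ACGT", model)
println(score(res), " ", 4 * 5 + gapscore(1))   # 12 12
```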
BioAlignments.AbstractCostModel
— Type. Supertype of cost model.
BioAlignments.CostModel
— Type. CostModel(submat, insertion, deletion)
CostModel(submat, insertion=, deletion=)
CostModel(match=, mismatch=, insertion=, deletion=)
Cost model.
This creates a cost model object for alignment from a substitution matrix (submat), an insertion cost (insertion), and a deletion cost (deletion). Note that both the insertion and deletion costs should be non-negative. As a shorthand for creating a dichotomous substitution matrix, you can write, for example, CostModel(match=0, mismatch=1, insertion=2, deletion=2).
Example
using BioAlignments
using LinearAlgebra  # provides the identity I; eye() was removed in Julia 1.0
# create a cost model from a substitution matrix and indel costs
cost = CostModel(ones(128, 128) - I, insertion=.5, deletion=.5)
# run global alignment to minimize edit distance
pairalign(EditDistance(), "intension", "execution", cost)
See also: SubstitutionMatrix, pairalign, AffineGapScoreModel
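With the shorthand CostModel(match=0, mismatch=1, insertion=2, deletion=2), a substitution is cheaper than a deletion plus an insertion, and a length difference must be paid for with indels. A minimal sketch that uses BioSequences DNA literals rather than the strings from the example above:

```julia
using BioSequences
using BioAlignments

# Mismatches cost 1; insertions and deletions cost 2.
cost = CostModel(match=0, mismatch=1, insertion=2, deletion=2)

# One substitution (cost 1) beats a deletion plus an insertion (cost 4).
d_sub = distance(pairalign(EditDistance(), dna"ACGT", dna"AGGT", cost))
println(d_sub)     # 1

# A length difference of one needs a single indel (cost 2).
d_indel = distance(pairalign(EditDistance(), dna"ACGT", dna"ACG", cost))
println(d_indel)   # 2
```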
BioAlignments.PairwiseAlignmentResult
— Type. Result of pairwise alignment
BioAlignments.pairalign
— Function. pairalign(type, seq, ref, model, [options...])
Run pairwise alignment between two sequences: seq and ref.
Available types are:
GlobalAlignment()
LocalAlignment()
SemiGlobalAlignment()
OverlapAlignment()
EditDistance()
LevenshteinDistance()
HammingDistance()
GlobalAlignment, LocalAlignment, SemiGlobalAlignment, and OverlapAlignment are problems that maximize the alignment score between two sequences. Therefore, model should be an instance of AbstractScoreModel (e.g. AffineGapScoreModel).
EditDistance, LevenshteinDistance, and HammingDistance are problems that minimize the alignment cost between two sequences. For EditDistance, model should be an instance of AbstractCostModel (e.g. CostModel). LevenshteinDistance and HammingDistance have predefined cost models, so users cannot specify a cost model for these alignment types.
When you pass the score_only=true or distance_only=true option to pairalign, the result holds only the alignment score or distance. This may enable some algorithms to run faster than when calculating the full alignment. Other available options are documented for each alignment type.
Example
using BioSequences
using BioAlignments
# create affine gap scoring model
affinegap = AffineGapScoreModel(
match=5,
mismatch=-4,
gap_open=-5,
gap_extend=-3
)
# run global alignment between two DNA sequences
pairalign(GlobalAlignment(), dna"AGGTAG", dna"ATTG", affinegap)
# run local alignment between two DNA sequences
pairalign(LocalAlignment(), dna"AGGTAG", dna"ATTG", affinegap)
# you cannot specify a cost model in LevenshteinDistance
pairalign(LevenshteinDistance(), dna"AGGTAG", dna"ATTG")
See also: AffineGapScoreModel, CostModel
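Global and local alignment can give very different scores when the query has flanking sequence. A sketch with the same scoring parameters as the example above. The local alignment keeps only the exact ACGT hit (4 * 5 = 20). The global alignment must also pay for the four flanking Ts.

```julia
using BioSequences
using BioAlignments

model = AffineGapScoreModel(match=5, mismatch=-4, gap_open=-5, gap_extend=-3)
seq, ref = dna"TTACGTTT", dna"ACGT"

# Local: only the best-scoring segment pair counts.
loc = pairalign(LocalAlignment(), seq, ref, model)
println(score(loc))                  # 20

# Global: the flanking query bases must be aligned to gaps.
glo = pairalign(GlobalAlignment(), seq, ref, model)
println(score(glo) < score(loc))     # true
```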
BioAlignments.score
— Function. score(alignment_result)
Return score of alignment.
BioCore.distance
— Function. distance(alignment_result)
Return distance of alignment.
BioAlignments.alignment
— Function. alignment(alignment_result)
Return alignment if any.
See also: hasalignment
BioAlignments.hasalignment
— Function. hasalignment(alignment_result)
Check if alignment is stored or not.
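A sketch of the accessor functions together with the score_only option of pairalign. Both runs report the same score, but only the full run stores an alignment, so hasalignment is the guard to use before calling alignment.

```julia
using BioSequences
using BioAlignments

model = AffineGapScoreModel(match=5, mismatch=-4, gap_open=-5, gap_extend=-3)

full = pairalign(GlobalAlignment(), dna"ACGTT", dna"ACGT", model)
fast = pairalign(GlobalAlignment(), dna"ACGTT", dna"ACGT", model, score_only=true)

println(score(full), " ", score(fast))                 # 12 12
println(hasalignment(full), " ", hasalignment(fast))   # true false

# Only call alignment() when an alignment was actually stored.
if hasalignment(full)
    println(alignment(full))
end
```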
BioAlignments.seq2ref
— Method. seq2ref(aln::PairwiseAlignment, i::Integer)::Tuple{Int,Operation}
Map a position i from the first sequence to the second.
BioAlignments.ref2seq
— Method. ref2seq(aln::PairwiseAlignment, i::Integer)::Tuple{Int,Operation}
Map a position i from the second sequence to the first.
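A sketch of position mapping on a pairwise alignment. The reference ATCGC has one extra C relative to the query ATGC. Under this scoring the only optimal global alignment is AT-GC over ATCGC, so query position 3 (the G) maps to reference position 4. The sketch relies only on the returned position, not on the Operation value.

```julia
using BioSequences
using BioAlignments

model = AffineGapScoreModel(match=5, mismatch=-4, gap_open=-5, gap_extend=-3)
res = pairalign(GlobalAlignment(), dna"ATGC", dna"ATCGC", model)
aln = alignment(res)

# Query position 3 (G) lines up with reference position 4 (G).
pos, op = seq2ref(aln, 3)
println(pos)   # 4
```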
I/O
SAM
SAM.Reader
SAM.header
SAM.Header
Base.find(header::SAM.Header, key::AbstractString)
SAM.Writer
SAM.MetaInfo
SAM.iscomment
SAM.tag
SAM.value
SAM.keyvalues
SAM.Record
SAM.flag
SAM.ismapped
SAM.isprimary
SAM.refname
SAM.position
SAM.rightposition
SAM.isnextmapped
SAM.nextrefname
SAM.nextposition
SAM.mappingquality
SAM.cigar
SAM.alignment
SAM.alignlength
SAM.tempname
SAM.templength
SAM.sequence
SAM.seqlength
SAM.quality
SAM.auxdata
SAM.FLAG_PAIRED
SAM.FLAG_PROPER_PAIR
SAM.FLAG_UNMAP
SAM.FLAG_MUNMAP
SAM.FLAG_REVERSE
SAM.FLAG_MREVERSE
SAM.FLAG_READ1
SAM.FLAG_READ2
SAM.FLAG_SECONDARY
SAM.FLAG_QCFAIL
SAM.FLAG_DUP
SAM.FLAG_SUPPLEMENTARY
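A minimal sketch of reading SAM records with the accessors listed above. It assumes SAM.Reader accepts any IO (here an IOBuffer) and can be iterated record by record. The two-line SAM content (one @SQ header line and one mapped read) is made up for illustration.

```julia
using BioAlignments

# Made-up SAM content: one @SQ header line and one read mapped at chr1:7 (FLAG 0).
data = """
@SQ\tSN:chr1\tLN:100
r001\t0\tchr1\t7\t60\t4M\t*\t0\t0\tACGT\tIIII
"""

reader = SAM.Reader(IOBuffer(data))
hits = Tuple{String,String,Int,String}[]
for record in reader
    # Skip unmapped reads (FLAG_UNMAP set) before reading position fields.
    SAM.ismapped(record) || continue
    push!(hits, (SAM.tempname(record), SAM.refname(record),
                 SAM.position(record), SAM.cigar(record)))
end
close(reader)

println(hits)
```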
BAM
BAM.Reader
BAM.header
BAM.Writer
BAM.Record
BAM.flag
BAM.ismapped
BAM.isprimary
BAM.ispositivestrand
BAM.refid
BAM.refname
BAM.position
BAM.rightposition
BAM.isnextmapped
BAM.nextrefid
BAM.nextrefname
BAM.nextposition
BAM.mappingquality
BAM.cigar
BAM.cigar_rle
BAM.alignment
BAM.alignlength
BAM.tempname
BAM.templength
BAM.sequence
BAM.seqlength
BAM.quality
BAM.auxdata
BAM.BAI