Biological symbols

The BioSequences module reexports the biological symbol (character) types that are provided by BioSymbols.jl:

Type       Meaning
DNA        DNA nucleotide
RNA        RNA nucleotide
AminoAcid  Amino acid

These symbols are elements of biological sequence types, just as characters are elements of strings.

DNA and RNA nucleotides

The set of nucleotide symbols in BioSequences covers the IUPAC nucleotide codes plus a gap symbol:

Symbol  Constant             Meaning
'A'     DNA_A / RNA_A        A; Adenine
'C'     DNA_C / RNA_C        C; Cytosine
'G'     DNA_G / RNA_G        G; Guanine
'T'     DNA_T                T; Thymine (DNA only)
'U'     RNA_U                U; Uracil (RNA only)
'M'     DNA_M / RNA_M        A or C
'R'     DNA_R / RNA_R        A or G
'W'     DNA_W / RNA_W        A or T/U
'S'     DNA_S / RNA_S        C or G
'Y'     DNA_Y / RNA_Y        C or T/U
'K'     DNA_K / RNA_K        G or T/U
'V'     DNA_V / RNA_V        A or C or G; not T/U
'H'     DNA_H / RNA_H        A or C or T/U; not G
'D'     DNA_D / RNA_D        A or G or T/U; not C
'B'     DNA_B / RNA_B        C or G or T/U; not A
'N'     DNA_N / RNA_N        A or C or G or T/U
'-'     DNA_Gap / RNA_Gap    Gap (none of the above)

https://www.bioinformatics.org/sms/iupac.html
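You can read the table by treating each code as a set of bases. The sketch below is a hypothetical, package-free model. The names `BASE_BITS`, `IUPAC_MASK` and `expand_code` are illustrative and not part of BioSymbols. Each code is a 4-bit mask with one bit per base, so every ambiguity code is the union (bitwise OR) of the single-base bits it stands for.

```julia
# Hypothetical model (not the BioSymbols API): one bit per base.
# A = 0b0001, C = 0b0010, G = 0b0100, T/U = 0b1000.
const BASE_BITS = ('A' => 0x01, 'C' => 0x02, 'G' => 0x04, 'T' => 0x08)

# Every code from the table above; ambiguity codes are ORs of base bits.
const IUPAC_MASK = Dict{Char,UInt8}(
    '-' => 0x00,                                   # gap: no base at all
    'A' => 0x01, 'C' => 0x02, 'G' => 0x04, 'T' => 0x08, 'U' => 0x08,
    'M' => 0x03, 'R' => 0x05, 'W' => 0x09, 'S' => 0x06,
    'Y' => 0x0a, 'K' => 0x0c,
    'V' => 0x07, 'H' => 0x0b, 'D' => 0x0d, 'B' => 0x0e,
    'N' => 0x0f,
)

# Expand a code into the concrete bases it may stand for (T/U reported as 'T').
function expand_code(c::Char)
    mask = IUPAC_MASK[c]
    return [base for (base, bit) in BASE_BITS if mask & bit != 0]
end

println(expand_code('R'))   # ['A', 'G']
```

For example, `expand_code('N')` returns all four bases. `expand_code('-')` returns an empty vector, because in this model the gap stands for no base.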

Symbols are accessible as constants with the DNA_ or RNA_ prefix:

julia> DNA_A
DNA_A

julia> DNA_T
DNA_T

julia> RNA_U
RNA_U

julia> DNA_Gap
DNA_Gap

julia> typeof(DNA_A)
DNA

julia> typeof(RNA_A)
RNA

Symbols can be constructed by converting regular characters:

julia> convert(DNA, 'C')
DNA_C

julia> convert(DNA, 'C') === DNA_C
true
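The `===` comparison succeeds because the converted character and the named constant are the same value. The package-free sketch below illustrates this under one assumption: a nucleotide is modelled as an 8-bit primitive value that holds the 4-bit mask for its IUPAC code. `ToyDNA`, `toybits`, `CHAR_TO_MASK` and `TOY_C` are hypothetical names, and only a few codes are covered.

```julia
# Hypothetical toy type: an 8-bit primitive value whose low 4 bits are the
# IUPAC mask (A = 0x01, C = 0x02, G = 0x04, T = 0x08). Not BioSymbols' DNA.
primitive type ToyDNA 8 end

ToyDNA(bits::UInt8) = reinterpret(ToyDNA, bits)
toybits(x::ToyDNA) = reinterpret(UInt8, x)

# Only a subset of the codes, to keep the example short.
const CHAR_TO_MASK = Dict('-' => 0x00, 'A' => 0x01, 'C' => 0x02,
                          'G' => 0x04, 'T' => 0x08, 'N' => 0x0f)

# Conversion is a table lookup; unknown characters raise an ArgumentError.
function Base.convert(::Type{ToyDNA}, c::Char)
    haskey(CHAR_TO_MASK, c) || throw(ArgumentError("unsupported nucleotide character: $c"))
    return ToyDNA(CHAR_TO_MASK[c])
end

const TOY_C = ToyDNA(0x02)

# `===` compares the raw bits, so the converted value is identical to the constant.
println(convert(ToyDNA, 'C') === TOY_C)   # prints true
```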

Amino acids

The set of amino acid symbols likewise covers the IUPAC amino acid codes plus a gap symbol:

Symbol  Constant  Meaning
'A'     AA_A      Alanine
'R'     AA_R      Arginine
'N'     AA_N      Asparagine
'D'     AA_D      Aspartic acid (Aspartate)
'C'     AA_C      Cysteine
'Q'     AA_Q      Glutamine
'E'     AA_E      Glutamic acid (Glutamate)
'G'     AA_G      Glycine
'H'     AA_H      Histidine
'I'     AA_I      Isoleucine
'L'     AA_L      Leucine
'K'     AA_K      Lysine
'M'     AA_M      Methionine
'F'     AA_F      Phenylalanine
'P'     AA_P      Proline
'S'     AA_S      Serine
'T'     AA_T      Threonine
'W'     AA_W      Tryptophan
'Y'     AA_Y      Tyrosine
'V'     AA_V      Valine
'O'     AA_O      Pyrrolysine
'U'     AA_U      Selenocysteine
'B'     AA_B      Aspartic acid or Asparagine
'J'     AA_J      Leucine or Isoleucine
'Z'     AA_Z      Glutamine or Glutamic acid
'X'     AA_X      Any amino acid
'*'     AA_Term   Termination codon
'-'     AA_Gap    Gap (none of the above)

https://www.bioinformatics.org/sms/iupac.html
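The last rows of the table before the termination and gap symbols are ambiguity codes. The package-free sketch below spells out which unambiguous codes each one may stand for. `UNAMBIGUOUS_AA`, `AA_AMBIGUITY` and `expand_aa` are hypothetical names. Letting 'X' cover all 22 unambiguous residues in the table, including O and U, is an assumption of this sketch.

```julia
# Hypothetical lookup (not a BioSymbols API): each amino acid ambiguity code
# from the table, mapped to the unambiguous one-letter codes it may stand for.
# Assumption: 'X' covers the 22 unambiguous residues listed in the table.
const UNAMBIGUOUS_AA = collect("ARNDCQEGHILKMFPSTWYVOU")

const AA_AMBIGUITY = Dict{Char,Vector{Char}}(
    'B' => ['D', 'N'],        # Aspartic acid or Asparagine
    'J' => ['L', 'I'],        # Leucine or Isoleucine
    'Z' => ['Q', 'E'],        # Glutamine or Glutamic acid
    'X' => UNAMBIGUOUS_AA,    # any amino acid
)

# Unambiguous codes expand to themselves.
expand_aa(c::Char) = get(AA_AMBIGUITY, c, [c])

println(expand_aa('B'))   # ['D', 'N']
```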

Symbols are accessible as constants with the AA_ prefix:

julia> AA_A
AA_A

julia> AA_Q
AA_Q

julia> AA_Term
AA_Term

julia> typeof(AA_A)
AminoAcid

Symbols can be constructed by converting regular characters:

julia> convert(AminoAcid, 'A')
AA_A

julia> convert(AminoAcid, 'P') === AA_P
true

Other functions

BioSymbols.alphabet (function)
alphabet(DNA)

Get all symbols of DNA in sorted order.

Examples

julia> alphabet(DNA)
(DNA_Gap, DNA_A, DNA_C, DNA_M, DNA_G, DNA_R, DNA_S, DNA_V, DNA_T, DNA_W, DNA_Y, DNA_H, DNA_K, DNA_D, DNA_B, DNA_N)

julia> issorted(alphabet(DNA))
true
alphabet(RNA)

Get all symbols of RNA in sorted order.

Examples

julia> alphabet(RNA)
(RNA_Gap, RNA_A, RNA_C, RNA_M, RNA_G, RNA_R, RNA_S, RNA_V, RNA_U, RNA_W, RNA_Y, RNA_H, RNA_K, RNA_D, RNA_B, RNA_N)

julia> issorted(alphabet(RNA))
true
alphabet(AminoAcid)

Get all symbols of AminoAcid in sorted order.

Examples

julia> alphabet(AminoAcid)
(AA_A, AA_R, AA_N, AA_D, AA_C, AA_Q, AA_E, AA_G, AA_H, AA_I, AA_L, AA_K, AA_M, AA_F, AA_P, AA_S, AA_T, AA_W, AA_Y, AA_V, AA_O, AA_U, AA_B, AA_J, AA_Z, AA_X, AA_Term, AA_Gap)

julia> issorted(alphabet(AminoAcid))
true
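The nucleotide order in the DNA and RNA outputs (gap, A, C, M, G, R, S, V, T/U, ...) can look arbitrary. It matches the order of 4-bit base masks counted from 0x0 to 0xf. The sketch below starts from the codes in IUPAC table order and sorts them by mask, which reproduces that ordering. That BioSymbols sorts by such an encoding is an assumption here, and `DNA_CODE_MASKS` is a hypothetical name.

```julia
# Hypothetical 4-bit masks (A = 0x1, C = 0x2, G = 0x4, T = 0x8), listed in the
# order of the IUPAC table; ambiguity codes are ORs of the bases they stand for.
const DNA_CODE_MASKS = Dict(
    'A' => 0x01, 'C' => 0x02, 'G' => 0x04, 'T' => 0x08,
    'M' => 0x03, 'R' => 0x05, 'W' => 0x09, 'S' => 0x06,
    'Y' => 0x0a, 'K' => 0x0c, 'V' => 0x07, 'H' => 0x0b,
    'D' => 0x0d, 'B' => 0x0e, 'N' => 0x0f, '-' => 0x00,
)

# Sort the codes by their mask value.
sorted_codes = sort(collect(keys(DNA_CODE_MASKS)); by = c -> DNA_CODE_MASKS[c])

println(join(sorted_codes))   # prints -ACMGRSVTWYHKDBN
```

The amino acid alphabet does not follow this pattern. It is listed in the same order as the amino acid table, not alphabetically.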
BioSymbols.gap (function)
gap(::Type{T})::T

Return the gap (indel) representation of T. By default, gap is defined for DNA, RNA, AminoAcid and Char.

Examples

julia> gap(RNA)
RNA_Gap

julia> gap(Char)
'-': ASCII/Unicode U+002D (category Pd: Punctuation, dash)
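Because `gap` takes a type rather than a value, generic code can look up the right gap symbol from a sequence's element type. Below is a minimal package-free sketch of that pattern. `toy_gap` and `strip_gaps` are hypothetical stand-ins, and only a `Char` method is defined; it returns the same '-' that `gap(Char)` shows above.

```julia
# Hypothetical stand-in for gap(::Type{T}); methods for other element types
# (e.g. a nucleotide or amino acid type) would be added the same way.
toy_gap(::Type{Char}) = '-'

# Remove every gap from an aligned sequence whose element type T has a toy_gap method.
strip_gaps(seq::AbstractVector{T}) where {T} = filter(!=(toy_gap(T)), seq)

aligned = collect("AC--GT-A")
println(String(strip_gaps(aligned)))   # prints ACGTA
```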
BioSymbols.iscompatible (function)
iscompatible(x::S, y::S) where S <: BioSymbol

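Test whether x and y are compatible with each other, that is, whether they can stand for at least one common symbol. An ambiguous symbol is compatible with every symbol it can represent.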

Examples

julia> iscompatible(AA_A, AA_R)
false

julia> iscompatible(AA_A, AA_X)
true

julia> iscompatible(DNA_A, DNA_A)
true

julia> iscompatible(DNA_C, DNA_N)  # DNA_N can be DNA_C
true

julia> iscompatible(DNA_C, DNA_R)  # DNA_R (A or G) cannot be DNA_C
false
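The nucleotide examples above can be reproduced with the 4-bit mask model: two codes are compatible when their base sets overlap, which is a bitwise AND with a nonzero result. The sketch below is package-free and covers only a few codes. `NT_CODE_MASKS` and `toy_iscompatible` are hypothetical names, and the sketch is not presented as BioSymbols' implementation.

```julia
# Hypothetical model: each nucleotide code is a 4-bit mask (A = 0x1, C = 0x2,
# G = 0x4, T/U = 0x8); only a few codes are listed to keep it short.
const NT_CODE_MASKS = Dict('A' => 0x01, 'C' => 0x02, 'G' => 0x04, 'T' => 0x08,
                           'R' => 0x05, 'Y' => 0x0a, 'N' => 0x0f)

# Two codes are compatible when they share at least one possible base.
toy_iscompatible(x::Char, y::Char) = (NT_CODE_MASKS[x] & NT_CODE_MASKS[y]) != 0x00

println(toy_iscompatible('C', 'N'))   # prints true  (N can be C)
println(toy_iscompatible('C', 'R'))   # prints false (R is A or G)
println(toy_iscompatible('R', 'Y'))   # prints false ({A, G} and {C, T} are disjoint)
```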
BioSymbols.isambiguous (function)
isambiguous(nt::NucleicAcid)

Test whether nt is an ambiguous nucleotide, that is, one that can stand for more than one base (for example, DNA_N or RNA_R).

isambiguous(aa::AminoAcid)

Test whether aa is an ambiguous amino acid code (AA_B, AA_J, AA_Z or AA_X).

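Under the same 4-bit model, an ambiguous nucleotide is one whose mask has more than one base bit set. The amino acid ambiguity codes in the table are B, J, Z and X. The package-free sketch below encodes both rules with the hypothetical names `NT_MASKS`, `toy_isambiguous_nt` and `toy_isambiguous_aa`. In this model the gap has no bits set and so counts as unambiguous; whether BioSymbols treats the gap the same way is not stated here.

```julia
# Hypothetical 4-bit masks for every nucleotide code in the table (A = 0x1,
# C = 0x2, G = 0x4, T/U = 0x8); the gap has no bits set.
const NT_MASKS = Dict(
    'A' => 0x01, 'C' => 0x02, 'G' => 0x04, 'T' => 0x08, 'U' => 0x08,
    'M' => 0x03, 'R' => 0x05, 'W' => 0x09, 'S' => 0x06, 'Y' => 0x0a,
    'K' => 0x0c, 'V' => 0x07, 'H' => 0x0b, 'D' => 0x0d, 'B' => 0x0e,
    'N' => 0x0f, '-' => 0x00,
)

# Ambiguous nucleotide: the code could stand for more than one base.
toy_isambiguous_nt(c::Char) = count_ones(NT_MASKS[c]) > 1

# Ambiguous amino acid: one of the IUPAC ambiguity codes from the table.
toy_isambiguous_aa(c::Char) = c in ('B', 'J', 'Z', 'X')

println(filter(toy_isambiguous_nt, collect("ACGTNRY-")))   # ['N', 'R', 'Y']
```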