Biological symbols
The BioSequences module reexports the biological symbol (character) types that are provided by BioSymbols.jl:
Type | Meaning |
---|---|
DNA | DNA nucleotide |
RNA | RNA nucleotide |
AminoAcid | Amino acid |
These symbols are elements of biological sequence types, just as characters are elements of strings.
DNA and RNA nucleotides
The set of nucleotide symbols in BioSequences covers the IUPAC nucleotide base codes plus a gap symbol:
Symbol | Constant | Meaning |
---|---|---|
'A' | DNA_A / RNA_A | A; Adenine |
'C' | DNA_C / RNA_C | C; Cytosine |
'G' | DNA_G / RNA_G | G; Guanine |
'T' | DNA_T | T; Thymine (DNA only) |
'U' | RNA_U | U; Uracil (RNA only) |
'M' | DNA_M / RNA_M | A or C |
'R' | DNA_R / RNA_R | A or G |
'W' | DNA_W / RNA_W | A or T/U |
'S' | DNA_S / RNA_S | C or G |
'Y' | DNA_Y / RNA_Y | C or T/U |
'K' | DNA_K / RNA_K | G or T/U |
'V' | DNA_V / RNA_V | A or C or G; not T/U |
'H' | DNA_H / RNA_H | A or C or T/U; not G |
'D' | DNA_D / RNA_D | A or G or T/U; not C |
'B' | DNA_B / RNA_B | C or G or T/U; not A |
'N' | DNA_N / RNA_N | A or C or G or T/U |
'-' | DNA_Gap / RNA_Gap | Gap (none of the above) |
https://www.bioinformatics.org/sms/iupac.html
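One way to read the table: every code names a set of the four unambiguous bases, and each ambiguity code is the union of the bases it lists. The sketch below uses plain Base Julia (no BioSequences dependency) and models each code as a 4-bit mask (A = 1, C = 2, G = 4, T/U = 8). That encoding is an assumption made for illustration; it happens to be consistent with the sorted order printed by alphabet(DNA) further down. IUPAC_BITS and bases_of are hypothetical names, not part of the library.

```julia
# Hypothetical 4-bit model of the IUPAC nucleotide codes (not the BioSymbols API).
# Each bit stands for one unambiguous base; ambiguity codes are bitwise unions.
const IUPAC_BITS = Dict{Char,UInt8}(
    '-' => 0b0000,                                   # gap: none of the bases
    'A' => 0b0001, 'C' => 0b0010, 'G' => 0b0100, 'T' => 0b1000,
    'M' => 0b0011, 'R' => 0b0101, 'W' => 0b1001, 'S' => 0b0110,
    'Y' => 0b1010, 'K' => 0b1100, 'V' => 0b0111, 'H' => 0b1011,
    'D' => 0b1101, 'B' => 0b1110, 'N' => 0b1111,
)

# List the unambiguous bases whose bit is set in the given code's mask.
function bases_of(code::Char)
    bits = IUPAC_BITS[code]
    return [b for b in ('A', 'C', 'G', 'T') if bits & IUPAC_BITS[b] != 0]
end

bases_of('R')  # returns ['A', 'G']
bases_of('B')  # returns ['C', 'G', 'T']
```

The same masks work for RNA by reading the T bit as U, since the two alphabets differ only in that symbol.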
Symbols are accessible as constants with the DNA_ or RNA_ prefix:
julia> DNA_A
DNA_A
julia> DNA_T
DNA_T
julia> RNA_U
RNA_U
julia> DNA_Gap
DNA_Gap
julia> typeof(DNA_A)
DNA
julia> typeof(RNA_A)
RNA
Symbols can be constructed by converting regular characters:
julia> convert(DNA, 'C')
DNA_C
julia> convert(DNA, 'C') === DNA_C
true
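Converting a whole string amounts to converting each character in turn. The Base-Julia sketch below does not call the library; it only checks each character against the DNA or RNA column of the table above (note that 'T' is valid only for DNA and 'U' only for RNA). What convert does with an invalid character is not described on this page, so throwing an ArgumentError here is an assumption. DNA_CHARS, RNA_CHARS and parse_symbols are hypothetical names.

```julia
# Hypothetical sketch (Base Julia only): validate a string character by character
# against the nucleotide table, mimicking applying convert to each Char.
const DNA_CHARS = Set("ACGTMRWSYKVHDBN-")   # contains 'T' but not 'U'
const RNA_CHARS = Set("ACGUMRWSYKVHDBN-")   # contains 'U' but not 'T'

function parse_symbols(s::AbstractString, allowed::Set{Char})
    out = Char[]
    for c in s
        # Reject anything outside the alphabet rather than guessing a mapping.
        c in allowed || throw(ArgumentError("invalid symbol $(repr(c))"))
        push!(out, c)
    end
    return out
end

parse_symbols("ACGN-", DNA_CHARS)  # returns ['A', 'C', 'G', 'N', '-']
parse_symbols("ACGU", RNA_CHARS)   # returns ['A', 'C', 'G', 'U']
# parse_symbols("ACGU", DNA_CHARS) throws an ArgumentError: 'U' is RNA only
```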
Amino acids
The set of amino acid symbols likewise covers the IUPAC amino acid codes plus a gap symbol:
Symbol | Constant | Meaning |
---|---|---|
'A' | AA_A | Alanine |
'R' | AA_R | Arginine |
'N' | AA_N | Asparagine |
'D' | AA_D | Aspartic acid (Aspartate) |
'C' | AA_C | Cysteine |
'Q' | AA_Q | Glutamine |
'E' | AA_E | Glutamic acid (Glutamate) |
'G' | AA_G | Glycine |
'H' | AA_H | Histidine |
'I' | AA_I | Isoleucine |
'L' | AA_L | Leucine |
'K' | AA_K | Lysine |
'M' | AA_M | Methionine |
'F' | AA_F | Phenylalanine |
'P' | AA_P | Proline |
'S' | AA_S | Serine |
'T' | AA_T | Threonine |
'W' | AA_W | Tryptophan |
'Y' | AA_Y | Tyrosine |
'V' | AA_V | Valine |
'O' | AA_O | Pyrrolysine |
'U' | AA_U | Selenocysteine |
'B' | AA_B | Aspartic acid or Asparagine |
'J' | AA_J | Leucine or Isoleucine |
'Z' | AA_Z | Glutamine or Glutamic acid |
'X' | AA_X | Any amino acid |
'*' | AA_Term | Termination codon |
'-' | AA_Gap | Gap (none of the above) |
https://www.bioinformatics.org/sms/iupac.html
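The two-way ambiguity codes B, J and Z in this table each stand for exactly two residues. The sketch below, in Base Julia and independent of the library, expands them into the one-letter codes they may represent; every other symbol expands to itself. X is left out on purpose, because this page does not say whether "any amino acid" means only the 20 standard residues or also O and U. AA_AMBIGUITY and expand_aa are hypothetical names.

```julia
# Hypothetical sketch (Base Julia only): expand ambiguous amino acid codes
# from the table into the one-letter codes they may stand for.
const AA_AMBIGUITY = Dict(
    'B' => ['D', 'N'],   # Aspartic acid or Asparagine
    'J' => ['L', 'I'],   # Leucine or Isoleucine
    'Z' => ['Q', 'E'],   # Glutamine or Glutamic acid
)

# Any code not in the dictionary (including '*' and '-') expands to itself.
expand_aa(c::Char) = get(AA_AMBIGUITY, c, [c])

expand_aa('B')  # returns ['D', 'N']
expand_aa('W')  # returns ['W']
```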
Symbols are accessible as constants with the AA_ prefix:
julia> AA_A
AA_A
julia> AA_Q
AA_Q
julia> AA_Term
AA_Term
julia> typeof(AA_A)
AminoAcid
Symbols can be constructed by converting regular characters:
julia> convert(AminoAcid, 'A')
AA_A
julia> convert(AminoAcid, 'P') === AA_P
true
Other functions
BioSymbols.alphabet (Function)
alphabet(DNA)
Get all symbols of DNA in sorted order.
Examples
julia> alphabet(DNA)
(DNA_Gap, DNA_A, DNA_C, DNA_M, DNA_G, DNA_R, DNA_S, DNA_V, DNA_T, DNA_W, DNA_Y, DNA_H, DNA_K, DNA_D, DNA_B, DNA_N)
julia> issorted(alphabet(DNA))
true
alphabet(RNA)
Get all symbols of RNA in sorted order.
Examples
julia> alphabet(RNA)
(RNA_Gap, RNA_A, RNA_C, RNA_M, RNA_G, RNA_R, RNA_S, RNA_V, RNA_U, RNA_W, RNA_Y, RNA_H, RNA_K, RNA_D, RNA_B, RNA_N)
julia> issorted(alphabet(RNA))
true
alphabet(AminoAcid)
Get all symbols of AminoAcid in sorted order.
Examples
julia> alphabet(AminoAcid)
(AA_A, AA_R, AA_N, AA_D, AA_C, AA_Q, AA_E, AA_G, AA_H, AA_I, AA_L, AA_K, AA_M, AA_F, AA_P, AA_S, AA_T, AA_W, AA_Y, AA_V, AA_O, AA_U, AA_B, AA_J, AA_Z, AA_X, AA_Term, AA_Gap)
julia> issorted(alphabet(AminoAcid))
true
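The nucleotide order printed by alphabet(DNA) and alphabet(RNA) above is exactly what one gets by sorting the codes on a 4-bit mask of the bases they cover (A = 1, C = 2, G = 4, T/U = 8). That is an inference from the printed output, not a documented guarantee. The Base-Julia sketch below reproduces the DNA order; BASE_BITS, CODES and mask are hypothetical names.

```julia
# Hypothetical sketch (Base Julia only): sort IUPAC nucleotide codes by an
# assumed 4-bit mask and compare with the order alphabet(DNA) prints.
const BASE_BITS = Dict('A' => 1, 'C' => 2, 'G' => 4, 'T' => 8)

# Each code written as the bases it covers; the gap '-' covers none.
const CODES = Dict(
    '-' => "", 'A' => "A", 'C' => "C", 'G' => "G", 'T' => "T",
    'M' => "AC", 'R' => "AG", 'W' => "AT", 'S' => "CG", 'Y' => "CT",
    'K' => "GT", 'V' => "ACG", 'H' => "ACT", 'D' => "AGT", 'B' => "CGT",
    'N' => "ACGT",
)

# Mask of a code = sum of the bits of the bases it covers (0 for the gap).
mask(code::Char) = sum((BASE_BITS[b] for b in CODES[code]); init = 0)

sorted_codes = sort(collect(keys(CODES)); by = mask)
join(sorted_codes)  # returns "-ACMGRSVTWYHKDBN", matching DNA_Gap, DNA_A, DNA_C, DNA_M, ...
```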
BioSymbols.gap (Function)
gap(::Type{T})::T
Return the gap (indel) representation of T. By default, gap is defined for DNA, RNA, AminoAcid, and Char.
Examples
julia> gap(RNA)
RNA_Gap
julia> gap(Char)
'-': ASCII/Unicode U+002D (category Pd: Punctuation, dash)
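Because gap takes a type rather than a value, generic code can ask for the gap of whatever element type it is handling. The sketch below mirrors that dispatch-on-type pattern in Base Julia with a hypothetical mygap function and a hypothetical padright helper that pads a vector with gaps; neither is a BioSymbols function, and code using the library would presumably call gap(T) itself.

```julia
# Hypothetical sketch (Base Julia only): a gap-like function that dispatches
# on a type, mirroring gap(Char) == '-' from the example above.
mygap(::Type{Char}) = '-'

# Append gap values of the element type until the vector has length n
# (returns an unchanged copy if it is already at least that long).
function padright(v::Vector{T}, n::Integer) where {T}
    return vcat(v, fill(mygap(T), max(0, n - length(v))))
end

padright(['A', 'C'], 4)  # returns ['A', 'C', '-', '-']
```

Supporting another element type then only needs one more mygap method, which is the design the gap function's type argument enables.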
BioSymbols.iscompatible (Function)
iscompatible(x::S, y::S) where S <: BioSymbol
Test if x and y are compatible with each other, that is, whether x and y can represent the same symbol.
Examples
julia> iscompatible(AA_A, AA_R)
false
julia> iscompatible(AA_A, AA_X)
true
julia> iscompatible(DNA_A, DNA_A)
true
julia> iscompatible(DNA_C, DNA_N) # DNA_N can be DNA_C
true
julia> iscompatible(DNA_C, DNA_R) # DNA_R (A or G) cannot be DNA_C
false
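For nucleotides, the examples above behave as if each code were a set of bases and two codes were compatible when those sets overlap. The Base-Julia sketch below models that idea with 4-bit masks for a subset of the codes; the encoding and the names NT_MASK and compatible are assumptions for illustration, not the library's implementation.

```julia
# Hypothetical sketch (Base Julia only): nucleotide codes as 4-bit masks
# (A=1, C=2, G=4, T=8); two codes are compatible when their masks overlap,
# i.e. when at least one unambiguous base is allowed by both.
const NT_MASK = Dict(
    'A' => 0b0001, 'C' => 0b0010, 'G' => 0b0100, 'T' => 0b1000,
    'R' => 0b0101, 'Y' => 0b1010, 'N' => 0b1111,
)

compatible(x::Char, y::Char) = NT_MASK[x] & NT_MASK[y] != 0

compatible('A', 'A')  # true
compatible('C', 'N')  # true:  N includes C
compatible('C', 'R')  # false: R is A or G
```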
BioSymbols.isambiguous (Function)
isambiguous(nt::NucleicAcid)
Test if nt is an ambiguous nucleotide.
isambiguous(aa::AminoAcid)
Test if aa is an ambiguous amino acid.
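A plausible reading of "ambiguous", consistent with the tables above, is that a nucleotide code is ambiguous when it covers more than one base and an amino acid code is ambiguous when it is B, J, Z or X. The Base-Julia sketch below encodes those two rules; both rules, and the names AMBIG_MASK, nt_ambiguous and aa_ambiguous, are assumptions for illustration rather than the library's definition. In particular, treating the gap as unambiguous is a guess.

```julia
# Hypothetical sketch (Base Julia only).
# Nucleotides: ambiguous when the 4-bit mask (A=1, C=2, G=4, T=8) has more
# than one bit set, so A/C/G/T and the gap (no bits set) are not ambiguous.
const AMBIG_MASK = Dict(
    '-' => 0b0000, 'A' => 0b0001, 'C' => 0b0010, 'G' => 0b0100, 'T' => 0b1000,
    'M' => 0b0011, 'N' => 0b1111,
)
nt_ambiguous(c::Char) = count_ones(AMBIG_MASK[c]) > 1

# Amino acids: only the ambiguity codes from the amino acid table.
aa_ambiguous(c::Char) = c in ('B', 'J', 'Z', 'X')

nt_ambiguous('M')  # true
nt_ambiguous('A')  # false
aa_ambiguous('X')  # true
aa_ambiguous('*')  # false
```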