Sequences

Using Vector

A quick way to create a DNA/RNA sequence is to store symbols in a vector.

julia> seq = [DNA_A, DNA_C, DNA_G, DNA_T]
4-element Array{DNA,1}:
 DNA_A
 DNA_C
 DNA_G
 DNA_T

julia> [convert(DNA, x) for x in "ACGT"]  # from a string
4-element Array{DNA,1}:
 DNA_A
 DNA_C
 DNA_G
 DNA_T
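Because a Vector is mutable, a sequence stored this way can be edited in place and processed with ordinary Julia functions. A minimal sketch, assuming BioSymbols is installed; the variable names are illustrative only:

```julia
using BioSymbols

seq = [DNA_A, DNA_C, DNA_G, DNA_T]

seq[1] = DNA_T               # Vector is mutable: overwrite the first base in place
push!(seq, DNA_A)            # ...and it can grow: append a base at the end

n_t = count(==(DNA_T), seq)  # generic Julia functions work unchanged (2 here)
rev = reverse(seq)           # plain reversal (not a reverse complement): A, T, G, C, T
```

Note that these are generic collection functions from Julia's Base library, not sequence-specific operations.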

Using Tuple

Julia offers a tuple type to group multiple values into a single value. It resembles Vector but differs in two important ways. First, a Tuple is immutable while a Vector is mutable, so you cannot update the elements of a tuple once it is created. Second, a tuple type is parameterized by its length. This makes tuples inefficient for variable-length sequences: sequences of different lengths have different types, so code that handles them suffers from type instability.

julia> (RNA_A, RNA_U, RNA_C)  # RNA triplet (or codon)
(RNA_A, RNA_U, RNA_C)
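Both differences can be observed directly. The following is a small sketch, assuming BioSymbols is installed; it relies on Julia's unexported `Base.setindex`, which returns a modified copy of a tuple rather than mutating it:

```julia
using BioSymbols

codon = (RNA_A, RNA_U, RNA_C)

# The length is part of the type: this codon is an NTuple{3, RNA}.
is_ntuple3 = typeof(codon) == NTuple{3, RNA}   # true

# Tuples are immutable; `codon[1] = RNA_G` throws a MethodError.
# Base.setindex (unexported) builds a modified copy instead.
mutated = Base.setindex(codon, RNA_G, 1)       # (RNA_G, RNA_U, RNA_C)

# Sequences of different lengths become tuples of different types, so a
# function returning Tuple(v) for an arbitrary vector v has no single
# concrete return type (type instability).
t2 = Tuple([RNA_A, RNA_U])
t3 = Tuple([RNA_A, RNA_U, RNA_C])
same_type = typeof(t2) == typeof(t3)           # false
```

Fixed-length units such as codons are a good fit for tuples precisely because their length never varies.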

Using the BioSequences package

Using Vector or Tuple is simple; however, BioSymbols does not provide sequence-specific operations for these representations, so you need to rely on Julia's built-in operations or on other packages. Moreover, these representations are not necessarily efficient. For example, DNA is an 8-bit primitive type but uses only 4 bits, so half of the memory occupied by a Vector{DNA} is never used.
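The wasted space can be checked by inspecting the raw bits of DNA values. A minimal sketch, assuming BioSymbols is installed and uses one bit per unambiguous base in the low nibble of each byte:

```julia
using BioSymbols

bytes_per_symbol = sizeof(DNA)   # 1: each DNA value is an 8-bit primitive

# Raw codes of the four unambiguous bases: one bit each in the low nibble.
codes = [reinterpret(UInt8, x) for x in (DNA_A, DNA_C, DNA_G, DNA_T)]
# codes == [0x01, 0x02, 0x04, 0x08]
high_nibble_unused = all(c -> (c & 0xf0) == 0x00, codes)

seq = fill(DNA_A, 1_000)
seq_bytes = sizeof(seq)          # 1000 bytes of element storage, half of the bits never set
```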

To represent sequences as efficiently as possible, BioJulia has developed the BioSequences package. Its BioSequence type can represent a DNA/RNA sequence using 2 or 4 bits per symbol, and the package also offers many efficient algorithms.
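The principle behind a 2-bit representation can be sketched in plain Julia. The functions `pack2bit` and `unpack2bit` below are hypothetical helpers that only illustrate bit packing for the four unambiguous bases; the bit assignment (A=00, C=01, G=10, T=11) is an assumption, and this is neither the BioSequences implementation nor its API:

```julia
# Hypothetical 2-bit packing sketch; not part of BioSymbols or BioSequences.
const CODE2 = Dict('A' => 0x0, 'C' => 0x1, 'G' => 0x2, 'T' => 0x3)
const BASES2 = ('A', 'C', 'G', 'T')

# Pack an ACGT string into UInt64 words, 32 two-bit symbols per word.
function pack2bit(s::AbstractString)
    n = length(s)
    words = zeros(UInt64, cld(n, 32))
    for (i, c) in enumerate(s)
        w, k = divrem(i - 1, 32)               # word index (0-based) and slot within it
        words[w + 1] |= UInt64(CODE2[c]) << (2k)
    end
    return words, n
end

# Recover the original string from the packed words and the symbol count.
function unpack2bit(words::Vector{UInt64}, n::Integer)
    io = IOBuffer()
    for i in 1:n
        w, k = divrem(i - 1, 32)
        code = (words[w + 1] >> (2k)) & 0x3    # extract the 2-bit code in slot k
        print(io, BASES2[Int(code) + 1])
    end
    return String(take!(io))
end

s = repeat("ACGT", 25)                         # 100 bases
words, n = pack2bit(s)
```

With this layout, 100 bases fit in four UInt64 words (32 bytes), compared with 100 bytes for the same sequence stored as a Vector{DNA}. A 2-bit code cannot express ambiguous symbols such as N, which is why a 4-bit representation is also useful.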