Indexing & modifying sequences

Indexing

Most BioSequence concrete subtypes for the most part behave like other vector or string types. They can be indexed using integers or ranges:

For example, with LongSequences:

julia> seq = dna"ACGTTTANAGTNNAGTACC"
19nt DNA Sequence:
ACGTTTANAGTNNAGTACC

julia> seq[5]
DNA_T

julia> seq[6:end]
14nt DNA Sequence:
TANAGTNNAGTACC

The biological symbol at a given locus in a biological sequence can be set using setindex:

julia> seq = dna"ACGTTTANAGTNNAGTACC"
19nt DNA Sequence:
ACGTTTANAGTNNAGTACC

julia> seq[5] = DNA_A
DNA_A

Note

Some types such as Kmer can be indexed using integers but not using ranges.

For LongSequence types, indexing a sequence by range creates a subsequence of the original sequence. Unlike Arrays in the standard library, creating this subsequence is copy-free: the subsequence simply points to the original LongSequence data with its range. You may think that this is unsafe because modifying subsequences propagates to the original sequence, but this doesn't happen actually:

julia> seq = dna"AAAA"    # create a sequence
4nt DNA Sequence:
AAAA

julia> subseq = seq[1:2]  # create a subsequence from `seq`
2nt DNA Sequence:
AA

julia> subseq[2] = DNA_T  # modify the second element of it
DNA_T

julia> subseq             # the subsequence is modified
2nt DNA Sequence:
AT

julia> seq                # but the original sequence is not
4nt DNA Sequence:
AAAA

This is because modifying a sequence checks whether its underlying data are shared with other sequences under the hood. If and only if the data are shared, the subsequence creates a copy of itself. Any modifying operation does this check. This is called copy-on-write strategy and users don't need to care about it because it is transparent: If the user modifies a sequence with or subsequence, the job of managing and protecting the underlying data of sequences is handled for them.

Modifying sequences

In addition to setindex, many other modifying operations are possible for biological sequences such as push!, pop!, and insert!, which should be familiar to anyone used to editing arrays.

Base.push! — Function

push!(collection, items...) -> collection

Insert one or more items in collection. If collection is an ordered container, the items are inserted at the end (in the given order).

Examples

julia> push!([1, 2, 3], 4, 5, 6)
6-element Vector{Int64}:
 1
 2
 3
 4
 5
 6

If collection is ordered, use append! to add all the elements of another collection to it. The result of the preceding example is equivalent to append!([1, 2, 3], [4, 5, 6]). For AbstractSet objects, union! can be used instead.

See sizehint! for notes about the performance model.

See also: pop!, popat!, delete!.

Examples

julia> A = [1, 2, 3, 4, 5, 6]
6-element Vector{Int64}:
 1
 2
 3
 4
 5
 6

julia> popfirst!(A)
1

julia> A
5-element Vector{Int64}:
 2
 3
 4
 5
 6

Indexing & modifying sequences

Indexing

Modifying sequences

Additional transformations

Translation