Generating random sequences

Long sequences

You can generate random long sequences using the randseq function and the samplers implemented in BioSequences:

BioSequences.randseq - Function
randseq([rng::AbstractRNG], A::Alphabet, len::Integer)

Generate a LongSequence{A} of length len from the specified alphabet, with elements drawn from the default distribution. User-defined alphabets should implement this method to support random LongSequence generation.

For RNA and DNA alphabets, the default distribution is uniform across A, C, G, and T/U. For AminoAcidAlphabet, it is uniform across the 20 standard amino acids. For a user-defined alphabet A, the default is uniform across all elements of symbols(A).

Example:

julia> seq = randseq(AminoAcidAlphabet(), 50)
50aa Amino Acid Sequence:
VFMHSIRMIRLMVHRSWKMHSARHVNFIRCQDKKWKSADGIYTDICKYSM
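The optional rng argument can be used to make results reproducible. As a minimal sketch, the snippet below seeds two generators identically and checks that randseq returns the same sequence from each. It assumes MersenneTwister from the Random standard library; the seed 1234 and the length 30 are arbitrary choices.

```julia
using BioSequences
using Random

# Two generators with the same seed produce the same stream of random numbers.
rng1 = MersenneTwister(1234)
rng2 = MersenneTwister(1234)

# Pass the generator as the optional first argument.
seq1 = randseq(rng1, DNAAlphabet{4}(), 30)
seq2 = randseq(rng2, DNAAlphabet{4}(), 30)

println(seq1 == seq2)
```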
randseq([rng::AbstractRNG], A::Alphabet, sp::Sampler, len::Integer)

Generate a LongSequence{A} of length len with elements drawn from the given sampler.

Example:

# Generate 50-length RNA with 4% chance of N, 24% each for A, C, G, and U
julia> sp = SamplerWeighted(rna"ACGUN", fill(0.24, 4))
julia> seq = randseq(RNAAlphabet{4}(), sp, 50)
50nt RNA Sequence:
CUNGGGCCCGGGNAAACGUGGUACACCCUGUUAAUAUCAACNNGCGCUNU
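One way to sanity-check a weighted sampler is to draw a longer sequence and look at the observed symbol frequencies. The sketch below reuses the sampler from the example above; the length 10_000 is an arbitrary choice, and the observed fraction varies from run to run, so it should only be expected to land near 0.04.

```julia
using BioSequences

# 24% each for A, C, G, and U; the remaining 4% goes to N.
sp = SamplerWeighted(rna"ACGUN", fill(0.24, 4))
seq = randseq(RNAAlphabet{4}(), sp, 10_000)

# Observed fraction of N in the sampled sequence.
n_fraction = count(nt == RNA_N for nt in seq) / length(seq)
println(n_fraction)
```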
BioSequences.randdnaseq - Function
randdnaseq([rng::AbstractRNG], len::Integer)

Generate a random LongSequence{DNAAlphabet{4}} sequence of length len, with bases sampled uniformly from [A, C, G, T].

BioSequences.randrnaseq - Function
randrnaseq([rng::AbstractRNG], len::Integer)

Generate a random LongSequence{RNAAlphabet{4}} sequence of length len, with bases sampled uniformly from [A, C, G, U].

BioSequences.randaaseq - Function
randaaseq([rng::AbstractRNG], len::Integer)

Generate a random LongSequence{AminoAcidAlphabet} sequence of length len, with amino acids sampled uniformly from the 20 standard amino acids.
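
The three convenience functions randdnaseq, randrnaseq, and randaaseq can be sketched side by side as follows. The length 12 is arbitrary, and the printed sequences differ from run to run.

```julia
using BioSequences

# Each call returns a LongSequence over the corresponding alphabet.
dna = randdnaseq(12)
rna = randrnaseq(12)
aa = randaaseq(12)

println(dna)
println(rna)
println(aa)
```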

BioSequences.SamplerUniform - Type
SamplerUniform{T}

Uniform sampler of type T. Instantiate with a collection of eltype T containing the elements to sample.

Examples

julia> sp = SamplerUniform(rna"ACGU");
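
A uniform sampler can also be restricted to a subset of an alphabet and passed to randseq. The sketch below draws a DNA sequence containing only A and T; the choice of symbols and the length 40 are illustrative assumptions.

```julia
using BioSequences

# Sample uniformly from A and T only.
sp = SamplerUniform(dna"AT")
seq = randseq(DNAAlphabet{4}(), sp, 40)

println(seq)
```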
BioSequences.SamplerWeighted - Type
SamplerWeighted{T}

Weighted sampler of type T. Instantiate with a collection of eltype T containing the elements to sample, and an ordered collection of probabilities for sampling each element except the last. The last element receives the remaining probability, so that the total is 1.

Examples

julia> sp = SamplerWeighted(rna"ACGUN", fill(0.2475, 4));
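
To make the rule for the last element concrete, the sketch below computes the probability implied for N in the example above: four explicit probabilities of 0.2475 leave roughly 0.01 for the fifth symbol. The variable names probs and p_last are illustrative only.

```julia
using BioSequences

# Explicit probabilities for A, C, G, and U, in the order given to the sampler.
probs = fill(0.2475, 4)

# The last element (N) implicitly receives whatever probability remains.
p_last = 1 - sum(probs)
println(p_last)

sp = SamplerWeighted(rna"ACGUN", probs)
```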