Generating random sequences
Long sequences
You can generate random long sequences using the randseq function, its convenience wrappers, and the Samplers implemented in BioSequences:
BioSequences.randseq — Function
randseq([rng::AbstractRNG], A::Alphabet, len::Integer)
Generate a LongSequence{A} of length len from the specified alphabet, drawn from the default distribution. User-defined alphabets should implement this method to support random LongSequence generation.
For RNA and DNA alphabets, the default distribution is uniform across A, C, G, and T/U. For AminoAcidAlphabet, it is uniform across the 20 standard amino acids. For a user-defined alphabet A, the default is uniform across all elements of symbols(A).
Example:
julia> seq = randseq(AminoAcidAlphabet(), 50)
50aa Amino Acid Sequence:
VFMHSIRMIRLMVHRSWKMHSARHVNFIRCQDKKWKSADGIYTDICKYSM
randseq([rng::AbstractRNG], A::Alphabet, sp::Sampler, len::Integer)
Generate a LongSequence{A} of length len with elements drawn from the given sampler.
Example:
# Generate a 50-nt RNA sequence with a 4% chance of N and 24% each for A, C, G, and U
julia> sp = SamplerWeighted(rna"ACGUN", fill(0.24, 4))
julia> seq = randseq(RNAAlphabet{4}(), sp, 50)
50nt RNA Sequence:
CUNGGGCCCGGGNAAACGUGGUACACCCUGUUAAUAUCAACNNGCGCUNU
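Both methods accept an optional rng argument as their first parameter. The following is a minimal sketch of how that argument might be used to get reproducible output, assuming the MersenneTwister generator from Julia's Random standard library; the seed value and variable names are arbitrary choices for illustration.

```julia
using BioSequences
using Random

# Two generators built from the same (arbitrary) seed.
rng_a = MersenneTwister(42)
rng_b = MersenneTwister(42)

# Default distribution: uniform across A, C, G and T for a DNA alphabet.
seq_a = randseq(rng_a, DNAAlphabet{4}(), 20)
seq_b = randseq(rng_b, DNAAlphabet{4}(), 20)

# Identically seeded generators should yield identical sequences.
same = seq_a == seq_b
```

When rng is omitted, the sequence is drawn from Julia's default global generator, so repeated calls will normally differ.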
BioSequences.randdnaseq — Function
randdnaseq([rng::AbstractRNG], len::Integer)
Generate a random LongSequence{DNAAlphabet{4}} sequence of length len, with bases sampled uniformly from [A, C, G, T].
BioSequences.randrnaseq — Function
randrnaseq([rng::AbstractRNG], len::Integer)
Generate a random LongSequence{RNAAlphabet{4}} sequence of length len, with bases sampled uniformly from [A, C, G, U].
BioSequences.randaaseq — Function
randaaseq([rng::AbstractRNG], len::Integer)
Generate a random LongSequence{AminoAcidAlphabet} sequence of length len, with amino acids sampled uniformly from the 20 standard amino acids.
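The three convenience functions above can be called side by side, as in the short sketch below. It relies only on the signatures listed above; the seed and variable names are illustrative assumptions, and the rng argument can be dropped entirely.

```julia
using BioSequences
using Random

rng = MersenneTwister(2024)  # arbitrary seed, only for repeatability

dna = randdnaseq(rng, 30)    # LongSequence{DNAAlphabet{4}}
rna = randrnaseq(rng, 30)    # LongSequence{RNAAlphabet{4}}
aa  = randaaseq(rng, 30)     # LongSequence{AminoAcidAlphabet}

# The rng argument is optional; this call uses the default generator.
dna_unseeded = randdnaseq(30)
```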
BioSequences.SamplerUniform — Type
SamplerUniform{T}
Uniform sampler of type T. Instantiate with a collection of eltype T containing the elements to sample.
Examples
julia> sp = SamplerUniform(rna"ACGU");
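A uniform sampler can be passed to randseq to restrict which symbols appear. The sketch below assumes a sampler built over only A and U, so the resulting sequence should contain no other symbols; the subset, seed, and names are illustrative choices rather than anything prescribed by the library.

```julia
using BioSequences
using Random

rng = MersenneTwister(7)  # arbitrary seed

# Sample uniformly from A and U only.
sp_au = SamplerUniform(rna"AU")
au_seq = randseq(rng, RNAAlphabet{4}(), sp_au, 40)

# Check that every symbol is one of the two sampled elements.
only_au = all(x -> x == RNA_A || x == RNA_U, au_seq)
```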
BioSequences.SamplerWeighted — Type
SamplerWeighted{T}
Weighted sampler of type T. Instantiate with a collection of eltype T containing the elements to sample, and an ordered collection of probabilities for sampling each element except the last. The last element receives the remaining probability, so that the total is 1.
Examples
julia> sp = SamplerWeighted(rna"ACGUN", fill(0.2475, 4));
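In the example above, the four listed weights sum to 0.99, so the last element (N) is left with a probability of about 0.01. The sketch below checks this empirically by drawing a long sequence and counting N; the sequence length, seed, and variable names are assumptions made for illustration.

```julia
using BioSequences
using Random

rng = MersenneTwister(1)  # arbitrary seed

weights = fill(0.2475, 4)              # probabilities for A, C, G, U
implied_n = 1 - sum(weights)           # remainder assigned to N

sp_w = SamplerWeighted(rna"ACGUN", weights)
long_seq = randseq(rng, RNAAlphabet{4}(), sp_w, 100_000)

# Observed fraction of N; should be close to implied_n.
n_frac = count(==(RNA_N), long_seq) / length(long_seq)
```

With this many draws, the observed fraction of N should sit close to the implied value, though the exact figure depends on the generator and seed.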