Records
A FASTX file is treated as a sequence of records: FASTA.Record for FASTA files and FASTQ.Record for FASTQ files. For convenience, FASTARecord and FASTQRecord are aliases of FASTA.Record and FASTQ.Record.
A Record object represents the text of the FASTX record as it is, e.g. the following FASTA record:
>some header here
TAGATGAA
AA
is stored in a FASTA.Record object roughly as its constituent bytes, plus some metadata. The record object has no notion of being a DNA or RNA sequence; it is simply an array of bytes.
Records can be constructed from raw parts (i.e. description and sequence and, for FASTQ, quality), where
description::AbstractString
sequence::Union{AbstractString, BioSequence}
quality::Union{AbstractString, Vector{<:Number}}
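As a minimal sketch of construction from raw parts, the snippet below builds one FASTA and one FASTQ record from plain strings. The header and sequence values are made up for illustration, and the quality is given as a string with one character per sequence byte.

```julia
using FASTX

# FASTA: description and sequence
fa = FASTARecord("read1 sample A", "TAGATGAAAA")

# FASTQ: description, sequence and quality (here an AbstractString)
fq = FASTQRecord("read1", "TAGA", ";;]]")

# Compare against the record parsed from the equivalent FASTQ text
parsed = parse(FASTQRecord, "@read1\nTAGA\n+\n;;]]")
same = fq == parsed

# The raw parts can be read back with the accessor functions
fa_parts = (identifier(fa), description(fa), sequence(fa))
fq_parts = (sequence(fq), quality(fq))
```

The identifier is not passed separately: it is derived from the description as the text before the first whitespace.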
Alternatively, they can be parsed directly from a string or an AbstractVector{UInt8}.
julia> record = parse(FASTARecord, ">abc\nAGCC\nCCGA");
julia> record2 = FASTARecord("abc", "AGCCCCGA");
julia> record == record2
true
Records can be queried for their information, namely identifier, description and sequence (and quality, for FASTQ). By default, this returns an AbstractString view into the Record's data:
julia> record = parse(FASTARecord, ">ident desc\nUGU\nGA");
julia> (identifier(record), description(record), sequence(record))
("ident", "ident desc", "UGUGA")
However, you can also request the sequence as a String or as any subtype of BioSequence:
julia> record = parse(FASTARecord, ">abc\nUGC\nCCA");
julia> using BioSequences # LongRNA defined in BioSequences.jl
julia> sequence(LongRNA{2}, record)
6nt RNA Sequence:
UGCCCA
julia> sequence(String, record)
"UGCCCA"
The number of bytes in the sequence of a Record can be queried using seqsize:
julia> record = parse(FASTARecord, ">abc\nUGC\nCCA");
julia> seqsize(record)
6
Reference:
FASTX.identifier — Function
identifier(record::Record)::AbstractString
Get the sequence identifier of record. The identifier is the description before any whitespace. If the identifier is missing, return an empty string. Returns an AbstractString view into the record. If the record is overwritten, the string data will be corrupted.
See also: description, sequence
Examples
julia> record = parse(FASTA.Record, ">ident_here some descr \nTAGA");
julia> identifier(record)
"ident_here"
FASTX.description — Function
description(record::Record)::AbstractString
Get the description of record. The description is the entire header line, including trailing whitespace, minus the leading > (FASTA) or @ (FASTQ) symbol. Returns an AbstractString view into the record. If the record is overwritten, the string data will be corrupted.
See also: identifier, sequence
Examples
julia> record = parse(FASTA.Record, ">ident_here some descr \nTAGA");
julia> description(record)
"ident_here some descr "
FASTX.sequence — Function
sequence([::Type{S}], record::Record, [part::UnitRange{Int}])::S
Get the sequence of record.
S can be either a subtype of BioSequences.BioSequence, AbstractString or String. If elided, S defaults to an AbstractString subtype. If the part argument is given, only the specified part of the sequence is returned.
See also: identifier, description
Examples
julia> record = parse(FASTQ.Record, "@read1\nTAGA\n+\n;;]]");
julia> sequence(record)
"TAGA"
julia> using BioSequences # LongDNA defined in BioSequences.jl
julia> sequence(LongDNA{2}, record)
4nt DNA Sequence:
TAGA
FASTX.seqsize — Function
seqsize(::Record)::Int
Get the number of bytes in the sequence of a Record. Note that in the presence of non-ASCII characters, this may differ from length(sequence(record)).
See also: sequence
Examples
julia> seqsize(parse(FASTA.Record, ">hdr\nKRRLPW\nYHS"))
9
julia> seqsize(parse(FASTA.Record, ">hdr\nαβγδϵ"))
10
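The reference entries above mention three details that are easy to overlook: the optional part argument of sequence, the fact that the default return values are views into the record, and the difference between seqsize and length. The following is a small sketch of each; the variable names are illustrative only, and copying with String is one possible way to keep data independent of the record, not the only one.

```julia
using FASTX

record = parse(FASTARecord, ">ident desc\nUGU\nGA")

# Restrict the returned sequence to a range of positions (the full sequence is "UGUGA")
middle = sequence(String, record, 2:4)

# identifier returns a view into the record's data; copy it into a String
# if it must remain valid after the record is overwritten
id_copy = String(identifier(record))

# seqsize counts bytes, whereas length counts characters
greek = parse(FASTARecord, ">hdr\nαβγδϵ")
sizes = (seqsize(greek), length(sequence(greek)))
```

Here each Greek letter occupies two bytes in UTF-8, which is why the byte count and the character count differ.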