Records
FASTX files are considered a sequence of Record objects: FASTA.Record for FASTA files and FASTQ.Record for FASTQ. For convenience, FASTARecord and FASTQRecord are aliases of FASTA.Record and FASTQ.Record.
A Record object represents the text of the FASTX record as it is. For example, the following FASTA record:
>some header here
TAGATGAA
AA
is stored in a FASTA.Record object roughly as its constituent bytes, plus some metadata. The record object has no notion of being a DNA or RNA sequence: it is simply an array of bytes.
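To make the "just bytes" idea concrete, here is a minimal Julia sketch. ToyFastaRecord and toy_record are hypothetical names invented for this illustration; FASTX's real Record type has a different internal layout and stores more metadata. The sketch only assumes that a record keeps its header and its newline-free sequence as raw bytes.

```julia
# ToyFastaRecord is a hypothetical, simplified model, NOT FASTX's actual Record type.
struct ToyFastaRecord
    header::Vector{UInt8}    # header bytes, without the leading '>'
    sequence::Vector{UInt8}  # sequence bytes, with line breaks removed
end

# Build a ToyFastaRecord from the text of a single FASTA record (hypothetical helper).
function toy_record(text::AbstractString)
    lines = split(text, '\n'; keepempty=false)
    header = collect(codeunits(SubString(lines[1], 2)))  # drop the leading '>'
    sequence = UInt8[]
    for line in lines[2:end]
        append!(sequence, codeunits(line))  # concatenate the sequence lines
    end
    return ToyFastaRecord(header, sequence)
end

rec = toy_record(">some header here\nTAGATGAA\nAA")
println(rec.sequence)                # raw UInt8 bytes; no DNA or RNA type attached
println(String(copy(rec.sequence)))  # TAGATGAAAA
```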
Records can be constructed from raw parts (the description, the sequence and, for FASTQ, the quality), where:
description::AbstractString
sequence::Union{AbstractString, BioSequence}
quality::Union{AbstractString, Vector{<:Number}}
Alternatively, they can be parsed directly from a string or an AbstractVector{UInt8}:
julia> record = parse(FASTARecord, ">abc\nAGCC\nCCGA");
julia> record2 = FASTARecord("abc", "AGCCCCGA");
julia> record == record2
true
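Both records hold the same description and the same sequence once the line break between AGCC and CCGA is removed. A minimal sketch of that idea, using a hypothetical ToyRecord type and toy_parse function (not part of FASTX) and assuming the input is a single FASTA record whose first line is the '>' header:

```julia
# ToyRecord and toy_parse are hypothetical, for illustration only (not FASTX code).
struct ToyRecord
    description::String
    sequence::String
end

# Compare toy records by content.
Base.:(==)(a::ToyRecord, b::ToyRecord) =
    a.description == b.description && a.sequence == b.sequence

# Parse the text of one FASTA record: header line first, then sequence lines.
function toy_parse(text::AbstractString)
    lines = split(text, '\n')
    startswith(lines[1], ">") || throw(ArgumentError("FASTA header must start with '>'"))
    description = SubString(lines[1], 2)  # drop the leading '>'
    sequence = join(lines[2:end])         # remove line breaks by joining the lines
    return ToyRecord(description, sequence)
end

parsed = toy_parse(">abc\nAGCC\nCCGA")
built = ToyRecord("abc", "AGCCCCGA")
println(parsed == built)  # true
```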
Records can be queried for their information, namely identifier, description and sequence (and quality, for FASTQ). By default, this returns an AbstractString view into the Record's data:
julia> record = parse(FASTARecord, ">ident desc\nUGU\nGA");
julia> (identifier(record), description(record), sequence(record))
("ident", "ident desc", "UGUGA")
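Here "view" means the returned string refers to the record's existing data rather than to a fresh copy. The sketch below imitates this with Julia's built-in SubString over a plain header line. It is only an analogy: FASTX may use its own view type over the record's byte buffer, and the variable names are illustrative.

```julia
# A raw FASTA header line, standing in for a record's stored data (illustration only).
line = ">ident desc"

# SubString is a view: it refers to `line`'s memory instead of copying it.
# (FASTX's own view type may differ; SubString is only an analogy here.)
desc = SubString(line, 2)  # everything after the leading '>'

println(desc)                                # ident desc
println(pointer(desc) == pointer(line) + 1)  # true: same memory, offset by one byte
```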
Instead of a view, you can also request the sequence as a String or as any subtype of BioSequence:
julia> record = parse(FASTARecord, ">abc\nUGC\nCCA");
julia> using BioSequences # LongRNA defined in BioSequences.jl
julia> sequence(LongRNA{2}, record)
6nt RNA Sequence:
UGCCCA
julia> sequence(String, record)
"UGCCCA"
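One practical reason to ask for a String is that an owned copy does not depend on the record's data. The sketch below simulates this with a plain byte vector standing in for a record's buffer (an assumption for illustration; it is not how FASTX lays out a Record): a view into the buffer sees later overwrites, while a copied String does not.

```julia
# A plain byte vector standing in for a record's internal buffer (illustration only).
buffer = collect(codeunits("UGCCCA"))

view_bytes = @view buffer[1:6]  # shares memory with `buffer`
owned = String(buffer[1:6])     # buffer[1:6] copies the bytes; the String owns that copy

# Simulate the buffer being reused, e.g. when the next record is read into it.
buffer[1:6] .= codeunits("AAAAAA")

println(String(collect(view_bytes)))  # AAAAAA: the view sees the overwrite
println(owned)                        # UGCCCA: the owned String is unaffected
```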
The number of bytes in the sequence of a Record can be queried using seqsize:
julia> record = parse(FASTARecord, ">abc\nUGC\nCCA");
julia> seqsize(record)
6
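seqsize counts only the sequence bytes, so neither the header nor the line break between UGC and CCA is included. Below is a hypothetical helper, toy_seqsize, that reproduces this count from the text of a single FASTA record. It is not FASTX's implementation, which works on an already-parsed record.

```julia
# toy_seqsize is a hypothetical helper, not part of FASTX.
# It counts the bytes on the sequence lines of a single FASTA record,
# skipping the header line and the '\n' line breaks.
# (The `init` keyword of `sum` requires Julia 1.6 or later.)
function toy_seqsize(text::AbstractString)
    lines = split(text, '\n')
    return sum(ncodeunits, lines[2:end]; init=0)
end

println(toy_seqsize(">abc\nUGC\nCCA"))  # 6
```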
Reference:
FASTX.identifier (Function)
identifier(record::Record)::AbstractString
Get the sequence identifier of record. The identifier is the part of the description before any whitespace. If the identifier is missing, an empty string is returned. Returns an AbstractString view into the record. If the record is overwritten, the string data will be corrupted.
See also: description, sequence
Examples
julia> record = parse(FASTA.Record, ">ident_here some descr \nTAGA");
julia> identifier(record)
"ident_here"
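The identifier rule can be restated in a few lines of plain Julia. toy_identifier is a hypothetical name, and the sketch assumes Julia's isspace as the definition of whitespace, which may not match FASTX exactly.

```julia
# toy_identifier is a hypothetical re-statement of the rule, not FASTX's implementation.
# Whitespace is whatever Julia's isspace accepts (an assumption).
function toy_identifier(description::AbstractString)
    i = findfirst(isspace, description)
    i === nothing && return SubString(description, 1)          # no whitespace: whole description
    return SubString(description, 1, prevind(description, i))  # text before the first whitespace
end

println(repr(toy_identifier("ident_here some descr ")))  # "ident_here"
println(repr(toy_identifier("")))                        # "" (missing identifier)
```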
FASTX.description (Function)
description(record::Record)::AbstractString
Get the description of record. The description is the entire header line minus the leading > (FASTA) or @ (FASTQ) symbol, including any trailing whitespace. Returns an AbstractString view into the record. If the record is overwritten, the string data will be corrupted.
See also: identifier, sequence
Examples
julia> record = parse(FASTA.Record, ">ident_here some descr \nTAGA");
julia> description(record)
"ident_here some descr "
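Likewise, here is a hypothetical toy_description that applies the rule above to a raw header line: drop the leading > or @ and keep everything else, trailing whitespace included. It assumes a single-line header and is not FASTX's parser.

```julia
# toy_description is a hypothetical helper, not FASTX's parser.
# It drops the leading '>' (FASTA) or '@' (FASTQ) and keeps the rest of the line unchanged.
function toy_description(header_line::AbstractString)
    (!isempty(header_line) && first(header_line) in ('>', '@')) ||
        throw(ArgumentError("header line must start with '>' or '@'"))
    return SubString(header_line, nextind(header_line, 1))
end

println(repr(toy_description(">ident_here some descr ")))  # "ident_here some descr "
println(repr(toy_description("@read1")))                   # "read1"
```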
FASTX.sequence (Function)
sequence([::Type{S}], record::Record, [part::UnitRange{Int}])::S
Get the sequence of record.
S can be either a subtype of BioSequences.BioSequence, AbstractString or String. If elided, S defaults to an AbstractString subtype. If the part argument is given, only the specified part of the sequence is returned.
See also: identifier, description
Examples
julia> record = parse(FASTQ.Record, "@read1\nTAGA\n+\n;;]]");
julia> sequence(record)
"TAGA"
julia> using BioSequences # LongDNA defined in BioSequences.jl
julia> sequence(LongDNA{2}, record)
4nt DNA Sequence:
TAGA
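The part argument selects a range of the sequence. The sketch below uses a hypothetical helper, toy_sequence, and assumes that part indexes the sequence bytes and that the sequence is ASCII. Unlike the default sequence(record, part), which returns a view, it copies the selected bytes into a String.

```julia
# toy_sequence is a hypothetical stand-in for sequence(String, record, part), not FASTX code.
# Assumes `part` indexes the sequence bytes and the sequence is ASCII.
function toy_sequence(seq::AbstractString, part::UnitRange{Int})
    bytes = codeunits(seq)
    checkbounds(bytes, part)    # throws a BoundsError if `part` is out of range
    return String(bytes[part])  # copy the selected bytes into a new String
end

println(toy_sequence("TAGA", 2:3))  # AG
```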
FASTX.seqsize (Function)
seqsize(::Record)::Int
Get the number of bytes in the sequence of a Record. Note that in the presence of non-ASCII characters, this may differ from length(sequence(record)).
See also: sequence
Examples
julia> seqsize(parse(FASTA.Record, ">hdr\nKRRLPW\nYHS"))
9
julia> seqsize(parse(FASTA.Record, ">hdr\nαβγδϵ"))
10
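The difference between the two examples comes from UTF-8 encoding: each Greek letter above takes two bytes, so the 5-character sequence occupies 10 bytes. For plain Julia Strings, ncodeunits and length report the two counts:

```julia
# Byte count versus character count for plain Julia Strings.
ascii_seq = "KRRLPWYHS"
greek_seq = "αβγδϵ"

println(ncodeunits(ascii_seq), " ", length(ascii_seq))  # 9 9
println(ncodeunits(greek_seq), " ", length(greek_seq))  # 10 5
```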