FASTA formatted files
NB: First read the overview in the sidebar
FASTA is a text-based file format for representing biological sequences. A FASTA file stores a list of sequence records with name, description, and sequence.
The template of a sequence record is:
>{description}
{sequence}
The "identifier" is the first part of the description, up to the first whitespace (or the entire description if there is no whitespace).
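The identifier rule can be illustrated with a few lines of plain Julia. This is only a sketch: `split_header` is a hypothetical helper written for this example, not a FASTX function, and it assumes the header is passed as a single line.

```julia
# Hypothetical helper, not part of FASTX: split a header line into
# (identifier, description) following the rule described above.
function split_header(header::AbstractString)
    # The description is everything after the leading '>', if there is one.
    desc = startswith(header, ">") ? header[2:end] : header
    # The identifier runs up to (not including) the first whitespace character.
    i = findfirst(isspace, desc)
    id = i === nothing ? desc : desc[1:prevind(desc, i)]
    return (String(id), String(desc))
end

split_header(">chrI chromosome 1")  # ("chrI", "chrI chromosome 1")
split_header(">chrI")               # ("chrI", "chrI")
```

When working with parsed records, the `identifier` and `description` functions described further down return these two values directly.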
Here is an example of a chromosomal sequence:
>chrI chromosome 1
CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACC
CACACACACACATCCTAACACTACCCTAACACAGCCCTAATCTA
Here:
- The identifier is "chrI"
- The description is "chrI chromosome 1", containing the identifier
- The sequence is the DNA sequence "CCACA..."
The FASTARecord
FASTA records are, by design, very lax in what they can contain. They can contain almost arbitrary byte sequences, including invalid unicode, and trailing whitespace on their sequence lines, which will be interpreted as part of the sequence. If you want to have more certainty about the format, you can either check the content of the sequences with a regex, or (preferably), convert them to the desired BioSequence type.
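As a rough illustration of the regex option, the sketch below checks a sequence string against a strict DNA alphabet. `is_plain_dna` is a hypothetical helper, and the choice of alphabet (only A, C, G and T in either case) is an assumption made for this example. In practice the string would come from a record's sequence.

```julia
# Hypothetical helper: accept only unambiguous DNA letters, in either case.
# \A and \z anchor the match to the whole string, so trailing whitespace or
# a trailing newline is rejected instead of being silently accepted.
is_plain_dna(seq::AbstractString) = occursin(r"\A[ACGTacgt]*\z", seq)

is_plain_dna("TAGACC")  # true
is_plain_dna("TAqACC")  # false: 'q' is not a DNA letter
is_plain_dna("TAG ")    # false: the trailing space counts as sequence
```

A regex only checks the characters. Converting to a typed biological sequence, as recommended above, additionally gives a value whose type documents what it contains.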
FASTX.FASTA.Record — Type
FASTA.Record
Mutable struct representing a FASTA record as parsed from a FASTA file. The content of the record can be queried with the following functions: identifier, description, sequence.
FASTA records are un-typed, i.e. they are agnostic to what kind of data they contain.
See also: FASTA.Reader, FASTA.Writer
Examples
julia> rec = parse(FASTARecord, ">some header\nTAqA\nCC");
julia> identifier(rec)
"some"
julia> description(rec)
"some header"
julia> sequence(rec)
"TAqACC"
julia> typeof(description(rec)) == typeof(sequence(rec)) <: AbstractString
true
FASTAReader and FASTAWriter
FASTAWriter can optionally be passed the keyword width to control the line width. If this is zero or negative, it will write all record sequences on a single line. Else, it will wrap lines to the given maximal width.
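The wrapping rule can be pictured with a small stand-alone sketch. `wrap_sequence` is a hypothetical helper that mimics the behaviour described above on a plain string. It is not how FASTX implements wrapping, and it does not call the writer.

```julia
# Hypothetical helper, not FASTX's implementation: split a sequence into
# lines of at most `width` characters. A width below 1 means no wrapping.
function wrap_sequence(seq::AbstractString, width::Integer)
    width < 1 && return [String(seq)]
    chars = collect(seq)
    n = length(chars)
    return [String(chars[i:min(i + width - 1, n)]) for i in 1:width:n]
end

wrap_sequence("CCACACCACACC", 5)  # ["CCACA", "CCACA", "CC"]
wrap_sequence("CCACACCACACC", 0)  # ["CCACACCACACC"]
```

Only the last line of a wrapped sequence can be shorter than the requested width.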
Reference:
FASTX.FASTA — Module
FASTA
Module under FASTX with code related to FASTA files.
FASTX.FASTA.Reader — Type
FASTA.Reader(input::IO; index=nothing, copy::Bool=true)
Create a buffered data reader of the FASTA file format. The reader is a BioGenerics.IO.AbstractReader, a stateful iterator of FASTA.Record. Readers take ownership of the underlying IO. Mutating or closing the underlying IO by any means other than through the reader is undefined behaviour. Closing the Reader also closes the underlying IO.
See more examples in the FASTX documentation.
See also: FASTA.Record, FASTA.Writer
Arguments
- input: data source
- index: Optional random access index (currently fai is supported). index can be nothing, a FASTA.Index, or an IO, in which case an index will be parsed from the IO, or AbstractString, in which case it will be treated as a path to a fai file.
- copy::Bool: iterating returns fresh copies instead of the same Record. Set to false for improved performance, but be wary that iterating mutates records.
Examples
julia> rdr = FASTAReader(IOBuffer(">header\nTAG\n>another\nAGA"));
julia> records = collect(rdr); close(rdr);
julia> foreach(println, map(identifier, records))
header
another
julia> foreach(println, map(sequence, records))
TAG
AGA
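The `copy` argument described above deserves a short illustration. The sketch below assumes FASTX is installed and uses only the calls shown in this section. With `copy=false`, a record obtained during iteration should not be kept across iterations, so the loop stores independent `String` copies of the fields it needs.

```julia
using FASTX  # assumed to be installed

rdr = FASTAReader(IOBuffer(">header\nTAG\n>another\nAGA"); copy=false)
ids = String[]
seqs = String[]
for rec in rdr
    # `rec` may be overwritten on the next iteration, so keep copies of
    # the fields as Strings instead of holding on to the record itself.
    push!(ids, String(identifier(rec)))
    push!(seqs, String(sequence(rec)))
end
close(rdr)
```

With the default `copy=true`, collecting the records themselves, as in the example above, is safe.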
FASTX.FASTA.Writer — Type
FASTA.Writer(output::IO; width=70)
Create a data writer of the FASTA file format. The writer is a BioGenerics.IO.AbstractWriter. Writers take ownership of the underlying IO. Mutating or closing the underlying IO by any means other than through the writer is undefined behaviour. Closing the writer also closes the underlying IO.
See more examples in the FASTX documentation.
See also: FASTA.Record, FASTA.Reader
Arguments
- output: Data sink to write to
- width: Wrapping width of sequence characters. If < 1, no wrapping.
Examples
julia> FASTA.Writer(open("some_file.fna", "w")) do writer
           write(writer, record) # a FASTA.Record
       end
FASTX.FASTA.validate_fasta — Function
validate_fasta(io::IO) >: Nothing
Check if io is a valid FASTA file. Return nothing if it is, and an instance of another type if not.
Examples
julia> validate_fasta(IOBuffer(">a bc\nTAG\nTA")) === nothing
true
julia> validate_fasta(IOBuffer(">a bc\nT>G\nTA")) === nothing
false