IO - FASTA formatted files
FASTA is a text-based file format for representing biological sequences. A FASTA file stores a list of sequence records with name, description, and sequence.
The template of a sequence record is:

>{name} {description}?
{sequence}

Here is an example of a chromosomal sequence:

>chrI chromosome 1
CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACC
CACACACACACATCCTAACACTACCCTAACACAGCCCTAATCTAACCCTG

Readers and Writers
The reader and writer for FASTA formatted files are found in the BioSequences.FASTA module.
BioSequences.FASTA.Reader — Type.
FASTA.Reader(input::IO; index=nothing)
Create a data reader of the FASTA file format.
Arguments
input: data source
index=nothing: filepath to a random access index (currently fai is supported)

BioSequences.FASTA.Writer — Type.
FASTA.Writer(output::IO; width=70)
Create a data writer of the FASTA file format.
Arguments
output: data sink
width=70: wrapping width of sequence characters
They can be created with IOStreams:
r = FASTA.Reader(open("MyInput.fasta", "r"))
w = FASTA.Writer(open("MyFile.fasta", "w"))

Usually sequence records will be read sequentially from a file by iteration:
using BioSequences
reader = FASTA.Reader(open("hg38.fa", "r"))
for record in reader
    # Do something
end
close(reader)

If the FASTA file has an auxiliary index file in fai format, the reader also supports random access to FASTA records, which is useful when accessing specific parts of a huge genome sequence:
reader = open(FASTA.Reader, "sacCer.fa", index="sacCer.fa.fai")
chrIV = reader["chrIV"]  # directly read the record named chrIV

Reading in a sequence from a FASTA formatted file will give you a variable of type FASTA.Record.
BioSequences.FASTA.Record — Type.
FASTA.Record()
Create an unfilled FASTA record.
FASTA.Record(data::Vector{UInt8})
Create a FASTA record object from data.
This function verifies and indexes fields for accessors. Note that the ownership of data is transferred to a new record object.
FASTA.Record(str::AbstractString)
Create a FASTA record object from str.
This function verifies and indexes fields for accessors.
FASTA.Record(identifier, sequence)
Create a FASTA record object from identifier and sequence.
FASTA.Record(identifier, description, sequence)
Create a FASTA record object from identifier, description and sequence.
Various getters and setters are available for FASTA.Records:
BioSequences.FASTA.hasidentifier — Function.
hasidentifier(record::Record)
Checks whether or not the record has an identifier.

BioSequences.FASTA.identifier — Function.
identifier(record::Record)::String
Get the sequence identifier of record.

BioSequences.FASTA.hasdescription — Function.
hasdescription(record::Record)
Checks whether or not the record has a description.

BioSequences.FASTA.description — Function.
description(record::Record)::String
Get the description of record.

BioSequences.FASTA.hassequence — Function.
hassequence(record::Record)
Checks whether or not a sequence record contains a sequence.

BioSequences.FASTA.sequence — Method.
sequence(record::Record, [part::UnitRange{Int}])
Get the sequence of record.
This function infers the sequence type from the data. When the inferred type is wrong or unreliable, use sequence(::Type{S}, record::Record). If the part argument is given, it returns the specified part of the sequence.
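The accessors above split a record into an identifier, an optional description, and a sequence. As a rough illustration of that split, here is a plain-Julia sketch that parses one record string following the template `>{name} {description}?`. It does not use BioSequences at all: `parse_fasta_record` is a hypothetical helper written for this example, and it makes no claim about how FASTA.Record is implemented internally.

```julia
# Hypothetical helper, NOT part of BioSequences: split one FASTA record
# string into its identifier, description, and sequence.
function parse_fasta_record(text::AbstractString)
    lines = split(strip(text), '\n')
    header = strip(lines[1])
    startswith(header, ">") || error("a FASTA record must start with '>'")
    # The identifier runs up to the first whitespace; whatever follows
    # is the optional description.
    parts = split(SubString(header, 2), limit=2)
    identifier = isempty(parts) ? "" : String(parts[1])
    description = length(parts) == 2 ? String(strip(parts[2])) : ""
    # Sequence lines are concatenated with the line breaks removed.
    sequence = join(strip.(lines[2:end]))
    return (identifier=identifier, description=description, sequence=sequence)
end

rec = parse_fasta_record(">chrI chromosome 1\nCCACACCACA\nCACACACACA\n")
println(rec.identifier)
println(rec.description)
println(rec.sequence)
```

A record without a description, such as `">seq1\nACGT"`, yields an empty description string in this sketch, which mirrors the case that hasdescription is meant to detect.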
To write a BioSequence to a FASTA file, you first have to create a FASTA.Record:

using BioSequences
x = dna"aaaaatttttcccccggggg"
rec = FASTA.Record("MySeq", x)
w = FASTA.Writer(open("MyFile.fasta", "w"))
write(w, rec)
close(w)  # flush buffered output and release the file handle

As always with Julia IO types, remember to close your file readers and writers after you are finished.
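The `width` keyword of FASTA.Writer sets the wrapping width of the sequence characters. To picture what wrapping at a given width means for the written text, the following plain-Julia sketch formats a record by hand. `format_fasta` is a hypothetical helper for illustration only; it is not part of BioSequences, it assumes an ASCII sequence, and the library's own output may differ in detail.

```julia
# Hypothetical helper, NOT part of BioSequences: render a record as FASTA
# text, wrapping the sequence so that no line exceeds `width` characters.
function format_fasta(identifier::AbstractString, sequence::AbstractString;
                      width::Integer=70)
    width > 0 || throw(ArgumentError("width must be positive"))
    io = IOBuffer()
    println(io, ">", identifier)
    # Assumes an ASCII sequence, so character positions equal byte indices.
    for start in 1:width:length(sequence)
        stop = min(start + width - 1, length(sequence))
        println(io, SubString(sequence, start, stop))
    end
    return String(take!(io))
end

print(format_fasta("MySeq", "aaaaatttttcccccggggg", width=8))
```

With `width=8`, the 20-character sequence is split over three lines of 8, 8, and 4 characters; with the default of 70 it fits on a single line.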