IO - FASTQ formatted files
FASTQ is a text-based file format for representing DNA sequences along with qualities for each base. A FASTQ file stores a list of sequence records in the following format:
@{name} {description}?
{sequence}
+
{qualities}
Here is an example of one record from a FASTQ file:
@FSRRS4401BE7HA
tcagTTAAGATGGGAT
+
###EEEEEEEEE##E#
Readers and Writers
The reader and writer for FASTQ formatted files are found within the BioSequences.FASTQ module.
BioSequences.FASTQ.Reader - Type.
FASTQ.Reader(input::IO; fill_ambiguous=nothing)
Create a data reader of the FASTQ file format.
Arguments
input: data source
fill_ambiguous=nothing: fill ambiguous symbols with the given symbol
BioSequences.FASTQ.Writer - Type.
FASTQ.Writer(output::IO; quality_header=false)
Create a data writer of the FASTQ file format.
Arguments
output: data sink
quality_header=false: output the title line at the third line just after '+'
They can be created with IOStreams:
r = FASTQ.Reader(open("MyInput.fastq", "r"))
w = FASTQ.Writer(open("MyFile.fastq", "w"))
Note that FASTQ.Reader does not support line-wraps within sequence and quality. Usually sequence records will be read sequentially from a file by iteration.
using BioSequences
reader = FASTQ.Reader(open("hg38.fastq", "r"))
for record in reader
    # Do something
end
close(reader)
Reading in a record from a FASTQ formatted file will give you a variable of type FASTQ.Record.
BioSequences.FASTQ.Record - Type.
FASTQ.Record()
Create an unfilled FASTQ record.
FASTQ.Record(data::Vector{UInt8})
Create a FASTQ record object from data.
This function verifies and indexes fields for accessors. Note that the ownership of data is transferred to a new record object.
FASTQ.Record(str::AbstractString)
Create a FASTQ record object from str.
This function verifies and indexes fields for accessors.
FASTQ.Record(identifier, sequence, quality; offset=33)
Create a FASTQ record from identifier, sequence and quality.
FASTQ.Record(identifier, description, sequence, quality; offset=33)
Create a FASTQ record from identifier, description, sequence and quality.
Various getters and setters are available for FASTQ.Records:
BioSequences.FASTQ.hasidentifier - Function.
hasidentifier(record::Record)
Checks whether or not the record has an identifier.
BioSequences.FASTQ.identifier - Function.
identifier(record::Record)::String
Get the sequence identifier of record.
BioSequences.FASTQ.hasdescription - Function.
hasdescription(record::Record)
Checks whether or not the record has a description.
BioSequences.FASTQ.description - Function.
description(record::Record)::String
Get the description of record.
BioSequences.FASTQ.hassequence - Function.
hassequence(record::Record)
Checks whether or not a sequence record contains a sequence.
BioSequences.FASTQ.sequence - Method.
sequence(record::Record, [part::UnitRange{Int}])::BioSequences.DNASequence
Get the sequence of record.
BioSequences.FASTQ.hasquality - Function.
hasquality(record::Record)
Check whether the given FASTQ record has a quality string.
BioSequences.FASTQ.quality - Function.
quality(record::Record, [offset::Integer=33, [part::UnitRange]])::Vector{UInt8}
Get the base quality of record.
quality(record::Record, encoding_name::Symbol, [part::UnitRange])::Vector{UInt8}
Get the base quality of record by decoding with encoding_name.
The encoding_name can be either :sanger, :solexa, :illumina13, :illumina15, or :illumina18.
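The offset argument is the value subtracted from each quality character's ASCII code to obtain a numeric score. As a rough illustration in plain Julia (no BioSequences calls, and not the library's own implementation), the quality string from the example record near the top of this page can be decoded by hand with the default offset of 33. The variable names here are made up for the sketch.

```julia
# Quality string copied from the example record on this page.
qualstr = "###EEEEEEEEE##E#"

# Assumed offset of 33, matching the default shown in the signatures above.
offset = 33

# Subtract the offset from each character's ASCII code to get a score.
scores = [Int(c) - offset for c in qualstr]

println(scores)
```

With this offset, '#' (ASCII 35) maps to a score of 2 and 'E' (ASCII 69) maps to 36.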
To write a BioSequence to a FASTQ file, you first have to create a FASTQ.Record:
BioSequences.FASTQ.Record - Method.
FASTQ.Record(identifier, description, sequence, quality; offset=33)
Create a FASTQ record from identifier, description, sequence and quality.
As always with Julia IO types, remember to close your file readers and writers after you are finished.
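Putting the pieces together, a write-then-read round trip might look like the sketch below. It assumes a BioSequences version that still ships the FASTQ module, and that records are written with write(writer, record), a call this page does not document. The record contents and the file path are placeholders chosen for illustration.

```julia
using BioSequences

# Hypothetical record contents, for illustration only.
# The quality vector holds one numeric score per base.
record = FASTQ.Record("read1", "example read", dna"ACGT", [30, 30, 20, 10])

# Placeholder output path inside a temporary directory.
path = joinpath(mktempdir(), "example.fastq")

# Write the record, then close the writer.
# write(writer, record) is an assumption, see the note above.
writer = FASTQ.Writer(open(path, "w"))
write(writer, record)
close(writer)

# Read the file back and print each record's identifier and sequence.
reader = FASTQ.Reader(open(path, "r"))
for rec in reader
    println(FASTQ.identifier(rec), " ", FASTQ.sequence(rec))
end
close(reader)
```

Both the writer and the reader are closed explicitly as soon as they are no longer needed.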