IO - FASTQ formatted files
FASTQ is a text-based file format for representing DNA sequences along with a quality score for each base. A FASTQ file stores a list of sequence records in the following format:
@{name} {description}?
{sequence}
+
{qualities}
Here is an example of one record from a FASTQ file:
@FSRRS4401BE7HA
tcagTTAAGATGGGAT
+
###EEEEEEEEE##E#
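To make the four-line layout concrete, here is a minimal plain-Julia sketch that splits one record into its fields. The names SimpleFastqRecord and parse_fastq_record are hypothetical and not part of BioSequences. This only illustrates the format; it is not how FASTQ.Reader is implemented.

```julia
# Hypothetical type and helper for illustration only; not part of BioSequences.
struct SimpleFastqRecord
    identifier::String
    description::Union{String,Nothing}  # `nothing` when the header has no description
    sequence::String
    quality::String
end

# Parse one record given as its four lines: header, sequence, '+' line, qualities.
function parse_fastq_record(lines::AbstractVector{<:AbstractString})
    length(lines) == 4 || error("a FASTQ record must span exactly four lines")
    header, seq, plus, qual = lines
    startswith(header, "@") || error("header line must start with '@'")
    startswith(plus, "+") || error("third line must start with '+'")
    length(seq) == length(qual) || error("sequence and quality lengths differ")
    # Split "@{name} {description}" at the first space; the description is optional.
    fields = split(header[2:end], ' '; limit = 2)
    desc = length(fields) == 2 ? String(fields[2]) : nothing
    return SimpleFastqRecord(String(fields[1]), desc, String(seq), String(qual))
end

rec = parse_fastq_record(["@FSRRS4401BE7HA", "tcagTTAAGATGGGAT", "+", "###EEEEEEEEE##E#"])
println(rec.identifier)  # FSRRS4401BE7HA
```

The length check reflects the rule that every base has exactly one quality character.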
Readers and Writers
The reader and writer for FASTQ formatted files are found within the BioSequences.FASTQ module.
BioSequences.FASTQ.Reader (Type)
FASTQ.Reader(input::IO; fill_ambiguous=nothing)
Create a data reader of the FASTQ file format.
Arguments
input: data source
fill_ambiguous=nothing: fill ambiguous symbols with the given symbol
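As a rough illustration of what filling ambiguous symbols means, the sketch below replaces nucleotide ambiguity codes in a plain string with a chosen character. The function fill_ambiguous_sketch is hypothetical, and treating exactly the IUPAC codes R, Y, S, W, K, M, B, D, H, V and N as ambiguous is an assumption; this does not reproduce the reader's internals.

```julia
# Hypothetical illustration; not the FASTQ.Reader implementation.
# Assumption: the IUPAC nucleotide ambiguity codes count as "ambiguous".
const AMBIGUOUS_NUCLEOTIDES = Set("RYSWKMBDHVN")

# Replace every ambiguous symbol (case-insensitively) with `fill`.
function fill_ambiguous_sketch(seq::AbstractString, fill::Char)
    return map(c -> uppercase(c) in AMBIGUOUS_NUCLEOTIDES ? fill : c, seq)
end

println(fill_ambiguous_sketch("ACGNNTRA", 'A'))  # ACGAATAA
```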
BioSequences.FASTQ.Writer (Type)
FASTQ.Writer(output::IO; quality_header=false)
Create a data writer of the FASTQ file format.
Arguments
output: data sink
quality_header=false: if true, repeat the title line on the third line, just after '+'
They can be created with IOStreams:
r = FASTQ.Reader(open("MyInput.fastq", "r"))
w = FASTQ.Writer(open("MyFile.fastq", "w"))
Note that FASTQ.Reader does not support line wraps within the sequence and quality lines. Usually, records are read sequentially from a file by iterating over the reader:
using BioSequences
reader = FASTQ.Reader(open("hg38.fastq", "r"))
for record in reader
# Do something
end
close(reader)
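One way to see why wrapped lines complicate parsing: if every record occupies exactly four lines, a reader can group the stream simply by counting lines, whereas with wrapping it must detect where the sequence ends, and '@' is itself a valid quality character. The sketch below groups lines four at a time; read_fastq_chunks is a hypothetical helper, not how FASTQ.Reader works internally, and it would misread wrapped files.

```julia
# Hypothetical sketch; not the FASTQ.Reader implementation.
# Group lines from `io` into four-line records. Wrapped sequence or quality
# lines would shift every following record, so they are not supported here.
function read_fastq_chunks(io::IO)
    chunks = Vector{Vector{String}}()
    lines = String[]
    for line in eachline(io)
        push!(lines, line)
        if length(lines) == 4
            push!(chunks, lines)
            lines = String[]
        end
    end
    isempty(lines) || error("truncated record: $(length(lines)) trailing line(s)")
    return chunks
end

data = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\n####\n"
for (header, seq, _, qual) in read_fastq_chunks(IOBuffer(data))
    println(header[2:end], '\t', seq, '\t', qual)
end
```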
Reading in a record from a FASTQ formatted file will give you a variable of type FASTQ.Record.
BioSequences.FASTQ.Record (Type)
FASTQ.Record()
Create an unfilled FASTQ record.
FASTQ.Record(data::Vector{UInt8})
Create a FASTQ record object from data. This function verifies and indexes fields for accessors. Note that the ownership of data is transferred to a new record object.
FASTQ.Record(str::AbstractString)
Create a FASTQ record object from str. This function verifies and indexes fields for accessors.
FASTQ.Record(identifier, sequence, quality; offset=33)
Create a FASTQ record from identifier, sequence and quality.
FASTQ.Record(identifier, description, sequence, quality; offset=33)
Create a FASTQ record from identifier, description, sequence and quality.
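The offset keyword most likely refers to the usual FASTQ convention of storing each quality score q as the character with ASCII code q + offset (33 by default, often called Phred+33). The sketch below assumes numeric scores as input; encode_quality and decode_quality are hypothetical helpers, not BioSequences functions. Decoding the example record's '#' and 'E' characters gives scores 2 and 36.

```julia
# Hypothetical helpers; not part of BioSequences.
# Each score q is stored as the character whose ASCII code is q + offset.
encode_quality(scores::AbstractVector{<:Integer}; offset::Integer = 33) =
    String([Char(q + offset) for q in scores])

# Inverse: map each character back to its numeric score.
decode_quality(qual::AbstractString; offset::Integer = 33) =
    [Int(c) - offset for c in qual]

println(encode_quality([2, 2, 2, 36, 36]))  # ###EE
println(decode_quality("###EE"))            # [2, 2, 2, 36, 36]
```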
Various accessors are available for FASTQ.Record objects:
BioSequences.FASTQ.hasidentifier (Function)
hasidentifier(record::Record)
Checks whether or not the record has an identifier.
BioSequences.FASTQ.identifier (Function)
identifier(record::Record)::String
Get the sequence identifier of record.
BioSequences.FASTQ.hasdescription (Function)
hasdescription(record::Record)
Checks whether or not the record has a description.
BioSequences.FASTQ.description (Function)
description(record::Record)::String
Get the description of record.
BioSequences.FASTQ.hassequence (Function)
hassequence(record::Record)
Checks whether or not a sequence record contains a sequence.
BioSequences.FASTQ.sequence (Method)
sequence(record::Record, [part::UnitRange{Int}])::BioSequences.DNASequence
Get the sequence of record.
BioSequences.FASTQ.hasquality (Function)
hasquality(record::Record)
Check whether the given FASTQ record has a quality string.
BioSequences.FASTQ.quality (Function)
quality(record::Record, [offset::Integer=33, [part::UnitRange]])::Vector{UInt8}
Get the base quality of record.
quality(record::Record, encoding_name::Symbol, [part::UnitRange])::Vector{UInt8}
Get the base quality of record by decoding with encoding_name. The encoding_name can be one of :sanger, :solexa, :illumina13, :illumina15, or :illumina18.
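These encoding names correspond to historical FASTQ variants that differ mainly in their ASCII offset. The table in the sketch below lists the offsets commonly associated with each name (Sanger and Illumina 1.8+ use 33; Solexa, Illumina 1.3+ and Illumina 1.5+ use 64). It is an assumption written for illustration, not read from BioSequences, and decode_with is a hypothetical helper. Solexa scores use a different, non-Phred scale and can be negative, which this sketch does not convert.

```julia
# Hypothetical lookup for illustration; offsets are the commonly cited values,
# not taken from the BioSequences source.
const QUALITY_OFFSETS = Dict(
    :sanger     => 33,  # Phred+33
    :solexa     => 64,  # Solexa+64; scores may be negative (down to -5)
    :illumina13 => 64,  # Phred+64
    :illumina15 => 64,  # Phred+64
    :illumina18 => 33,  # Phred+33
)

# Decode a quality string using the offset registered for `encoding_name`.
function decode_with(qual::AbstractString, encoding_name::Symbol)
    haskey(QUALITY_OFFSETS, encoding_name) ||
        throw(ArgumentError("unknown encoding: $encoding_name"))
    offset = QUALITY_OFFSETS[encoding_name]
    return [Int(c) - offset for c in qual]
end

println(decode_with("hhh", :illumina13))  # [40, 40, 40]
```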
To write a BioSequence to a FASTQ file, you first have to create a FASTQ.Record:
BioSequences.FASTQ.Record (Method)
FASTQ.Record(identifier, description, sequence, quality; offset=33)
Create a FASTQ record from identifier, description, sequence and quality.
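To show what ends up in the file for one record, including the effect of the writer's quality_header option, here is a plain-Julia sketch that renders a record as text. The helper format_fastq is hypothetical and takes an already encoded quality string; it is not the FASTQ.Writer implementation.

```julia
# Hypothetical helper for illustration; not part of BioSequences.
# Render one record as FASTQ text; with `quality_header = true` the title
# (identifier plus optional description) is repeated after the '+'.
function format_fastq(identifier::AbstractString,
                      description::Union{AbstractString,Nothing},
                      sequence::AbstractString,
                      quality::AbstractString;
                      quality_header::Bool = false)
    title = description === nothing ? identifier : string(identifier, ' ', description)
    io = IOBuffer()
    println(io, '@', title)
    println(io, sequence)
    println(io, quality_header ? string('+', title) : "+")
    println(io, quality)
    return String(take!(io))
end

print(format_fastq("r1", "sample A", "ACGT", "IIII"; quality_header = true))
```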
As always with Julia IO types, remember to close your readers and writers when you are finished.
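In Julia, the do-block form of open closes the underlying file automatically, even if an error is thrown inside the block. The sketch below shows the idiom with plain Base IO on a temporary file; whether the FASTQ reader and writer types support the same form is not covered here, so for them an explicit close (or try/finally) remains the safe choice.

```julia
# Base IO only; illustrates automatic closing with the do-block form of `open`.
path = tempname()

# The file is closed when the block returns, even if it throws.
open(path, "w") do io
    write(io, "@r1\nACGT\n+\nIIII\n")
end

nlines = open(countlines, path)  # also closes the file afterwards
println(nlines)  # 4

rm(path)  # clean up the temporary file
```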