IO - FASTQ formatted files
FASTQ is a text-based file format for representing DNA sequences together with a quality score for each base. A FASTQ file stores a list of sequence records in the following format:
@{name} {description}?
{sequence}
+
{qualities}
Here is an example of one record from a FASTQ file:
@FSRRS4401BE7HA
tcagTTAAGATGGGAT
+
###EEEEEEEEE##E#
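The sketch below shows in plain Julia how the four lines of a record map onto its fields. It is not the BioSequences parser. The helper `parse_fastq_record` is hypothetical, and it assumes that the name ends at the first space and that everything after that space is the description.

```julia
# Hypothetical helper, for illustration only (not part of BioSequences):
# split one unwrapped, four-line FASTQ record into its fields.
function parse_fastq_record(text::AbstractString)
    lines = split(chomp(text), '\n')
    length(lines) == 4 || error("expected exactly four lines")
    header, seq, plus, qual = lines
    startswith(header, "@") || error("first line must start with '@'")
    startswith(plus, "+") || error("third line must start with '+'")
    length(seq) == length(qual) || error("sequence and quality lengths differ")
    # Assumption: the name ends at the first space; the rest is the description.
    parts = split(header[2:end], ' '; limit = 2)
    description = length(parts) == 2 ? String(parts[2]) : nothing
    return (name = String(parts[1]), description = description,
            sequence = String(seq), quality = String(qual))
end

rec = parse_fastq_record("""
@FSRRS4401BE7HA
tcagTTAAGATGGGAT
+
###EEEEEEEEE##E#
""")
println(rec.name)      # FSRRS4401BE7HA
println(rec.sequence)  # tcagTTAAGATGGGAT
```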
Readers and Writers
The reader and writer for FASTQ formatted files are found within the BioSequences.FASTQ module.
BioSequences.FASTQ.Reader — Type.
FASTQ.Reader(input::IO; fill_ambiguous=nothing)
Create a data reader of the FASTQ file format.
Arguments
input: data source
fill_ambiguous=nothing: fill ambiguous symbols with the given symbol
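The sketch below gives a rough plain-Julia picture of what fill_ambiguous does. The helper `fill_ambiguous_bases` is hypothetical, and it operates on a plain String, whereas the real keyword takes a symbol of the sequence alphabet. Treating every character other than A, C, G and T as ambiguous is a simplification made only for this example.

```julia
# Hypothetical helper (not the BioSequences implementation): replace every
# character that is not an unambiguous DNA base (A, C, G or T, in either case)
# with `replacement`.
fill_ambiguous_bases(seq::AbstractString, replacement::AbstractChar) =
    map(c -> uppercase(c) in ('A', 'C', 'G', 'T') ? c : replacement, seq)

println(fill_ambiguous_bases("ACNNTRYg", 'A'))  # ACAATAAg
```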
BioSequences.FASTQ.Writer — Type.
FASTQ.Writer(output::IO; quality_header=false)
Create a data writer of the FASTQ file format.
Arguments
output: data sink
quality_header=false: if true, repeat the title line on the third line, just after '+'
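The plain-Julia sketch below compares the two layouts that quality_header selects between. It is not the BioSequences writer, and `write_fastq_record` is a hypothetical helper. The sketch assumes that the "title line" means the identifier followed by the optional description, as on the '@' line.

```julia
# Hypothetical helper (not the BioSequences writer): print one FASTQ record.
# With quality_header=true the title line is repeated after '+'.
function write_fastq_record(io::IO, name, description, seq, qual;
                            quality_header::Bool = false)
    title = description === nothing ? name : string(name, ' ', description)
    println(io, '@', title)
    println(io, seq)
    println(io, quality_header ? string('+', title) : "+")
    println(io, qual)
    return nothing
end

write_fastq_record(stdout, "read1", "sample A", "ACGT", "IIII"; quality_header = true)
# @read1 sample A
# ACGT
# +read1 sample A
# IIII
```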
Both can be created from IO streams:
using BioSequences
r = FASTQ.Reader(open("MyInput.fastq", "r"))
w = FASTQ.Writer(open("MyFile.fastq", "w"))
Note that FASTQ.Reader does not support line wraps within the sequence or quality strings; each must occupy a single line. Records are usually read sequentially from a file by iterating over the reader:
using BioSequences
reader = FASTQ.Reader(open("hg38.fastq", "r"))
for record in reader
    # Do something
end
close(reader)
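The next sketch uses Base Julia only and is not how FASTQ.Reader is implemented. It helps explain why an unwrapped layout makes sequential reading simple: the stream is read lazily in groups of four lines, one group per record. When a sequence is wrapped over two lines, the grouping shifts, and the sketch then rejects the input. The names `to_record` and `each_fastq_record` are hypothetical.

```julia
# Simplified sketch (not the BioSequences parser): read a FASTQ stream lazily
# as consecutive groups of four lines, one group per record.
function to_record(chunk)
    length(chunk) == 4 || error("truncated record: expected 4 lines")
    startswith(chunk[1], "@") || error("header line must start with '@'")
    startswith(chunk[3], "+") || error("third line must start with '+'")
    return (header = chunk[1], sequence = chunk[2], quality = chunk[4])
end

each_fastq_record(io::IO) = (to_record(c) for c in Iterators.partition(eachline(io), 4))

io = IOBuffer("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\n####\n")
for rec in each_fastq_record(io)
    println(rec.header, " => ", rec.sequence)
end
```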
Each record read from a FASTQ formatted file is an object of type FASTQ.Record.
BioSequences.FASTQ.Record — Type.
FASTQ.Record()
Create an unfilled FASTQ record.
FASTQ.Record(data::Vector{UInt8})
Create a FASTQ record object from data.
This function verifies and indexes fields for accessors. Note that the ownership of data is transferred to a new record object.
FASTQ.Record(str::AbstractString)
Create a FASTQ record object from str.
This function verifies and indexes fields for accessors.
FASTQ.Record(identifier, sequence, quality; offset=33)
Create a FASTQ record from identifier, sequence and quality.
FASTQ.Record(identifier, description, sequence, quality; offset=33)
Create a FASTQ record from identifier, description, sequence and quality.
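The offset keyword implies that quality scores are stored as ASCII characters shifted by offset. The plain-Julia sketch below builds the text of such a record. It assumes that quality holds integer Phred scores, which this page does not state. The helper `fastq_text` is hypothetical and is not a BioSequences function.

```julia
# Hypothetical helper (not part of BioSequences): build the text of one FASTQ
# record, shifting each numeric quality score by `offset` (33 by default)
# into a printable ASCII character.
function fastq_text(identifier, description, sequence, quality::AbstractVector{<:Integer};
                    offset::Integer = 33)
    length(sequence) == length(quality) || error("sequence and quality lengths differ")
    title = description === nothing ? identifier : string(identifier, ' ', description)
    qualstr = join(Char(q + offset) for q in quality)
    return string('@', title, '\n', sequence, '\n', "+\n", qualstr, '\n')
end

print(fastq_text("read1", nothing, "ACGT", [2, 36, 40, 40]))
# @read1
# ACGT
# +
# #EII
```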
The following accessor functions are available for FASTQ.Record objects:
BioSequences.FASTQ.hasidentifier — Function.
hasidentifier(record::Record)
Checks whether or not the record has an identifier.
BioSequences.FASTQ.identifier — Function.
identifier(record::Record)::String
Get the sequence identifier of record.
BioSequences.FASTQ.hasdescription — Function.
hasdescription(record::Record)
Checks whether or not the record has a description.
BioSequences.FASTQ.description — Function.
description(record::Record)::String
Get the description of record.
BioSequences.FASTQ.hassequence — Function.
hassequence(record::Record)
Checks whether or not a sequence record contains a sequence.
BioSequences.FASTQ.sequence — Method.
sequence(record::Record, [part::UnitRange{Int}])::BioSequences.DNASequence
Get the sequence of record.
BioSequences.FASTQ.hasquality — Function.
hasquality(record::Record)
Check whether the given FASTQ record has a quality string.
BioSequences.FASTQ.quality — Function.
quality(record::Record, [offset::Integer=33, [part::UnitRange]])::Vector{UInt8}
Get the base quality of record.
quality(record::Record, encoding_name::Symbol, [part::UnitRange])::Vector{UInt8}
Get the base quality of record by decoding with encoding_name.
The encoding_name can be either :sanger, :solexa, :illumina13, :illumina15, or :illumina18.
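The plain-Julia sketch below shows decoding by encoding name. The offsets in the table are the ones commonly documented for these platforms. Whether BioSequences uses exactly these values, and how it treats Solexa scores (which use a different, odds-based scale and can be negative), is an assumption here. The sketch only subtracts the offset, and `decode_quality` and `QUALITY_OFFSETS` are hypothetical names.

```julia
# Assumed conventional ASCII offsets for the encoding names above
# (not read from BioSequences).
const QUALITY_OFFSETS = Dict(
    :sanger => 33, :solexa => 64,
    :illumina13 => 64, :illumina15 => 64, :illumina18 => 33,
)

# Hypothetical helper: subtract the offset for `encoding` from each character.
# Returns Int rather than UInt8 because Solexa scores may be negative; no
# Solexa-to-Phred conversion is attempted.
function decode_quality(qual::AbstractString, encoding::Symbol)
    haskey(QUALITY_OFFSETS, encoding) || throw(ArgumentError("unknown encoding: $encoding"))
    offset = QUALITY_OFFSETS[encoding]
    return [Int(c) - offset for c in qual]
end

println(decode_quality("###E", :sanger))  # [2, 2, 2, 36]
```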
To write a BioSequence to a FASTQ file, you first have to create a FASTQ.Record:
BioSequences.FASTQ.Record — Method.
FASTQ.Record(identifier, description, sequence, quality; offset=33)
Create a FASTQ record from identifier, description, sequence and quality.
As with all Julia IO types, remember to close your readers and writers when you are finished.
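The Base Julia sketch below shows one way to make closing automatic. The do-block form of `open` closes the stream when the block exits, even if an error is thrown inside it. This page does not document wrapping a FASTQ.Reader or FASTQ.Writer around such a stream. A writer may buffer its output, so calling close on the writer itself is still the safe choice.

```julia
# Sketch using only Base Julia: the do-block form of `open` closes the stream
# automatically when the block exits.
path = tempname() * ".fastq"

open(path, "w") do io
    write(io, "@read1\nACGT\n+\nIIII\n")
end  # the file is closed here

n_lines = open(countlines, path)  # opens, counts lines, then closes
println(n_lines)  # 4
rm(path)
```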