IO - FASTQ formatted files

FASTQ is a text-based file format for representing DNA sequences along with qualities for each base. A FASTQ file stores a list of sequence records in the following format:

@{name} {description}?
{sequence}
+
{qualities}

Here is an example of one record from a FASTQ file:

@FSRRS4401BE7HA
tcagTTAAGATGGGAT
+
###EEEEEEEEE##E#
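To make the four-line layout concrete, here is a minimal pure-Julia sketch that splits one record into its fields. The helper `split_fastq_record` is hypothetical and is not how BioSequences parses files. It assumes an unwrapped record and treats the first space in the title as the start of the optional description, following the template above.

```julia
# Hypothetical helper, not part of BioSequences: split one unwrapped
# four-line FASTQ record into its fields.
function split_fastq_record(lines::AbstractVector{<:AbstractString})
    length(lines) == 4 || throw(ArgumentError("expected exactly four lines"))
    header, seq, plus, qual = lines
    startswith(header, "@") || throw(ArgumentError("line 1 must start with '@'"))
    startswith(plus, "+") || throw(ArgumentError("line 3 must start with '+'"))
    # Every base needs exactly one quality character.
    length(seq) == length(qual) ||
        throw(ArgumentError("sequence and quality lengths differ"))
    # Title = text after '@'; an optional description follows the first space.
    parts = split(header[2:end], ' '; limit = 2)
    description = length(parts) == 2 ? String(parts[2]) : nothing
    return (name = String(parts[1]), description = description,
            sequence = String(seq), quality = String(qual))
end

rec = split_fastq_record(["@FSRRS4401BE7HA", "tcagTTAAGATGGGAT", "+", "###EEEEEEEEE##E#"])
println(rec.name)         # FSRRS4401BE7HA
println(rec.description)  # nothing
```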

Readers and Writers

The reader and writer for FASTQ formatted files are found in the BioSequences.FASTQ module.

FASTQ.Reader(input::IO; fill_ambiguous=nothing)

Create a data reader of the FASTQ file format.

Arguments

  • input: data source
  • fill_ambiguous=nothing: fill ambiguous symbols with the given symbol

FASTQ.Writer(output::IO; quality_header=false)

Create a data writer of the FASTQ file format.

Arguments

  • output: data sink
  • quality_header=false: if true, repeat the title line (identifier and description) on the third line, just after '+'
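A minimal sketch of what quality_header changes in the output, written as plain string formatting rather than with FASTQ.Writer. The function `format_fastq` is hypothetical; it assumes the repeated title is the identifier plus the optional description, exactly as on the '@' line.

```julia
# Hypothetical sketch, not FASTQ.Writer itself: format one record as FASTQ text.
# With quality_header=true the title is repeated after '+' on the third line.
function format_fastq(name::AbstractString,
                      description::Union{AbstractString,Nothing},
                      sequence::AbstractString, quality::AbstractString;
                      quality_header::Bool = false)
    # The title line is the name, optionally followed by a space and description.
    title = description === nothing ? String(name) : string(name, ' ', description)
    plus = quality_header ? string('+', title) : "+"
    return string('@', title, '\n', sequence, '\n', plus, '\n', quality, '\n')
end

print(format_fastq("read1", nothing, "ACGT", "IIII"))
print(format_fastq("read1", "lane 1", "ACGT", "IIII"; quality_header = true))
```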

Both can be created from an IOStream:

using BioSequences
r = FASTQ.Reader(open("MyInput.fastq", "r"))  # read records from an existing file
w = FASTQ.Writer(open("MyFile.fastq", "w"))   # write records to a new file

Note that FASTQ.Reader does not support line wraps within the sequence or quality fields; each must fit on a single line. Records are usually read sequentially from a file by iterating over the reader:

using BioSequences
reader = FASTQ.Reader(open("hg38.fastq", "r"))
for record in reader
    # each `record` is a FASTQ.Record; process it here
end
close(reader)
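Because sequences and qualities may not wrap, every record spans exactly four lines. The following pure-Julia sketch relies on that by grouping lines in fours. `collect_fastq` is a hypothetical helper that reads all input into memory and is not how FASTQ.Reader works internally; a wrapped record would push every later group out of alignment.

```julia
# Hypothetical sketch, not FASTQ.Reader: collect records by taking four
# lines at a time, which only works when nothing is line-wrapped.
function collect_fastq(io::IO)
    lines = collect(eachline(io))
    length(lines) % 4 == 0 ||
        error("input is not a whole number of four-line records")
    records = NamedTuple{(:name, :sequence, :quality),Tuple{String,String,String}}[]
    for i in 1:4:length(lines)
        header, seq, plus, qual = lines[i:i+3]
        startswith(header, "@") && startswith(plus, "+") ||
            error("malformed record starting at line $i")
        # The name is the title up to the first space.
        name = String(split(header[2:end], ' '; limit = 2)[1])
        push!(records, (name = name, sequence = seq, quality = qual))
    end
    return records
end

for rec in collect_fastq(IOBuffer("@r1\nACGT\n+\nIIII\n@r2 desc\nGGC\n+\n!!#\n"))
    println(rec.name, '\t', rec.sequence, '\t', rec.quality)
end
```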

Each record read from a FASTQ file is a value of type FASTQ.Record. Records can also be constructed directly with the following constructors:

FASTQ.Record()

Create an unfilled FASTQ record.

FASTQ.Record(data::Vector{UInt8})

Create a FASTQ record object from data.

This function verifies and indexes fields for accessors. Note that the ownership of data is transferred to a new record object.

FASTQ.Record(str::AbstractString)

Create a FASTQ record object from str.

This function verifies and indexes fields for accessors.

FASTQ.Record(identifier, sequence, quality; offset=33)

Create a FASTQ record from identifier, sequence and quality.

FASTQ.Record(identifier, description, sequence, quality; offset=33)

Create a FASTQ record from identifier, description, sequence and quality.

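In the Sanger (Phred+33) convention, each numeric quality score is stored as the ASCII character whose code is the score plus 33. A minimal sketch of that encoding, assuming the constructors' offset is added to integer scores in this way; `encode_quality` is a hypothetical helper, not part of BioSequences.

```julia
# Hypothetical helper: encode integer quality scores as FASTQ characters
# by adding the ASCII offset (33 = Sanger / Phred+33).
function encode_quality(scores::AbstractVector{<:Integer}; offset::Integer = 33)
    chars = [Char(q + offset) for q in scores]
    # FASTQ quality characters must be printable ASCII, '!' (33) through '~' (126).
    all(c -> '!' <= c <= '~', chars) ||
        throw(ArgumentError("score out of range for offset $offset"))
    return String(chars)
end

encode_quality([0, 2, 36])               # returns "!#E"
encode_quality([40, 40]; offset = 64)    # returns "hh"
```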

The following accessors are available for FASTQ.Record objects:

hasidentifier(record::Record)

Checks whether or not the record has an identifier.

identifier(record::Record)::String

Get the sequence identifier of record.

hasdescription(record::Record)

Checks whether or not the record has a description.

description(record::Record)::String

Get the description of record.

hassequence(record::Record)

Checks whether or not a sequence record contains a sequence.

sequence(record::Record, [part::UnitRange{Int}])::BioSequences.DNASequence

Get the sequence of record.

hasquality(record::Record)

Check whether the given FASTQ record has a quality string.

quality(record::Record, [offset::Integer=33, [part::UnitRange]])::Vector{UInt8}

Get the base quality of record.

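Decoding reverses the encoding by subtracting the offset from each byte of the quality line. The sketch below, `decode_quality`, is a hypothetical stand-in for quality(record, offset) that works on a plain string and omits the part argument.

```julia
# Hypothetical stand-in for quality(record, offset), operating on a plain string:
# subtract the offset from each byte of the quality line.
# UInt8(...) throws an InexactError if a character lies below the offset.
decode_quality(qual::AbstractString; offset::Integer = 33) =
    [UInt8(b - offset) for b in codeunits(qual)]

decode_quality("!#E")                 # returns UInt8[0, 2, 36]
decode_quality("###EEEEEEEEE##E#")    # scores of 2 and 36 for the example record
```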
quality(record::Record, encoding_name::Symbol, [part::UnitRange])::Vector{UInt8}

Get the base quality of record by decoding with encoding_name.

The encoding_name can be either :sanger, :solexa, :illumina13, :illumina15, or :illumina18.

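Each encoding name corresponds to a conventional ASCII offset: 33 for Sanger and Illumina 1.8, and 64 for Solexa, Illumina 1.3 and Illumina 1.5. The sketch below keeps those offsets in a hypothetical table and simply subtracts them; it does not reproduce any range checking BioSequences may perform. Because Solexa scores use an odds-based scale that can go negative, the sketch returns Int rather than UInt8.

```julia
# Hypothetical table of the conventional ASCII offsets per encoding name.
const QUALITY_OFFSETS = Dict(
    :sanger     => 33,  # Phred+33
    :solexa     => 64,  # Solexa+64 (odds-based scores, may be negative)
    :illumina13 => 64,  # Phred+64
    :illumina15 => 64,  # Phred+64
    :illumina18 => 33,  # Phred+33
)

# Subtract the offset for the named encoding; unknown names raise a KeyError.
decode_scores(qual::AbstractString, encoding::Symbol) =
    [Int(b) - QUALITY_OFFSETS[encoding] for b in codeunits(qual)]

decode_scores("II", :sanger)      # returns [40, 40]
decode_scores("hh", :illumina13)  # returns [40, 40]
```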

To write a BioSequence to a FASTQ file, first wrap it in a FASTQ.Record using one of the constructors above, such as FASTQ.Record(identifier, description, sequence, quality; offset=33), and then write the record to a FASTQ.Writer with write(writer, record).

As with any Julia IO type, remember to close your readers and writers when you are finished.
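One way to make sure a file is closed even if an error occurs is the do-block form of open. The sketch below shows the pattern with plain IO rather than FASTQ.Writer or FASTQ.Reader; the temporary file and its contents are only for illustration. When a FASTQ.Writer wraps the stream, closing the writer itself remains the advice given above.

```julia
# General Julia pattern, shown with plain IO: the do-block form of `open`
# closes the stream when the block exits, even if an exception is thrown.
path = tempname()
open(path, "w") do io
    write(io, "@r1\nACGT\n+\nIIII\n")
end
nlines = open(countlines, path)   # open(f, path) also closes the file afterwards
println(nlines)                   # 4
rm(path)
```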