FASTQ formatted files

FASTQ is a text-based file format for representing DNA sequences along with qualities for each base. A FASTQ file stores a list of sequence records in the following format:

@{name} {description}?
{sequence}
+
{qualities}

Here is an example of one record from a FASTQ file:

@FSRRS4401BE7HA
tcagTTAAGATGGGAT
+
###EEEEEEEEE##E#
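
To make the four-line layout concrete, here is a minimal sketch in plain Julia that splits the example record into its parts. It does not use FASTX: `parse_fastq_record` is a hypothetical helper written only for illustration, and it assumes an unwrapped record of exactly four lines.

```julia
# Hypothetical helper (not part of FASTX): split one unwrapped FASTQ record
# into its name, optional description, sequence, and quality string.
function parse_fastq_record(text::AbstractString)
    lines = split(strip(text), '\n')
    length(lines) == 4 || error("expected exactly four lines")
    header, sequence, separator, quality = lines
    startswith(header, "@") || error("header line must start with '@'")
    startswith(separator, "+") || error("third line must start with '+'")
    length(sequence) == length(quality) ||
        error("sequence and quality must have the same length")
    # The name runs up to the first space; anything after it is the description.
    parts = split(header[2:end], ' '; limit=2)
    name = String(parts[1])
    description = length(parts) == 2 ? String(parts[2]) : nothing
    return (name=name, description=description,
            sequence=String(sequence), quality=String(quality))
end

record = parse_fastq_record("""
@FSRRS4401BE7HA
tcagTTAAGATGGGAT
+
###EEEEEEEEE##E#
""")
```

The example header has no description, so `record.description` is `nothing`, and the sequence and quality strings both contain 16 characters, one quality character per base.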

Readers and Writers

The reader and writer for FASTQ formatted files are found within the FASTQ module of FASTX.

They can be created with IOStreams:

using FASTX

r = FASTQ.Reader(open("my-reads.fastq", "r"))
w = FASTQ.Writer(open("my-output.fastq", "w"))

Alternatively, Base.open is overloaded with a method for convenience:

r = open(FASTQ.Reader, "my-reads.fastq")
w = open(FASTQ.Writer, "my-out.fastq")

Note that FASTQ.Reader does not support line wraps within the sequence and quality strings. Sequence records are usually read sequentially from a file by iteration.

reader = open(FASTQ.Reader, "my-reads.fastq")
for record in reader
    ## Do something
end
close(reader)

You can also reuse a single record in a while loop, overwriting it in place on each read, to avoid excessive memory allocation.

reader = open(FASTQ.Reader, "my-reads.fastq")
record = FASTQ.Record()
while !eof(reader)
    read!(reader, record)
    ## Do something.
end
close(reader)

Reading in a record from a FASTQ formatted file will give you a variable of type FASTQ.Record.

Various getters and setters are available for FASTQ.Records:

To write a BioSequence to a FASTQ file, you first have to create a FASTQ.Record:

As always with Julia IO types, remember to close your file readers and writers after you are finished.

Using open with a do-block can help ensure you close a stream after you are finished.

open(FASTQ.Reader, "my-reads.fastq") do reader
    for record in reader
        ## Do something
    end
end
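
The closing behaviour of the do-block form can be checked with plain Julia streams. The sketch below uses only Base functions and a temporary file, not FASTX. It returns the stream handle from the block purely so that it can be inspected afterwards, which you would not normally do.

```julia
# Write a small FASTQ-like file; the stream is closed when the block exits,
# even if the body throws.
path = tempname()
open(path, "w") do io
    write(io, "@read1\nACGT\n+\nIIII\n")
end

# Read the first line back, and return the handle only to inspect its state.
first_line = ""
handle = open(path, "r") do io
    global first_line = readline(io)
    io
end

still_open = isopen(handle)   # false: the do-block closed the stream
rm(path)
```

Because the stream is closed on the way out of the block, there is no separate `close` call to forget.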

Quality encodings

FASTQ records have a quality string whose encoding depends on the sequencing platform. The FASTQ submodule has encoding and decoding support for the following quality encodings. These can be used with the FASTQ.quality method to ensure that the correct quality scores are extracted from your FASTQ quality strings.
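
As a rough illustration of what decoding involves, independent of FASTX: under the common Sanger (Phred+33) convention, each quality character stores a Phred score as its ASCII code minus 33. The sketch below uses two hypothetical helpers, `decode_phred` and `error_probability`, written in plain Julia. The default offset of 33 is an assumption that must match your data; some older Illumina pipelines used an offset of 64.

```julia
# Hypothetical helper (not part of FASTX): convert a quality string to integer
# Phred scores by subtracting the encoding offset from each character code.
function decode_phred(quality::AbstractString; offset::Integer=33)
    scores = [Int(c) - offset for c in quality]
    all(s -> s >= 0, scores) || error("quality character below offset $offset")
    return scores
end

# A Phred score Q corresponds to an error probability of 10^(-Q/10).
error_probability(q::Real) = 10.0^(-q / 10)

# Quality string from the example record above.
scores = decode_phred("###EEEEEEEEE##E#")
```

With offset 33, `'#'` (ASCII 35) decodes to a score of 2 and `'E'` (ASCII 69) decodes to 36. Decoding the same string with the wrong offset would either raise the error above or silently shift every score, which is why the encoding has to be chosen to match the platform.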