FASTA formatted files

FASTA formatted files

FASTA is a text-based file format for representing biological sequences. A FASTA file stores a list of sequence records with name, description, and sequence.

The template of a sequence record is:

>{name} {description}?
{sequence}

Here is an example of a chromosomal sequence:

>chrI chromosome 1
CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACC
CACACACACACATCCTAACACTACCCTAACACAGCCCTAATCTAACCCTG

Readers and Writers

The reader and writer for FASTA formatted files, are found within the FASTA submodule of FASTX.

They can be created with IOStreams.

using FASTX

r = FASTA.Reader(open("my-seqs.fasta", "r"))
w = FASTA.Writer(open("my-out.fasta", "w"))

Alternatively, Base.open is overloaded with a method for conveinience:

r = open(FASTA.Reader, "my-seqs.fasta")
w = open(FASTA.Writer, "my-out.fasta")

Usually sequence records will be read sequentially from a file by iteration.

reader = open(FASTA.Reader, "my-seqs.fasta")
for record in reader
    ## Do something
end
close(reader)

You can also overwrite records in a while loop to avoid excessive memory allocation.

reader = open(FASTA.Reader, "my-seqs.fasta")
record = FASTA.Record()
while !eof(reader)
    read!(reader, record)
    ## Do something.
end

But if the FASTA file has an auxiliary index file formatted in fai, the reader supports random access to FASTA records, which would be useful when accessing specific parts of a huge genome sequence:

reader = open(FASTA.Reader, "sacCer.fa", index = "sacCer.fa.fai")
chrIV = reader["chrIV"]  # directly read sequences called chrIV.
close(reader)

Reading in a sequence from a FASTA formatted file will give you a variable of type FASTA.Record.

Various getters and setters are available for FASTA.Records:

To write a BioSequence to FASTA file, you first have to create a FASTA.Record:

using BioSequences
x = dna"aaaaatttttcccccggggg"
rec = FASTA.Record("MySeq", x)
w = open(FASTA.Writer, "my-out.fasta")
write(w, rec)
close(w)

As always with julia IO types, remember to close your file readers and writer after you are finished.

Using open with a do-block can help ensure you close a stream after you are finished.

open(FASTA.Reader, "my-reads.fasta") do reader
    for record in reader
        ## Do something
    end
end