IO - 2bit formatted files
2bit is a binary file format designed for storing a genome that consists of multiple chromosomal sequences. Reading a 2bit file is often an order of magnitude faster than reading FASTA, and the file is smaller. However, because the .2bit file format is specialized for genomic DNA sequences, it cannot store RNA or amino acid sequences.
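The size advantage comes from storing each nucleotide in two bits instead of one byte. The following is a minimal sketch of that idea in plain Julia, not the BioSequences implementation. The bit codes (T=00, C=01, A=10, G=11, first base in the most significant bits) follow the UCSC 2bit specification; the function names pack2bit and unpack2bit are hypothetical, and ambiguous bases such as N, which the real format records in separate blocks, are not handled here.

```julia
# Illustrative 2-bit codes per the UCSC 2bit specification.
const CODES = Dict('T' => 0x00, 'C' => 0x01, 'A' => 0x02, 'G' => 0x03)
const BASES = ['T', 'C', 'A', 'G']

# Pack a DNA string into bytes: four bases per byte, first base in the high bits.
# (Hypothetical helper; throws a KeyError on anything other than A, C, G, T.)
function pack2bit(seq::AbstractString)
    packed = zeros(UInt8, cld(length(seq), 4))
    for (i, base) in enumerate(uppercase(seq))
        shift = 6 - 2 * ((i - 1) % 4)
        packed[div(i - 1, 4) + 1] |= CODES[base] << shift
    end
    return packed
end

# Unpack the first n bases from a packed byte vector.
function unpack2bit(packed::Vector{UInt8}, n::Integer)
    bases = Vector{Char}(undef, n)
    for i in 1:n
        shift = 6 - 2 * ((i - 1) % 4)
        code = (packed[div(i - 1, 4) + 1] >> shift) & 0x03
        bases[i] = BASES[code + 1]
    end
    return String(bases)
end

packed = pack2bit("ACGT")    # four bases fit in the single byte 0x9c
unpack2bit(packed, 4)        # recovers "ACGT"
```

Because the packed bytes do not record how many bases the last byte holds, the sequence length has to be stored alongside them, which is why unpack2bit takes n as an argument.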
Readers and Writers
The reader and writer for 2bit formatted files are found within the BioSequences.TwoBit module.
BioSequences.TwoBit.Reader - Type.
TwoBit.Reader(input::IO)
Create a data reader of the 2bit file format.
Arguments
input: data source
BioSequences.TwoBit.Writer - Type.
TwoBit.Writer(output::IO, names::AbstractVector)
Create a data writer of the 2bit file format.
Arguments
output: data sink
names: a vector of sequence names written to output
The 2bit reader supports random access using an index included in the header section of a .2bit file:
using BioSequences
reader = TwoBit.Reader(open("sacCer.2bit", "r"))
chrIV = reader["chrIV"] # directly read chromosome 4
To list the names of the sequences available in the file, call the seqnames method on the reader.
seqnames(reader)
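Random access and name listing both rely on the index that sits near the start of the file. As a rough illustration of what such an index holds, the sketch below writes and parses a header and index following the layout in the UCSC 2bit specification: a 32-bit signature 0x1A412743, version, sequence count, and a reserved field, then for each sequence a one-byte name length, the name, and a 32-bit file offset. It is written in plain Julia and is not the BioSequences reader; the function names write_index and read_index, the little-endian assumption, and the offsets in the usage lines are all hypothetical.

```julia
const SIGNATURE = 0x1A412743   # magic number from the UCSC 2bit specification

# Write a header and index for (name => offset) pairs, little-endian.
# (Hypothetical helper; offsets are supplied by the caller, no sequence data is written.)
function write_index(io::IO, entries)
    write(io, htol(SIGNATURE))                  # signature
    write(io, htol(UInt32(0)))                  # version
    write(io, htol(UInt32(length(entries))))    # sequence count
    write(io, htol(UInt32(0)))                  # reserved
    for (name, offset) in entries
        write(io, UInt8(ncodeunits(name)))      # name length in bytes
        write(io, name)
        write(io, htol(UInt32(offset)))         # where the sequence record starts
    end
    return nothing
end

# Parse the header and return a Dict mapping sequence name to file offset.
function read_index(io::IO)
    ltoh(read(io, UInt32)) == SIGNATURE || error("not a little-endian 2bit file")
    read(io, UInt32)                            # version (ignored in this sketch)
    count = ltoh(read(io, UInt32))
    read(io, UInt32)                            # reserved
    index = Dict{String,UInt32}()
    for _ in 1:count
        namesize = Int(read(io, UInt8))
        name = String(read(io, namesize))
        index[name] = ltoh(read(io, UInt32))
    end
    return index
end

io = IOBuffer()
write_index(io, ["chrI" => 0x00000100, "chrIV" => 0x00002000])
seekstart(io)
index = read_index(io)
sort(collect(keys(index)))   # the available sequence names
index["chrIV"]               # the offset a reader would seek to
```

With an index like this, fetching one chromosome needs only a dictionary lookup and a seek, rather than a scan through every preceding sequence as with FASTA.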
Reading from a TwoBit.Reader yields a value of type TwoBit.Record:
BioSequences.TwoBit.Record - Type.
TwoBit.Record()
Create an unfilled 2bit record.
Record(name::AbstractString, seq::BioSequences.Sequence, masks::Union{Vector{UnitRange{Int}}, Nothing} = nothing)
Prepare a record for writing to a 2bit formatted file.
Needs a name, a sequence, and (optionally) masks: a vector of ranges that delineate masked regions of sequence.
To write a sequence to a 2bit file, first create a record.
BioSequences.TwoBit.Record - Method.
Record(name::AbstractString, seq::BioSequences.Sequence, masks::Union{Vector{UnitRange{Int}}, Nothing} = nothing)
Prepare a record for writing to a 2bit formatted file.
Needs a name, a sequence, and (optionally) masks: a vector of ranges that delineate masked regions of sequence.
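The masks argument has the type Vector{UnitRange{Int}}. As a sketch of how such a vector might be prepared, the plain-Julia helper below collects the ranges of lowercase characters in a sequence string, assuming that masked regions correspond to soft-masked (lowercase) bases as in the UCSC convention. The name mask_ranges is hypothetical and not part of BioSequences, and the assumption that Record expects 1-based inclusive ranges should be checked against the package before relying on it.

```julia
# Collect 1-based inclusive ranges of lowercase (soft-masked) characters.
# (Hypothetical helper, not part of BioSequences.)
function mask_ranges(seq::AbstractString)
    masks = UnitRange{Int}[]
    start = 0                          # 0 means "not inside a masked run"
    for (i, c) in enumerate(seq)
        if islowercase(c)
            start == 0 && (start = i)  # a masked run begins here
        elseif start != 0
            push!(masks, start:i-1)    # the run ended at the previous character
            start = 0
        end
    end
    start != 0 && push!(masks, start:length(seq))  # run reaching the end
    return masks
end

masks = mask_ranges("ACgtACGTnn")      # two masked runs: 3:4 and 9:10
```

The resulting vector has the element type named in the Record signature, so under the assumptions above it could be passed as the third argument.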