BED
BED is a text-based file format for representing genomic annotations such as genes and transcripts. Each line of a BED file consists of tab-delimited fields. The number of fields varies, but the first three, which denote a genomic interval, are mandatory.
This is an example of RNA transcripts:
chr9 68331023 68424451 NM_015110 0 +
chr9 68456943 68486659 NM_001206 0 -
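To make the field layout concrete, here is a minimal sketch in plain Julia that splits one line on tabs and enforces the three mandatory fields. The helper `parse_bed_line` and its field names are hypothetical and are not part of the BED package; the sketch only illustrates the format and does no validation beyond the field count.

```julia
# Hypothetical helper for illustration only; not part of the BED package.
function parse_bed_line(line::AbstractString)
    # Fields are separated by tab characters.
    fields = split(chomp(line), '\t')
    # The first three fields (chromosome, start, end) are mandatory.
    length(fields) >= 3 ||
        throw(ArgumentError("a BED line needs at least 3 fields, got $(length(fields))"))
    return (
        chrom = String(fields[1]),
        chromstart = parse(Int, fields[2]),
        chromend = parse(Int, fields[3]),
        # Any remaining fields are optional and kept as raw strings.
        optional = String.(fields[4:end]),
    )
end

rec = parse_bed_line("chr9\t68331023\t68424451\tNM_015110\t0\t+")
println(rec.chrom, " ", rec.chromstart, " ", rec.chromend, " ", rec.optional)
```

A line with only the three mandatory fields, such as `"chr1\t0\t10"`, parses with an empty `optional` vector.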
The BED package supports I/O for BED by providing the following three types:
- Reader type: BED.Reader
- Writer type: BED.Writer
- Element type: BED.Record
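As a rough sketch of how the writer and record types might be used together, the snippet below builds one record from a tab-separated string and writes it to a temporary file. It assumes that `BED.Writer` wraps an `IO` stream and that `BED.Record` can be constructed from a string; check both against the API reference for the installed version. The output path is a throwaway chosen for illustration.

```julia
using BED

# Temporary output path, used only for this illustration.
path = tempname() * ".bed"

# Assumption: BED.Writer wraps an already opened IO stream.
writer = BED.Writer(open(path, "w"))

# Assumption: a record can be built from a tab-separated string.
record = BED.Record("chr9\t68331023\t68424451\tNM_015110\t0\t+")
write(writer, record)

# Closing the writer also closes the underlying stream.
close(writer)

# Show what was written.
print(read(path, String))
```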
Examples
Here is a common workflow to iterate over all records in a BED file:
# Import the BED module.
using BED
# Open a BED file.
reader = open(BED.Reader, "data.bed")
# Iterate over records.
for record in reader
    # Do something with the record (see the Accessors section).
    chrom = BED.chrom(record)
    # ...
end
# Finally, close the reader.
close(reader)
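The loop body can call further accessors besides `BED.chrom`. The sketch below first writes the two example transcripts to a temporary file so that it is self-contained, then prints a few fields of each record. The accessor names `BED.chromstart`, `BED.chromend`, `BED.name`, and `BED.strand` are assumed to match those listed in the Accessors section, and the variable `transcript_names` is introduced here purely for illustration.

```julia
using BED

# Write the two example records to a temporary file (fields are tab-separated).
path = tempname() * ".bed"
write(path,
    "chr9\t68331023\t68424451\tNM_015110\t0\t+\n" *
    "chr9\t68456943\t68486659\tNM_001206\t0\t-\n")

# Collect the name field of every record while iterating.
transcript_names = String[]

reader = open(BED.Reader, path)
for record in reader
    push!(transcript_names, BED.name(record))
    println(BED.chrom(record), " ",
            BED.chromstart(record), " ",
            BED.chromend(record), " ",
            BED.name(record), " ",
            BED.strand(record))
end
close(reader)
```

Optional fields such as the name are only available when the file provides them, so real code may need to check for their presence before calling the accessor.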
The iterator interface shown above allocates a new object for each record, which can become a bottleneck when reading large files. In-place reading reuses a single pre-allocated record, so far fewer allocations happen while reading:
# Import the BED module.
using BED
# Open a BED file.
reader = open(BED.Reader, "data.bed")
# Pre-allocate record.
record = BED.Record()
while !eof(reader)
    empty!(record)
    read!(reader, record)
    # do something
end
# Finally, close the reader.
close(reader)
If you repeatedly access records within specific ranges, it is more efficient to construct an IntervalCollection object from a BED reader:
using BED
using GenomicFeatures
# Create an interval collection in memory.
icol = open(BED.Reader, "data.bed") do reader
    IntervalCollection(reader)
end
# Query overlapping records.
for interval in eachoverlap(icol, Interval("chrX", 40001, 51500))
    # A record is stored in the metadata field of an interval.
    record = metadata(interval)
    # ...
end