BED
Description
BED is a text-based file format for representing genomic annotations such as genes and transcripts. Each line of a BED file holds a variable number of tab-delimited fields; the first three, which denote a genomic interval, are mandatory.
This is an example of RNA transcripts:
chr9 68331023 68424451 NM_015110 0 +
chr9 68456943 68486659 NM_001206 0 -
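As a rough illustration of the field layout, the sketch below splits one such line using only the Julia standard library. It assumes the fields are separated by tab characters, as the format requires (the rendered example above shows them as plain whitespace), and the variable names are purely illustrative rather than part of GenomicFeatures.

```julia
# Split one BED line into its tab-delimited fields (standard library only).
line = "chr9\t68331023\t68424451\tNM_015110\t0\t+"
fields = split(line, '\t')

# The first three fields are mandatory.
length(fields) >= 3 || error("a BED line needs at least chrom, start, and end")

chrom = String(fields[1])
chromstart = parse(Int, fields[2])  # raw value as written in the file
chromend = parse(Int, fields[3])

# Everything after the third field is optional (name, score, strand, ...).
optional = fields[4:end]
```

For real files, the BED.Reader type described below is the better choice, since it also validates each record.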
I/O tools for BED are provided by the GenomicFeatures.BED module, which exports the following three types:
- Reader type: BED.Reader
- Writer type: BED.Writer
- Element type: BED.Record
Examples
Here is a common workflow to iterate over all records in a BED file:
# Import the BED module.
using GenomicFeatures
# Open a BED file.
reader = open(BED.Reader, "data.bed")
# Iterate over records.
for record in reader
    # Do something with record (see Accessors section).
    chrom = BED.chrom(record)
    # ...
end
# Finally, close the reader.
close(reader)
If you repeatedly access records within specific ranges, it is more efficient to construct an IntervalCollection object from a BED reader:
# Create an interval collection in memory.
icol = open(BED.Reader, "data.bed") do reader
    IntervalCollection(reader)
end
# Query overlapping records.
for interval in eachoverlap(icol, Interval("chrX", 40001, 51500))
    # A record is stored in the metadata field of an interval.
    record = metadata(interval)
    # ...
end
API
GenomicFeatures.BED.Reader — Type
BED.Reader(input::IO; index=nothing)
BED.Reader(input::AbstractString; index=:auto)
Create a data reader of the BED file format.
The first argument specifies the data source. When it is a file path ending with .bgz, the file is treated as block-compressed (BGZF), and the reader looks for a tabix index file (<filename>.tbi) and loads it if one exists. See http://www.htslib.org/doc/tabix.html for the bgzip and tabix tools.
Arguments:
- input: data source
- index: path to a tabix file
GenomicFeatures.BED.Writer — Type
BED.Writer(output::IO)
Create a data writer of the BED file format.
Arguments:
- output: data sink
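A minimal sketch of writing a single record follows. It assumes a GenomicFeatures release that still ships the BED submodule, and that write(writer, record) and close(writer) are available for BED.Writer as they are for other BioJulia writers; the output path is a throwaway temporary file chosen for illustration.

```julia
using GenomicFeatures

# Build a record from a tab-delimited line (see BED.Record).
record = BED.Record("chr9\t68331023\t68424451\tNM_015110\t0\t+")

# Wrap an output stream in a writer; the path is a hypothetical location.
path = tempname()
writer = BED.Writer(open(path, "w"))

# Write the record (assumed API), then close the writer to flush the file.
write(writer, record)
close(writer)

# Show what was written.
print(read(path, String))
```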
GenomicFeatures.BED.Record — Type
BED.Record()
Create an unfilled BED record.
BED.Record(data::Vector{UInt8})
Create a BED record object from data.
This function verifies and indexes fields for accessors. Note that the ownership of data is transferred to the new record object.
BED.Record(str::AbstractString)
Create a BED record object from str.
This function verifies and indexes fields for accessors.
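The string constructor can be combined with the accessor functions listed in the rest of this section. The sketch below assumes a GenomicFeatures release that still ships the BED submodule and reuses the first example line from the Description, written with explicit tab characters.

```julia
using GenomicFeatures

# Build a record from one tab-delimited BED line.
record = BED.Record("chr9\t68331023\t68424451\tNM_015110\t0\t+")

# Read individual fields through the accessors.
println(BED.chrom(record))
println(BED.chromstart(record))
println(BED.chromend(record))
println(BED.name(record))
println(BED.score(record))
println(BED.strand(record))
```

Accessors for optional fields that are absent from the line (for example the thick start or the block fields here) cannot return a value, so it is safest to call only those that match the fields actually present.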
GenomicFeatures.BED.chrom — Function
chrom(record::Record)::String
Get the chromosome name of record.
GenomicFeatures.BED.chromstart — Function
chromstart(record::Record)::Int
Get the starting position of record.
Note that the first base is numbered 1.
GenomicFeatures.BED.chromend — Function
chromend(record::Record)::Int
Get the end position of record.
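BED files on disk store intervals as 0-based, half-open coordinates, whereas the note on chromstart states that the first base is numbered 1. The arithmetic relating the two conventions can be sketched as below; to_one_based is a hypothetical helper written for this illustration and is not part of GenomicFeatures, and whether the accessors apply exactly this conversion is an assumption drawn from that note.

```julia
# Convert a 0-based, half-open interval [start0, end0) into
# 1-based, closed coordinates [start0 + 1, end0].
function to_one_based(start0::Integer, end0::Integer)
    start0 <= end0 || throw(ArgumentError("start must not exceed end"))
    return (start0 + 1, end0)
end

# Raw values from the first example line in the Description.
first_base, last_base = to_one_based(68331023, 68424451)

# The number of bases covered is the same in both conventions.
span = last_base - first_base + 1
```

Only the start shifts by one; the end value is unchanged because the half-open end of the file convention coincides with the closed end of the 1-based convention.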
GenomicFeatures.BED.name — Function
name(record::Record)::String
Get the name of record.
GenomicFeatures.BED.score — Function
score(record::Record)::Int
Get the score of record, an integer between 0 and 1000.
GenomicFeatures.BED.strand — Function
strand(record::Record)::GenomicFeatures.Strand
Get the strand of record.
GenomicFeatures.BED.thickstart — Function
thickstart(record::Record)::Int
Get the starting position at which record is drawn thickly.
Note that the first base is numbered 1.
GenomicFeatures.BED.thickend — Function
thickend(record::Record)::Int
Get the end position at which record is drawn thickly.
GenomicFeatures.BED.itemrgb — Function
itemrgb(record::Record)::ColorTypes.RGB
Get the RGB value of record.
The return type is defined in ColorTypes.jl.
GenomicFeatures.BED.blockcount — Function
blockcount(record::Record)::Int
Get the number of blocks (exons) in record.
GenomicFeatures.BED.blocksizes — Function
blocksizes(record::Record)::Vector{Int}
Get the block (exon) sizes of record.
GenomicFeatures.BED.blockstarts — Function
blockstarts(record::Record)::Vector{Int}
Get the block (exon) starts of record.
Note that the first base is numbered 1.