BED
Description
BED is a text-based file format for representing genomic annotations such as genes and transcripts. Each line of a BED file holds tab-delimited fields whose number may vary. The first three fields, which denote a genomic interval (chromosome, start, and end), are mandatory.
Here is an example with two RNA transcript records:
```
chr9	68331023	68424451	NM_015110	0	+
chr9	68456943	68486659	NM_001206	0	-
```
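The field layout may be easier to see in code. Below is a minimal, Base-only Julia sketch that splits one of the lines above on tabs and extracts the three mandatory fields. The helper `parse_bed_interval` is hypothetical and is not how `BED.Reader` parses files. It assumes a well-formed line, and it keeps the start exactly as stored in the file, which the BED format defines as 0-based.

```julia
# Hypothetical helper (not part of GenomicFeatures): split one BED line on tabs
# and return the three mandatory fields plus the total number of fields.
function parse_bed_interval(line::AbstractString)
    fields = split(line, '\t')
    length(fields) >= 3 || error("a BED line needs at least 3 fields")
    chrom = String(fields[1])
    start0 = parse(Int, fields[2])  # start as stored in the file (0-based)
    stop = parse(Int, fields[3])    # end as stored in the file (exclusive)
    return (chrom = chrom, start0 = start0, stop = stop, nfields = length(fields))
end

line = "chr9\t68331023\t68424451\tNM_015110\t0\t+"
iv = parse_bed_interval(line)
println(iv)
```

This record has six fields: the three mandatory ones plus name, score, and strand.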
I/O tools for BED are provided by the `GenomicFeatures.BED` module, which exports the following three types:
- Reader type: `BED.Reader`
- Writer type: `BED.Writer`
- Element type: `BED.Record`
Examples
Here is a common workflow to iterate over all records in a BED file:
```julia
# Load GenomicFeatures, which provides the BED module.
using GenomicFeatures

# Open a BED file.
reader = open(BED.Reader, "data.bed")

# Iterate over records.
for record in reader
    # Do something with record (see the accessors in the API section).
    chrom = BED.chrom(record)
    # ...
end

# Finally, close the reader.
close(reader)
```
If you repeatedly access records within specific ranges, it is usually more efficient to construct an `IntervalCollection` object from a BED reader:
```julia
# Create an interval collection in memory.
icol = open(BED.Reader, "data.bed") do reader
    IntervalCollection(reader)
end

# Query overlapping records.
for interval in eachoverlap(icol, Interval("chrX", 40001, 51500))
    # A record is stored in the metadata field of an interval.
    record = metadata(interval)
    # ...
end
```
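An overlap query returns every stored interval that shares at least one base with the query range. The Base-only Julia sketch below answers the same question with a naive linear scan. It assumes 1-based, closed coordinates, as in `Interval("chrX", 40001, 51500)`. `SimpleInterval`, `overlaps`, and `naive_overlaps` are hypothetical names. This is not how `IntervalCollection` works internally: the collection is expected to answer such queries more efficiently than scanning every interval.

```julia
# Hypothetical interval type with 1-based, closed coordinates.
struct SimpleInterval
    chrom::String
    first::Int  # first base, inclusive
    last::Int   # last base, inclusive
end

# Two closed intervals overlap when they are on the same sequence and each
# starts no later than the other ends.
overlaps(a::SimpleInterval, b::SimpleInterval) =
    a.chrom == b.chrom && a.first <= b.last && b.first <= a.last

# Linear scan: check every stored interval against the query.
naive_overlaps(ivs::Vector{SimpleInterval}, query::SimpleInterval) =
    filter(iv -> overlaps(iv, query), ivs)

ivs = [
    SimpleInterval("chrX", 30001, 40000),  # ends just before the query
    SimpleInterval("chrX", 45000, 46000),  # inside the query
    SimpleInterval("chrX", 51500, 60000),  # shares only base 51500
    SimpleInterval("chr9", 40001, 51500),  # different sequence
]
hits = naive_overlaps(ivs, SimpleInterval("chrX", 40001, 51500))
println(length(hits))
```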
API
GenomicFeatures.BED.Reader (Type)
BED.Reader(input::IO; index=nothing)
BED.Reader(input::AbstractString; index=:auto)
Create a data reader of the BED file format.
The first argument specifies the data source. When it is a filepath that ends with .bgz, it is considered to be block compression file format (BGZF) and the function will try to find a tabix index file (
Arguments
- `input`: data source
- `index`: path to a tabix file
GenomicFeatures.BED.Writer (Type)
BED.Writer(output::IO)
Create a data writer of the BED file format.
Arguments
- `output`: data sink
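A writer's output is a sequence of tab-delimited lines sent to an `IO` sink. The Base-only Julia sketch below writes such lines to an `IOBuffer`. It is not `BED.Writer`'s implementation. `write_bed` is a hypothetical helper, and each input is a hypothetical named tuple that already holds BED-style coordinates (0-based start, exclusive end), so the sketch performs no coordinate conversion.

```julia
# Hypothetical helper: write chrom, start, end, and name as one
# tab-delimited line per record to any IO sink.
function write_bed(io::IO, records)
    for r in records
        println(io, join((r.chrom, r.start0, r.stop, r.name), '\t'))
    end
end

buf = IOBuffer()  # in-memory data sink standing in for a file
write_bed(buf, [
    (chrom = "chr9", start0 = 68331023, stop = 68424451, name = "NM_015110"),
    (chrom = "chr9", start0 = 68456943, stop = 68486659, name = "NM_001206"),
])
text = String(take!(buf))
print(text)
```

Because the sink is just an `IO`, the same helper works unchanged with an opened file.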
GenomicFeatures.BED.Record (Type)
BED.Record()
Create an unfilled BED record.
BED.Record(data::Vector{UInt8})
Create a BED record object from `data`.
This function verifies and indexes fields for accessors. Note that the ownership of `data` is transferred to a new record object.
BED.Record(str::AbstractString)
Create a BED record object from `str`.
This function verifies and indexes fields for accessors.
GenomicFeatures.BED.chrom (Function)
chrom(record::Record)::String
Get the chromosome name of `record`.
GenomicFeatures.BED.chromstart (Function)
chromstart(record::Record)::Int
Get the starting position of `record`.
Note that the first base is numbered 1.
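In the file itself, BED stores a 0-based start and an end-exclusive end, while `chromstart` numbers the first base 1. The Base-only Julia sketch below shows the arithmetic between the two conventions, using the first example record. The helpers `bed_to_onebased` and `onebased_to_bed` are hypothetical. The sketch also assumes that the end value is the same in both conventions, which holds because an exclusive 0-based end equals an inclusive 1-based end.

```julia
# Hypothetical helpers converting between BED coordinates
# (0-based start, exclusive end) and 1-based, closed coordinates.
bed_to_onebased(start0::Int, stop::Int) = (start0 + 1, stop)
onebased_to_bed(first::Int, last::Int) = (first - 1, last)

first1, last1 = bed_to_onebased(68331023, 68424451)
println((first1, last1))                 # (68331024, 68424451)
println(onebased_to_bed(first1, last1))  # (68331023, 68424451)

# The interval length is the same in either convention.
println(last1 - first1 + 1)              # 93428
println(68424451 - 68331023)             # 93428
```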
GenomicFeatures.BED.chromend (Function)
chromend(record::Record)::Int
Get the end position of `record`.
GenomicFeatures.BED.name (Function)
name(record::Record)::String
Get the name of `record`.
GenomicFeatures.BED.score (Function)
score(record::Record)::Int
Get the score of `record`, a value between 0 and 1000.
GenomicFeatures.BED.strand (Function)
strand(record::Record)::GenomicFeatures.Strand
Get the strand of `record`.
GenomicFeatures.BED.thickstart (Function)
thickstart(record::Record)::Int
Get the starting position at which `record` is drawn thickly.
Note that the first base is numbered 1.
GenomicFeatures.BED.thickend (Function)
thickend(record::Record)::Int
Get the end position at which `record` is drawn thickly.
GenomicFeatures.BED.itemrgb (Function)
itemrgb(record::Record)::ColorTypes.RGB
Get the RGB value of `record`.
The return type is defined in ColorTypes.jl.
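In the file, the color is stored as text. The BED format writes it as three comma-separated integers such as `255,0,0`. The Base-only Julia sketch below parses that text into a tuple of `UInt8` values. It does not reproduce `itemrgb`, which returns a `ColorTypes.RGB`. `parse_itemrgb` is a hypothetical helper, and it assumes the field always has exactly three components.

```julia
# Hypothetical helper: parse an itemRgb field like "255,0,0" into
# a tuple of UInt8 (red, green, blue) values.
function parse_itemrgb(field::AbstractString)
    parts = split(field, ',')
    length(parts) == 3 || error("expected three comma-separated values")
    return (parse(UInt8, parts[1]), parse(UInt8, parts[2]), parse(UInt8, parts[3]))
end

println(parse_itemrgb("255,0,0"))
```

`parse(UInt8, ...)` throws for values above 255, so out-of-range components are rejected instead of wrapping around.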
GenomicFeatures.BED.blockcount (Function)
blockcount(record::Record)::Int
Get the number of blocks (exons) in `record`.
GenomicFeatures.BED.blocksizes (Function)
blocksizes(record::Record)::Vector{Int}
Get the block (exon) sizes of `record`.
GenomicFeatures.BED.blockstarts (Function)
blockstarts(record::Record)::Vector{Int}
Get the block (exon) starts of `record`.
Note that the first base is numbered 1.
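The block fields become useful once they are combined into exon coordinates. The Base-only Julia sketch below turns the raw blockSizes and blockStarts text of a record into absolute, 1-based, closed exon intervals. It assumes the BED format convention: block starts are 0-based offsets from chromStart, and each comma-separated list may end with a trailing comma. The record values and the helpers `parse_intlist` and `exon_intervals` are hypothetical, and the sketch does not claim to reproduce the exact values that `blockstarts` returns.

```julia
# Hypothetical helper: parse "100,50," into [100, 50], ignoring a trailing comma.
parse_intlist(s::AbstractString) = [parse(Int, x) for x in split(s, ',') if !isempty(x)]

# Hypothetical helper: combine chromStart (0-based, as stored in the file)
# with block offsets and sizes into 1-based, closed (first, last) pairs.
function exon_intervals(chromstart0::Int, sizes::AbstractString, starts::AbstractString)
    bsizes = parse_intlist(sizes)
    bstarts = parse_intlist(starts)
    length(bsizes) == length(bstarts) || error("block lists differ in length")
    return [(chromstart0 + s + 1, chromstart0 + s + z) for (s, z) in zip(bstarts, bsizes)]
end

# Hypothetical record: chromStart = 1000, two blocks of sizes 100 and 50
# starting at offsets 0 and 400.
exons = exon_intervals(1000, "100,50,", "0,400,")
println(exons)
```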