Reference
BGZFLib.BGZFError — Type
BGZFError <: Exception
Exception type thrown by BGZF readers and writers, when encountering errors specific to the BGZF (or gzip, or DEFLATE) formats. Note that exceptions thrown by BGZF readers and writers are not guaranteed to be of this type, as they may also throw BufferIO.IOErrors, or exceptions propagated by their underlying IO.
This error contains two public properties:
block_offset::Union{Nothing, Int} gives the zero-based offset in the compressed stream of the block where the error occurred. Some errors may not occur at a specific block, in which case this is nothing.
type::Union{BGZFErrorType, LibDeflateError}. If the blocks are malformed gzip blocks, this is a LibDeflateError. Else, if the error is specific to the BGZF format, it's a BGZFErrorType.
BGZFLib.BGZFReader — Type
BGZFReader(io::T <: IO; n_workers::Int, check_truncated::Bool=true)::BGZFReader{BufReader{T}}
BGZFReader(io::T <: AbstractBufReader; n_workers::Int, check_truncated::Bool=true)::BGZFReader{T}
Create a BGZFReader <: AbstractBufReader that decompresses a BGZF stream.
When constructing from an io::AbstractBufReader, io must have a buffer size of at least 65536, or be able to grow its buffer to this size.
If check_truncated, the last BGZF block in the file must be empty, otherwise the reader throws an error. This can be used to detect that the file was truncated.
The decompression happens asynchronously in a set of worker tasks. To avoid spawning workers, use SyncBGZFReader instead.
If the reader encounters an error, it goes into an error state and throws an exception. The reader can be reset by using seek or seekstart. A closed reader cannot be reset.
BGZFLib.BGZFWriter — Type
BGZFWriter(io::T <: AbstractBufWriter; kwargs)::BGZFWriter{T}
BGZFWriter(io::T <: IO; kwargs)::BGZFWriter{BufWriter{T}}
Create a BGZFWriter <: AbstractBufWriter that compresses data written to it, and writes the compressed BGZF file to the underlying io.
This type differs from SyncBGZFWriter in that the compression happens in separate worker tasks. This allows BGZFWriter to compress in parallel, making it faster in the presence of multiple threads.
If io::AbstractBufWriter, io must be able to buffer up to 2^16 bytes, else a BGZFError(nothing, BGZFErrors.insufficient_writer_space) is thrown.
The keyword arguments are:
n_workers::Int: Set number of workers. Must be > 0. Defaults to some small number.
compress_level::Int: Set compression level from 1 to 12, with 12 being slowest but with the best compression ratio. It defaults to an intermediate level of compression.
append_empty::Bool = true. If set, closing the BGZFWriter will write an empty BGZF block, indicating EOF.
BGZFLib.GZIndex — Type
GZIndex(blocks::Vector{@NamedTuple{compressed_offset::UInt64, decompressed_offset::UInt64}})
Construct a GZI index of a BGZF file. The vector blocks contains one pair of integers for each block in the BGZF file, in order, containing the zero-based offset of the compressed data and the corresponding decompressed data, respectively.
Throw a BGZFError(nothing, BGZFErrors.unsorted_index) if either sequence of offsets is not sorted in ascending order.
Usually constructed with index_bgzf or load_gzi, and serialized with write(io, ::GZIndex).
This struct contains the public property .blocks which corresponds to the vector as described above, no matter how GZIndex is constructed.
See also: index_bgzf, load_gzi, write_gzi
BGZFLib.SyncBGZFReader — Type
SyncBGZFReader(io::T <: IO; check_truncated::Bool=true)::SyncBGZFReader{BufReader{T}}
SyncBGZFReader(io::T <: AbstractBufReader; check_truncated::Bool=true)::SyncBGZFReader{T}
Create a SyncBGZFReader <: AbstractBufReader that decompresses BGZF files.
When constructing from an io::AbstractBufReader, io must have a buffer size of at least 65536, or be able to grow its buffer to this size.
If check_truncated, the last BGZF block in the file must be empty, otherwise the reader throws an error. This can be used to detect that the file was truncated.
Unlike BGZFReader, the decompression happens serially in the main task. This is slower and does not enable parallelism, but may be preferable in situations where task scheduling or contention is an issue.
If the reader encounters an error, it goes into an error state and throws an exception. The reader can be reset by using seek or seekstart. A closed reader cannot be reset.
BGZFLib.SyncBGZFWriter — Type
SyncBGZFWriter(io::T <: AbstractBufWriter; kwargs)::SyncBGZFWriter{T}
SyncBGZFWriter(io::T <: IO; kwargs)::SyncBGZFWriter{BufWriter{T}}
Create a SyncBGZFWriter <: AbstractBufWriter that compresses data written to it, and writes the compressed BGZF file to the underlying io.
This type differs from BGZFWriter in that it does the compression serially in the main task. Therefore it is slower when multiple threads are present, but does not incur task and scheduling overhead.
If io::AbstractBufWriter, io must be able to buffer up to 2^16 bytes, else a BGZFError(nothing, BGZFErrors.insufficient_writer_space) is thrown.
The keyword arguments are:
compresslevel::Int: Set compression level from 1 to 12, with 12 being slowest but with the best compression ratio. It defaults to an intermediate level of compression.
append_empty::Bool = true. If set, closing the SyncBGZFWriter will write an empty BGZF block, indicating EOF.
BGZFLib.VirtualOffset — Type
VirtualOffset(file_offset::Integer, block_offset::Integer)
Create a BGZF virtual file offset from file_offset and block_offset. Get the two offsets with the public properties vo.file_offset and vo.block_offset.
A VirtualOffset contains two zero-based offsets: the "file offset", which is the offset in the compressed BGZF file that marks the beginning of the block with the given position, and a "block offset", which is the offset into the uncompressed content of that block.
The valid ranges of these two are 0:2^48-1 and 0:2^16-1, respectively.
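The two ranges above match the 64-bit virtual offset layout from the BGZF specification, where the upper 48 bits hold the file offset and the lower 16 bits hold the block offset. The following is a minimal sketch of that packing. The helper names are hypothetical and not part of BGZFLib, and how VirtualOffset stores its fields internally is not stated here.

```julia
# Hypothetical helpers, not part of BGZFLib: pack and unpack a virtual offset
# using the 64-bit layout from the BGZF specification.
function pack_virtual_offset(file_offset::Integer, block_offset::Integer)
    0 <= file_offset < 2^48 || throw(ArgumentError("file offset must be in 0:2^48-1"))
    0 <= block_offset < 2^16 || throw(ArgumentError("block offset must be in 0:2^16-1"))
    # Upper 48 bits: file offset. Lower 16 bits: block offset.
    return (UInt64(file_offset) << 16) | UInt64(block_offset)
end

function unpack_virtual_offset(x::UInt64)
    return (file_offset = Int(x >> 16), block_offset = Int(x & 0xffff))
end

packed = pack_virtual_offset(178, 5)
unpacked = unpack_virtual_offset(packed)
```

Because the file offset occupies the high bits, packed virtual offsets compare in the same order as positions in the decompressed stream.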
Examples
julia> reader = SyncBGZFReader(CursorReader(bgzf_data));
julia> vo = VirtualOffset(178, 5)
VirtualOffset(178, 5)
julia> virtual_seek(reader, vo);
julia> String(read(reader, 9))
"some more"
julia> virtual_seek(reader, VirtualOffset(0, 7));
julia> String(read(reader, 6))
"world!"
BGZFLib.get_virtual_offset — Method
get_virtual_offset(gzi::GZIndex, offset::Int)::Union{Nothing, VirtualOffset}
Get the VirtualOffset that corresponds to the zero-based offset offset in the decompressed BGZF stream indexed by gzi.
Return nothing if offset is smaller than zero, or points more than 2^16 bytes beyond the start of the final block.
Note that, because gzi files (and thus GZIndex) do not store the length of the final block, the resulting VirtualOffset may be invalid. Specifically, if the resulting VirtualOffset points bo ≤ 2^16 bytes into the final block, but the final block is shorter than bo bytes, this function will return a VirtualOffset, but using that offset to seek in the corresponding BGZF stream will error.
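One way to picture the lookup described above is a binary search over the decompressed block starts. This is a sketch only, not the BGZFLib implementation: the helper name and the index data are hypothetical, and the exact boundary at 2^16 is an assumption (the sketch keeps block offsets within 0:2^16-1).

```julia
# Hypothetical index data, for illustration only.
blocks = [
    (compressed_offset = UInt64(0),   decompressed_offset = UInt64(0)),
    (compressed_offset = UInt64(44),  decompressed_offset = UInt64(13)),
    (compressed_offset = UInt64(100), decompressed_offset = UInt64(40)),
]

# Hypothetical helper: map an offset in the decompressed stream to a
# (file_offset, block_offset) pair, or nothing if no block can contain it.
function sketch_lookup(blocks, offset::Integer)
    offset < 0 && return nothing
    starts = [Int(b.decompressed_offset) for b in blocks]
    # Index of the last block that starts at or before `offset`.
    i = searchsortedlast(starts, offset)
    i == 0 && return nothing
    block_offset = offset - starts[i]
    # Assumption: reject offsets that cannot fit in any BGZF block.
    block_offset < 2^16 || return nothing
    return (file_offset = Int(blocks[i].compressed_offset), block_offset = block_offset)
end

hit = sketch_lookup(blocks, 45)
miss = sketch_lookup(blocks, 100_000)
```

As in the note above, the sketch cannot tell whether an offset into the final block actually lies within that block, because the index does not record the final block's length.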
Examples
julia> gzi = load_gzi(CursorReader(gzi_data));
julia> get_virtual_offset(gzi, 100_000) === nothing
true
julia> vo = get_virtual_offset(gzi, 45)
VirtualOffset(223, 8)
julia> reader = virtual_seek(SyncBGZFReader(CursorReader(bgzf_data)), vo);
julia> read(reader) |> String
"tent herethis is another block"
julia> bad_vo = get_virtual_offset(gzi, 500)
VirtualOffset(323, 425)
julia> virtual_seek(reader, bad_vo);
ERROR: BGZFError: Error in block at offset 323: Seek to block offset larger than block size
[...]
julia> close(reader)
BGZFLib.index_bgzf — Method
index_bgzf(io::Union{IO, AbstractBufReader})::GZIndex
Compute a GZIndex from a BGZF file.
Throw a BGZFError if the BGZF file is invalid, or a BGZFError with BGZFErrors.insufficient_reader_space if an entire block cannot be buffered by io (this only happens if io::AbstractBufReader).
Indexing the file does not attempt to decompress it, and therefore does not validate that the compressed data is valid (i.e. is a valid DEFLATE payload, or that the crc32 checksum matches).
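To illustrate why indexing needs no decompression, the sketch below walks block headers only. It relies on the BGZF specification: each block's gzip extra field carries a BC subfield whose value is the total block size minus one, and the last four bytes of each block hold the uncompressed size. All helper names are hypothetical and this is not the BGZFLib implementation; it also skips the validation a real indexer would need.

```julia
# Canonical empty BGZF block from the BGZF specification (28 bytes).
const EMPTY_BLOCK = UInt8[
    0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x06, 0x00,
    0x42, 0x43, 0x02, 0x00, 0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00,
]

# Little-endian integer reads at 1-based index i.
le16(v, i) = UInt16(v[i]) | (UInt16(v[i + 1]) << 8)
le32(v, i) = UInt32(le16(v, i)) | (UInt32(le16(v, i + 2)) << 16)

# Hypothetical helper: total size of the block starting at 1-based index `start`.
function sketch_block_size(data, start)
    xlen = le16(data, start + 10)      # XLEN follows the 10-byte gzip header
    pos = start + 12                   # first extra subfield
    stop = pos + xlen
    while pos + 4 <= stop
        slen = le16(data, pos + 2)
        if data[pos] == UInt8('B') && data[pos + 1] == UInt8('C') && slen == 2
            return Int(le16(data, pos + 4)) + 1   # BSIZE is block size minus one
        end
        pos += 4 + slen
    end
    error("no BC field found")
end

# Hypothetical helper: collect zero-based offsets for every block.
function sketch_index(data::Vector{UInt8})
    blocks = @NamedTuple{compressed_offset::UInt64, decompressed_offset::UInt64}[]
    pos, decompressed = 1, 0
    while pos <= length(data)
        bsize = sketch_block_size(data, pos)
        push!(blocks, (compressed_offset = UInt64(pos - 1), decompressed_offset = UInt64(decompressed)))
        decompressed += Int(le32(data, pos + bsize - 4))   # ISIZE: uncompressed length
        pos += bsize
    end
    return blocks
end

idx = sketch_index(vcat(EMPTY_BLOCK, EMPTY_BLOCK))
```

The DEFLATE payload and the CRC32 field are never inspected here, which is why an index built this way says nothing about whether the compressed data is valid.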
See also: load_gzi, GZIndex, write_gzi
Examples
julia> idx1 = open(index_bgzf, path_to_bgzf);
julia> idx2 = open(load_gzi, path_to_gzi);
julia> idx1.blocks == idx2.blocks
true
BGZFLib.load_gzi — Method
load_gzi(io::Union{IO, AbstractBufReader})::GZIndex
Load a GZIndex from a GZI file.
Throw an IOError(IOErrorKinds.EOF) if io does not contain enough bytes for a valid GZI file. Throw a BGZFError(nothing, BGZFErrors.unsorted_index) if the offsets are not sorted in ascending order. Currently does not throw an error if the file contains extra appended bytes, but this may change in the future.
See also: index_bgzf, GZIndex, write_gzi
Examples
julia> gzi = open(load_gzi, path_to_gzi);
julia> gzi isa GZIndex
true
julia> (; compressed_offset) = gzi.blocks[5]
(compressed_offset = 0x0000000000000093, decompressed_offset = 0x0000000000000017)
julia> reader = SyncBGZFReader(CursorReader(bgzf_data));
julia> seek(reader, Int(compressed_offset));
julia> read(reader, 15) |> String
"then some morem"
julia> close(reader)
BGZFLib.virtual_position — Method
virtual_position(io::Union{SyncBGZFReader, BGZFReader})::VirtualOffset
Get the VirtualOffset of the current BGZF reader. The virtual offset is a position in the decompressed stream. Seek to the position using virtual_seek.
See also: VirtualOffset, virtual_seek
Examples
julia> reader = SyncBGZFReader(CursorReader(bgzf_data));
julia> virtual_position(reader)
VirtualOffset(0, 0)
julia> read(reader, 18);
julia> virtual_position(reader)
VirtualOffset(44, 5)
julia> close(reader)
BGZFLib.virtual_seek — Method
virtual_seek(io::Union{SyncBGZFReader, BGZFReader}, vo::VirtualOffset) -> io
Seek to the virtual position vo. The virtual position is usually obtained by a call to virtual_position.
See also: VirtualOffset, virtual_position
julia> reader = SyncBGZFReader(CursorReader(bgzf_data));
julia> virtual_seek(reader, VirtualOffset(178, 14));
julia> String(read(reader))
"more content herethis is another block"
julia> virtual_seek(reader, VirtualOffset(0, 0));
julia> String(read(reader, 13))
"Hello, world!"
julia> close(reader)
BGZFLib.write_empty_block — Method
write_empty_block(io::Union{SyncBGZFWriter, BGZFWriter})
Perform a shallow_flush, then write an empty block.
Examples
Write the final empty EOF block manually:
julia> io = VecWriter();
julia> SyncBGZFWriter(io; append_empty=false) do writer
           write(writer, "Hello")
           # Manually write the empty EOF block
           write_empty_block(writer)
       end
julia> SyncBGZFReader(CursorReader(io.vec); check_truncated=true) do reader
           read(reader, String)
       end
"Hello"
BGZFLib.write_gzi — Method
write_gzi(io::Union{AbstractBufWriter, IO}, index::GZIndex)::Int
Write a GZIndex to io in GZI format, and return the number of written bytes. Currently, this function only works on little-endian CPUs, and will throw an ErrorException on big-endian platforms.
The resulting file can be loaded with load_gzi to obtain an index equivalent to index.
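For orientation, here is a sketch of the round trip, assuming the GZI layout written by htslib's bgzip: a little-endian UInt64 entry count, followed by one pair of little-endian UInt64 values (compressed offset, then decompressed offset) per entry. The helper names are hypothetical and this is not the BGZFLib implementation. Which blocks BGZFLib stores as entries is not stated here.

```julia
# Hypothetical helper: write entries in the assumed GZI layout and
# return the number of bytes written.
function sketch_write_gzi(io::IO, entries::Vector{Tuple{UInt64, UInt64}})
    n = write(io, htol(UInt64(length(entries))))
    for (compressed, decompressed) in entries
        n += write(io, htol(compressed))
        n += write(io, htol(decompressed))
    end
    return n
end

# Hypothetical helper: read the entries back. Throws EOFError on short input.
function sketch_load_gzi(io::IO)
    n = ltoh(read(io, UInt64))
    return [(ltoh(read(io, UInt64)), ltoh(read(io, UInt64))) for _ in 1:n]
end

# Hypothetical index entries, for illustration only.
entries = [(UInt64(44), UInt64(13)), (UInt64(100), UInt64(40))]
buf = IOBuffer()
nbytes = sketch_write_gzi(buf, entries)
seekstart(buf)
roundtrip = sketch_load_gzi(buf)
```

Under this layout the file size is 8 + 16 * n bytes for n entries. The sketch uses htol and ltoh so it is independent of host byte order, unlike the little-endian restriction noted above for write_gzi.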
See also: GZIndex, index_bgzf
Examples
julia> gzi = load_gzi(CursorReader(gzi_data))::GZIndex;
julia> io = VecWriter();
julia> write_gzi(io, gzi)
152
julia> gzi_2 = load_gzi(CursorReader(io.vec));
julia> gzi.blocks == gzi_2.blocks
true
BGZFLib.BGZFErrors — Module
module BGZFErrors
This module is used as a namespace for the enum BGZFErrorType. The enum is non-exhaustive (more variants may be added in the future). The current values are:
truncated_file: The reader's data stops abruptly, either in the middle of a block, or without an empty block at EOF.
missing_bc_field: A block has no BC field, or it's malformed.
block_offset_out_of_bounds: Seek with a VirtualOffset where the block offset is larger than the block size.
insufficient_reader_space: The BGZF reader wraps an AbstractBufReader that is not EOF, and its buffer can't grow to encompass a whole BGZF block.
insufficient_writer_space: A BGZF writer wraps an AbstractBufWriter whose buffer cannot grow to encompass a full BGZF block.
unsorted_index: Attempted to load a malformed GZI file with unsorted coordinates, or with a file index > 2^48, or with a block size > 2^16.
operation_on_error: Attempted an operation on a BGZF reader or writer in an error state.