Reference

BGZFLib.BGZFError - Type
BGZFError <: Exception

Exception type thrown by BGZF readers and writers, when encountering errors specific to the BGZF (or gzip, or DEFLATE) formats. Note that exceptions thrown by BGZF readers and writers are not guaranteed to be of this type, as they may also throw BufferIO.IOErrors, or exceptions propagated by their underlying IO.

This error contains two public properties:

  • block_offset::Union{Nothing, Int} gives the zero-based offset in the compressed stream of the block where the error occurred. Some errors may not occur at a specific block, in which case this is nothing.
  • type::Union{BGZFErrorType, LibDeflateError}. If the blocks are malformed gzip blocks, this is a LibDeflateError. Else, if the error is specific to the BGZF format, it's a BGZFErrorType.
BGZFLib.BGZFReader - Type
BGZFReader(io::T <: IO; n_workers::Int, check_truncated::Bool=true)::BGZFReader{BufReader{T}}
BGZFReader(io::T <: AbstractBufReader; n_workers::Int, check_truncated::Bool=true)::BGZFReader{T}

Create a BGZFReader <: AbstractBufReader that decompresses a BGZF stream.

When constructing from an io::AbstractBufReader, io must have a buffer size of at least 65536, or be able to grow its buffer to this size.

If check_truncated, the last BGZF block in the file must be empty, otherwise the reader throws an error. This can be used to detect that the file was truncated.
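
The BGZF specification defines a fixed 28-byte empty block as the end-of-file marker. As a rough illustration of what a truncation check looks for, the sketch below tests whether a byte vector ends with that marker. It is plain Julia and is not how BGZFLib implements check_truncated; the reader may well accept other encodings of an empty block. The names BGZF_EOF_MARKER and ends_with_eof_marker are hypothetical.

```julia
# The 28-byte empty BGZF block from the SAM/BAM specification.
# Hypothetical constant name; not exported by BGZFLib.
const BGZF_EOF_MARKER = UInt8[
    0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x06, 0x00,
    0x42, 0x43, 0x02, 0x00, 0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00,
]

# Return true if `data` ends with the standard empty block.
function ends_with_eof_marker(data::AbstractVector{UInt8})
    n = length(BGZF_EOF_MARKER)
    length(data) >= n && data[end-n+1:end] == BGZF_EOF_MARKER
end
```

A file cut off in the middle of a block, or one that was written without the trailing empty block, fails this test.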

The decompression happens asynchronously in a set of worker tasks. To avoid spawning workers, use the SyncBGZFReader instead.

If the reader encounters an error, it goes into an error state and throws an exception. The reader can be reset by using seek or seekstart. A closed reader cannot be reset.

BGZFLib.BGZFWriter - Type
BGZFWriter(io::T <: AbstractBufWriter; kwargs)::BGZFWriter{T}
BGZFWriter(io::T <: IO; kwargs)::BGZFWriter{BufWriter{T}}

Create a BGZFWriter <: AbstractBufWriter that compresses data written to it, and writes the compressed BGZF file to the underlying io.

This type differs from SyncBGZFWriter in that the compression happens in separate worker tasks. This allows BGZFWriter to compress in parallel, making it faster in the presence of multiple threads.

If io::AbstractBufWriter, io must be able to buffer up to 2^16 bytes, else a BGZFError(nothing, BGZFErrors.insufficient_writer_space) is thrown.

The keyword arguments are:

  • n_workers::Int: Set number of workers. Must be > 0. Defaults to some small number.
  • compress_level::Int: Set compression level from 1 to 12, with 12 being slowest but with the best compression ratio. It defaults to an intermediate level of compression.
  • append_empty::Bool = true. If set, closing the BGZFWriter will write an empty BGZF block, indicating EOF.
BGZFLib.GZIndex - Type
GZIndex(blocks::Vector{@NamedTuple{compressed_offset::UInt64, decompressed_offset::UInt64}})

Construct a GZI index of a BGZF file. The vector blocks contains one pair of integers for each block in the BGZF file, in order, containing the zero-based offset of the compressed data and the corresponding decompressed data, respectively.

Throw a BGZFError(nothing, BGZFErrors.unsorted_index) if either set of offsets is not sorted in ascending order.

Usually constructed with index_bgzf or load_gzi, and serialized with write_gzi(io, ::GZIndex).

This struct contains the public property .blocks which corresponds to the vector as described above, no matter how GZIndex is constructed.

See also: index_bgzf, load_gzi, write_gzi

BGZFLib.SyncBGZFReader - Type
SyncBGZFReader(io::T <: IO; check_truncated::Bool=true)::SyncBGZFReader{BufReader{T}}
SyncBGZFReader(io::T <: AbstractBufReader; check_truncated::Bool=true)::SyncBGZFReader{T}

Create a SyncBGZFReader <: AbstractBufReader that decompresses BGZF files.

When constructing from an io::AbstractBufReader, io must have a buffer size of at least 65536, or be able to grow its buffer to this size.

If check_truncated, the last BGZF block in the file must be empty, otherwise the reader throws an error. This can be used to detect that the file was truncated.

Unlike BGZFReader, the decompression happens serially in the main task. This is slower and does not enable parallelism, but may be preferable in situations where task scheduling or contention is an issue.

If the reader encounters an error, it goes into an error state and throws an exception. The reader can be reset by using seek or seekstart. A closed reader cannot be reset.

BGZFLib.SyncBGZFWriter - Type
SyncBGZFWriter(io::T <: AbstractBufWriter; kwargs)::SyncBGZFWriter{T}
SyncBGZFWriter(io::T <: IO; kwargs)::SyncBGZFWriter{BufWriter{T}}

Create a SyncBGZFWriter <: AbstractBufWriter that compresses data written to it, and writes the compressed BGZF file to the underlying io.

This type differs from BGZFWriter in that it does the compression serially in the main task. It is therefore slower when multiple threads are present, but does not incur task and scheduling overhead.

If io::AbstractBufWriter, io must be able to buffer up to 2^16 bytes, else a BGZFError(nothing, BGZFErrors.insufficient_writer_space) is thrown.

The keyword arguments are:

  • compresslevel::Int: Set compression level from 1 to 12, with 12 being slowest but with the best compression ratio. It defaults to an intermediate level of compression.
  • append_empty::Bool = true. If set, closing the SyncBGZFWriter will write an empty BGZF block, indicating EOF.
BGZFLib.VirtualOffset - Type
VirtualOffset(file_offset::Integer, block_offset::Integer)

Create a BGZF virtual file offset from file_offset and block_offset. Get the two offsets with the public properties vo.file_offset and vo.block_offset.

A VirtualOffset contains two zero-based offsets: the "file offset", which is the offset in the compressed BGZF file of the start of the block containing the position, and the "block offset", which is the offset of the position within the uncompressed content of that block.

The valid ranges of these two are 0:2^48-1 and 0:2^16-1, respectively.
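
The two ranges add up to 64 bits: in the SAM/BAM specification, a virtual offset is stored as a single 64-bit integer with the file offset in the upper 48 bits and the block offset in the lower 16 bits. The sketch below shows that encoding in plain Julia. Whether VirtualOffset stores its fields this way internally is an assumption, and pack_virtual_offset and unpack_virtual_offset are hypothetical helpers, not part of BGZFLib.

```julia
# Pack the two offsets into one UInt64: file offset in the upper 48 bits,
# block offset in the lower 16 bits. Hypothetical helper for illustration.
function pack_virtual_offset(file_offset::Integer, block_offset::Integer)
    0 <= file_offset < 2^48 || throw(ArgumentError("file offset out of range"))
    0 <= block_offset < 2^16 || throw(ArgumentError("block offset out of range"))
    (UInt64(file_offset) << 16) | UInt64(block_offset)
end

# Recover the two offsets from the packed representation.
function unpack_virtual_offset(x::UInt64)
    (file_offset = Int(x >> 16), block_offset = Int(x & 0xffff))
end

packed = pack_virtual_offset(178, 5)   # 0x0000000000b20005
unpacked = unpack_virtual_offset(packed)
```

Because the file offset occupies the high bits, packed virtual offsets compare in the same order as the positions they refer to in the decompressed stream.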

Examples

julia> reader = SyncBGZFReader(CursorReader(bgzf_data));

julia> vo = VirtualOffset(178, 5)
VirtualOffset(178, 5)

julia> virtual_seek(reader, vo);

julia> String(read(reader, 9))
"some more"

julia> virtual_seek(reader, VirtualOffset(0, 7));

julia> String(read(reader, 6))
"world!"
BGZFLib.get_virtual_offset - Method
get_virtual_offset(gzi::GZIndex, offset::Int)::Union{Nothing, VirtualOffset}

Get the VirtualOffset that corresponds to the zero-based offset offset in the decompressed BGZF stream indexed by gzi.

Return nothing if offset is smaller than zero, or points more than 2^16 bytes beyond the start of the final block.

Note that, because gzi files (and thus GZIndex) do not store the length of the final block, the resulting VirtualOffset may be invalid. Specifically, if the resulting VirtualOffset points bo ≤ 2^16 bytes into the final block, but the final block is shorter than bo bytes, this function will return a VirtualOffset, but using that offset to seek in the corresponding BGZF stream will error.
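
The lookup described above amounts to finding the last block that starts at or before the requested decompressed offset. The following is a minimal sketch of that idea in plain Julia, not BGZFLib's implementation. It works on a bare vector of blocks rather than a GZIndex, returns a named tuple rather than a VirtualOffset, and the exact cutoff at the 2^16 boundary is an assumption. Block and sketch_virtual_offset are hypothetical names.

```julia
# Same element type as the documented `blocks` vector of a GZIndex.
const Block = @NamedTuple{compressed_offset::UInt64, decompressed_offset::UInt64}

function sketch_virtual_offset(blocks::Vector{Block}, offset::Int)
    (offset < 0 || isempty(blocks)) && return nothing
    # Decompressed start position of every block, in ascending order.
    starts = [b.decompressed_offset for b in blocks]
    # Index of the last block starting at or before `offset`.
    i = searchsortedlast(starts, UInt64(offset))
    i == 0 && return nothing
    block_offset = offset - Int(starts[i])
    # A block holds at most 2^16 bytes, so larger offsets cannot be valid.
    # (Assumed cutoff; only the final block can reach this branch.)
    block_offset > 0xffff && return nothing
    (file_offset = Int(blocks[i].compressed_offset), block_offset = block_offset)
end

# Made-up index with three blocks.
blocks = [
    (compressed_offset = UInt64(c), decompressed_offset = UInt64(d))
    for (c, d) in [(0, 0), (100, 50), (180, 90)]
]
hit = sketch_virtual_offset(blocks, 60)
```

Here offset 60 falls in the second block, which starts at decompressed offset 50, so the result has file offset 100 and block offset 10. For the final block the sketch cannot know the real block length, which is the limitation the note above describes.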

Examples

julia> gzi = load_gzi(CursorReader(gzi_data));

julia> get_virtual_offset(gzi, 100_000) === nothing
true

julia> vo = get_virtual_offset(gzi, 45)
VirtualOffset(223, 8)

julia> reader = virtual_seek(SyncBGZFReader(CursorReader(bgzf_data)), vo);

julia> read(reader) |> String
"tent herethis is another block"

julia> bad_vo = get_virtual_offset(gzi, 500)
VirtualOffset(323, 425)

julia> virtual_seek(reader, bad_vo);
ERROR: BGZFError: Error in block at offset 323: Seek to block offset larger than block size
[...]

julia> close(reader)
BGZFLib.index_bgzf - Method
index_bgzf(io::Union{IO, AbstractBufReader})::GZIndex

Compute a GZIndex from a BGZF file.

Throw a BGZFError if the BGZF file is invalid, or a BGZFError with BGZFErrors.insufficient_reader_space if an entire block cannot be buffered by io (this only happens if io::AbstractBufReader).

Indexing the file does not attempt to decompress it, and therefore does not validate that the compressed data is valid (i.e. is a valid DEFLATE payload, or that the crc32 checksum matches).
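
Indexing without decompression is possible because each BGZF block records its own sizes. The sketch below, in plain Julia, walks a byte vector block by block using only the gzip header and trailer. It is not BGZFLib's implementation: it assumes the block layout from the SAM/BAM specification (a BC extra subfield holding the total block size minus one, and a trailing little-endian ISIZE field holding the uncompressed length), does no bounds checking on truncated input, and uses hypothetical names throughout.

```julia
# The 28-byte empty BGZF block from the SAM/BAM specification (test input).
const EOF_BLOCK = UInt8[
    0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x06, 0x00,
    0x42, 0x43, 0x02, 0x00, 0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00,
]

# Little-endian reads at 1-based index i.
u16(data, i) = UInt16(data[i]) | (UInt16(data[i + 1]) << 8)
u32(data, i) = UInt32(u16(data, i)) | (UInt32(u16(data, i + 2)) << 16)

# Total size in bytes of the block starting at 1-based index `start`.
function sketch_block_size(data::Vector{UInt8}, start::Int)
    data[start:start+3] == [0x1f, 0x8b, 0x08, 0x04] || error("not a BGZF block")
    xlen = Int(u16(data, start + 10))    # length of the extra field
    pos = start + 12                     # first extra subfield
    stop = pos + xlen
    while pos + 4 <= stop
        slen = Int(u16(data, pos + 2))
        if data[pos] == UInt8('B') && data[pos + 1] == UInt8('C') && slen == 2
            return Int(u16(data, pos + 4)) + 1   # BSIZE is block size minus one
        end
        pos += 4 + slen
    end
    error("missing BC field")
end

# Collect zero-based (compressed, decompressed) offsets of every block.
function sketch_index(data::Vector{UInt8})
    blocks = @NamedTuple{compressed_offset::UInt64, decompressed_offset::UInt64}[]
    start, decompressed = 1, UInt64(0)
    while start <= length(data)
        bsize = sketch_block_size(data, start)
        push!(blocks, (; compressed_offset = UInt64(start - 1), decompressed_offset = decompressed))
        decompressed += u32(data, start + bsize - 4)   # ISIZE: last 4 bytes
        start += bsize
    end
    blocks
end

two_empty = sketch_index(vcat(EOF_BLOCK, EOF_BLOCK))
```

The DEFLATE payload and the CRC32 field are never inspected, which is why an index built this way says nothing about whether the compressed data is valid.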

See also: load_gzi, GZIndex, write_gzi

Examples

julia> idx1 = open(index_bgzf, path_to_bgzf);

julia> idx2 = open(load_gzi, path_to_gzi);

julia> idx1.blocks == idx2.blocks
true
BGZFLib.load_gzi - Method
load_gzi(io::Union{IO, AbstractBufReader})::GZIndex

Load a GZIndex from a GZI file.

Throw an IOError(IOErrorKinds.EOF) if io does not contain enough bytes for a valid GZI file. Throw a BGZFError(nothing, BGZFErrors.unsorted_index) if the offsets are not sorted in ascending order. Currently does not throw an error if the file contains extra appended bytes, but this may change in the future.

See also: index_bgzf, GZIndex, write_gzi

Examples

julia> gzi = open(load_gzi, path_to_gzi);

julia> gzi isa GZIndex
true

julia> (; compressed_offset) = gzi.blocks[5]
(compressed_offset = 0x0000000000000093, decompressed_offset = 0x0000000000000017)

julia> reader = SyncBGZFReader(CursorReader(bgzf_data));

julia> seek(reader, Int(compressed_offset));

julia> read(reader, 15) |> String
"then some morem"

julia> close(reader)
BGZFLib.virtual_position - Method
virtual_position(io::Union{SyncBGZFReader, BGZFReader})::VirtualOffset

Get the VirtualOffset of the current position of the BGZF reader io. The virtual offset is a position in the decompressed stream. Seek to the position using virtual_seek.

See also: VirtualOffset, virtual_seek

Examples

julia> reader = SyncBGZFReader(CursorReader(bgzf_data));

julia> virtual_position(reader)
VirtualOffset(0, 0)

julia> read(reader, 18);

julia> virtual_position(reader)
VirtualOffset(44, 5)

julia> close(reader)
BGZFLib.virtual_seek - Method
virtual_seek(io::Union{SyncBGZFReader, BGZFReader}, vo::VirtualOffset) -> io

Seek to the virtual position vo. The virtual position is usually obtained by a call to virtual_position.

See also: VirtualOffset, virtual_position

Examples

julia> reader = SyncBGZFReader(CursorReader(bgzf_data));

julia> virtual_seek(reader, VirtualOffset(178, 14));

julia> String(read(reader))
"more content herethis is another block"

julia> virtual_seek(reader, VirtualOffset(0, 0));

julia> String(read(reader, 13))
"Hello, world!"

julia> close(reader)
BGZFLib.write_empty_block - Method
write_empty_block(io::Union{SyncBGZFWriter, BGZFWriter})

Perform a shallow_flush, then write an empty block.

Examples

Write the final empty EOF block manually:

julia> io = VecWriter();

julia> SyncBGZFWriter(io; append_empty=false) do writer
           write(writer, "Hello")
           # Manually write the empty EOF block
           write_empty_block(writer)
       end

julia> SyncBGZFReader(CursorReader(io.vec); check_truncated=true) do reader
           read(reader, String)
       end
"Hello"
BGZFLib.write_gzi - Method
write_gzi(io::Union{AbstractBufWriter, IO}, index::GZIndex)::Int

Write a GZIndex to io in GZI format, and return the number of written bytes. Currently, this function only works on little-endian CPUs, and will throw an ErrorException on big-endian platforms.

The resulting file can be loaded with load_gzi to obtain an index equivalent to index.
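
For orientation, the GZI format as written by htslib's bgzip is a little-endian UInt64 entry count followed by one pair of little-endian UInt64 values (compressed offset, decompressed offset) per entry. The sketch below round-trips that layout in plain Julia. It is not BGZFLib's implementation: it writes exactly the entries it is given, raises a generic error where the library would throw a BGZFError, and sketch_write_gzi and sketch_load_gzi are hypothetical names.

```julia
# Write the entry count, then each pair of offsets, all little-endian.
# Returns the number of bytes written.
function sketch_write_gzi(io::IO, blocks)
    n = write(io, htol(UInt64(length(blocks))))
    for b in blocks
        n += write(io, htol(UInt64(b.compressed_offset)))
        n += write(io, htol(UInt64(b.decompressed_offset)))
    end
    n
end

# Read the layout back. Base throws an EOFError if the data is too short.
function sketch_load_gzi(io::IO)
    n = ltoh(read(io, UInt64))
    blocks = [
        (compressed_offset = ltoh(read(io, UInt64)), decompressed_offset = ltoh(read(io, UInt64)))
        for _ in 1:n
    ]
    sorted = issorted(blocks; by = b -> b.compressed_offset) &&
             issorted(blocks; by = b -> b.decompressed_offset)
    sorted || error("unsorted index")
    blocks
end

# Made-up index with nine entries: 8 + 16 * 9 = 152 bytes.
blocks = [(compressed_offset = UInt64(28 * i), decompressed_offset = UInt64(10 * i)) for i in 0:8]
io = IOBuffer()
nbytes = sketch_write_gzi(io, blocks)
seekstart(io)
roundtrip = sketch_load_gzi(io)
```

The htol and ltoh calls make the sketch independent of the host byte order.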

See also: GZIndex, index_bgzf

Examples

julia> gzi = load_gzi(CursorReader(gzi_data))::GZIndex;

julia> io = VecWriter();

julia> write_gzi(io, gzi)
152

julia> gzi_2 = load_gzi(CursorReader(io.vec));

julia> gzi.blocks == gzi_2.blocks
true
BGZFLib.BGZFErrors - Module
module BGZFErrors

This module is used as a namespace for the enum BGZFErrorType. The enum is non-exhaustive (more variants may be added in the future). The current values are:

  • truncated_file: The reader data stops abruptly, either in the middle of a block or because there is no empty block at EOF
  • missing_bc_field: A block has no BC field, or it's malformed
  • block_offset_out_of_bounds: Seek with a VirtualOffset where the block offset is larger than the block size
  • insufficient_reader_space: The BGZF reader wraps an AbstractBufReader that is not EOF, and its buffer can't grow to encompass a whole BGZF block
  • insufficient_writer_space: A BGZF writer wraps an AbstractBufWriter whose buffer cannot grow to encompass a full BGZF block
  • unsorted_index: Attempted to load a malformed GZI file with unsorted coordinates, or with a file index > 2^48, or with a block size > 2^16.
  • operation_on_error: Attempted an operation on a BGZF reader or writer in an error state.