BGZF writers

Like the BGZF readers, the writers come in two variants: BGZFWriter and SyncBGZFWriter. They have the same tradeoffs as the two readers. See the "readers" section in the sidebar.

Constructing BGZF writers

Both BGZF writers conform to the AbstractBufWriter interface.

The buffers of the BGZF writers are fixed in size. Calling BufferIO.grow_buffer on them will perform a shallow flush instead of expanding the buffer, i.e. it will write its current content to the underlying IO.

The SyncBGZFWriter wraps an existing AbstractBufWriter. This inner writer must be able to present a buffer of at least 2^16 bytes, else a BGZFError(nothing, BGZFErrors.insufficient_writer_space) will be thrown. If the starting buffer size of the AbstractBufWriter is smaller than 2^16 bytes, BufferIO.grow_buffer will be called repeatedly on the underlying AbstractBufWriter.

When creating a SyncBGZFWriter from a T <: IO, a SyncBGZFWriter{BufWriter{T}} is created. Since BufWriter has an expanding buffer, it can always accommodate 2^16 bytes.
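As a minimal sketch of the wrapping described above, the following constructs a writer directly from an IOBuffer and checks the resulting type. It assumes that BufWriter is importable from BufferIO and that SyncBGZFWriter is exported by BGZFLib; the type check simply restates the rule from the paragraph above and has not been verified against a specific package version.

```julia
using BGZFLib
using BufferIO: BufWriter  # assumption: BufWriter lives in BufferIO

# A plain IO (not an AbstractBufWriter) is wrapped in a BufWriter
buf = IOBuffer()
writer = SyncBGZFWriter(buf)

# Per the rule above, the type parameter is BufWriter{IOBuffer}
@assert writer isa SyncBGZFWriter{BufWriter{IOBuffer}}

write(writer, "Hello")
close(writer)
```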

Mutating the wrapped io object of a BGZF reader or writer is not permitted and can cause erratic behaviour.

Writing BGZF files

Writers have an append_empty keyword that defaults to true. If set to true, closing the BGZF writer will write an empty BGZF block, signaling EOF.
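To illustrate the effect of append_empty, the sketch below writes the same data twice, once with and once without the trailing empty block, and compares the output sizes. The helper bgzf_bytes is hypothetical and exists only for this example; it assumes VecWriter is importable from BufferIO.

```julia
using BGZFLib
using BufferIO: VecWriter  # assumption: VecWriter lives in BufferIO

# Hypothetical helper: compress `data` and return the raw BGZF bytes
function bgzf_bytes(data; append_empty::Bool)
    io = VecWriter()
    SyncBGZFWriter(io; append_empty=append_empty) do writer
        write(writer, data)
    end
    return io.vec
end

with_eof = bgzf_bytes("Hello"; append_empty=true)
without_eof = bgzf_bytes("Hello"; append_empty=false)

# The trailing empty block makes the output strictly longer
@assert length(with_eof) > length(without_eof)
```

The exact size difference depends on how the empty block is encoded, so the sketch only checks that the output grows.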

Should you want to write an empty block in the middle of the stream, the function write_empty_block can be used:

BGZFLib.write_empty_block (function)
write_empty_block(io::Union{SyncBGZFWriter, BGZFWriter})

Perform a shallow_flush, then write an empty block.

Examples

Write the final empty EOF block manually:

julia> io = VecWriter();

julia> SyncBGZFWriter(io; append_empty=false) do writer
           write(writer, "Hello")
           # Manually write the empty EOF block
           write_empty_block(writer)
       end

julia> SyncBGZFReader(CursorReader(io.vec); check_truncated=true) do reader
           read(reader, String)
       end
"Hello"

Similar to BGZF readers (and Base.open), you can pass a function as the first argument to apply the function to the writer, then automatically close it:

io = VecWriter()

# Write data to `io` in BGZF format
BGZFWriter(io) do writer
    write(writer, "Hello, world!")
end

# Now read it back
SyncBGZFReader(CursorReader(io.vec)) do reader
    read(reader, String)
end

# output
"Hello, world!"

Reference

BGZFLib.BGZFWriter (type)
BGZFWriter(io::T <: AbstractBufWriter; kwargs)::BGZFWriter{T}
BGZFWriter(io::T <: IO; kwargs)::BGZFWriter{BufWriter{T}}

Create a BGZFWriter <: AbstractBufWriter that compresses data written to it, and writes the compressed BGZF file to the underlying io.

This type differs from SyncBGZFWriter in that the compression happens in separate worker tasks. This allows BGZFWriter to compress in parallel, making it faster in the presence of multiple threads.

If io::AbstractBufWriter, io must be able to buffer up to 2^16 bytes, else a BGZFError(nothing, BGZFErrors.insufficient_writer_space) is thrown.

The keyword arguments are:

  • n_workers::Int: Set number of workers. Must be > 0. Defaults to some small number.
  • compress_level::Int: Set compression level from 1 to 12, with 12 being slowest but with the best compression ratio. It defaults to an intermediate level of compression.
  • append_empty::Bool = true. If set, closing the BGZFWriter will write an empty BGZF block, indicating EOF.

BGZFLib.SyncBGZFWriter (type)
SyncBGZFWriter(io::T <: AbstractBufWriter; kwargs)::SyncBGZFWriter{T}
SyncBGZFWriter(io::T <: IO; kwargs)::SyncBGZFWriter{BufWriter{T}}

Create a SyncBGZFWriter <: AbstractBufWriter that compresses data written to it, and writes the compressed BGZF file to the underlying io.

This type differs from BGZFWriter in that it does the compression serially in the main task. Therefore it is slower when multiple threads are present, but does not incur Task- and scheduling overhead.

If io::AbstractBufWriter, io must be able to buffer up to 2^16 bytes, else a BGZFError(nothing, BGZFErrors.insufficient_writer_space) is thrown.

The keyword arguments are:

  • compresslevel::Int: Set compression level from 1 to 12, with 12 being slowest but with the best compression ratio. It defaults to an intermediate level of compression.
  • append_empty::Bool = true. If set, closing the SyncBGZFWriter will write an empty BGZF block, indicating EOF.
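
The keyword arguments documented above can be passed together with the do-block form. The following sketch writes a multi-block payload with the parallel BGZFWriter using two workers, then reads it back with a SyncBGZFReader to confirm the round trip. It assumes VecWriter and CursorReader are importable from BufferIO; the payload and the worker count are arbitrary choices for illustration.

```julia
using BGZFLib
using BufferIO: VecWriter, CursorReader  # assumption: both live in BufferIO

# Enough data to span several BGZF blocks (a block holds at most 2^16 bytes)
data = repeat("ACGT", 100_000)

# Compress with two worker tasks; the EOF block is appended on close by default
io = VecWriter()
BGZFWriter(io; n_workers=2) do writer
    write(writer, data)
end

# Read everything back, verifying that the stream ends with an empty block
roundtrip = SyncBGZFReader(CursorReader(io.vec); check_truncated=true) do reader
    read(reader, String)
end

@assert roundtrip == data
```

Swapping BGZFWriter for SyncBGZFWriter (and dropping n_workers) should produce an equally readable stream, since both writers emit the same format and differ only in where compression runs.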