Filtering: the @genes macro

A useful tool provided by GenomicAnnotations is the macro @genes. It filters annotations, for example to look only at coding sequences or rRNAs, which can then be modified or iterated over:

# Print the locus tags of all coding sequences longer than 1000 nt that are not pseudogenes
for gene in @genes(chr, CDS, length(gene) > 1000, ! :pseudo)
    println(gene.locus_tag)
end
GenomicAnnotations.@genes - Macro
@genes(chr, exs...)

Iterate over chr.genes, evaluate the expressions in exs for each gene, and return the genes for which all expressions evaluate to true. Any quoted symbol :s in an expression is replaced with gene.s. The gene itself can be accessed in the expression as gene. Accessing properties of the returned list of genes returns a view, which can be altered.

Some shorthand forms are available: CDS, rRNA, and tRNA expand to feature(gene) == "...", and get(s::Symbol, default) expands to get(gene, s, default).
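
The substitution rule described above can be illustrated with a small expression rewriter. This is a toy sketch under stated assumptions, not the code GenomicAnnotations uses: the helper name rewrite is hypothetical, and it only handles quoted symbols (no shorthands such as CDS, no special treatment of existing field accesses).

```julia
# Toy sketch of the substitution rule; NOT GenomicAnnotations' actual implementation.
# `rewrite` is a hypothetical helper name used only for illustration.

# A quoted symbol such as :pseudo is stored as a QuoteNode; turn it into gene.pseudo.
rewrite(ex::QuoteNode) = ex.value isa Symbol ? Expr(:., :gene, ex) : ex
# Compound expressions are rewritten argument by argument.
rewrite(ex::Expr) = Expr(ex.head, map(rewrite, ex.args)...)
# Literals and bare identifiers (including `gene` itself) pass through unchanged.
rewrite(ex) = ex

# Parse a filter condition as the macro would receive it, then rewrite it.
condition = Meta.parse(raw""":locus_tag == "tag03" && length(gene) > 1000""")
rewritten = rewrite(condition)
println(rewritten)
```

Working on the syntax tree rather than on strings is what lets a macro distinguish :locus_tag (a quoted symbol, rewritten) from gene (a plain identifier, left alone).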

Examples

julia> chromosome = readgbk("example.gbk")
Chromosome 'example' (5028 bp) with 6 annotations

julia> @genes(chromosome, CDS) |> length
3

julia> @genes(chromosome, length(gene) < 500)
     CDS             3..206
                     /db_xref="GI:1293614"
                     /locus_tag="tag01"
                     /codon_start="3"
                     /product="TCP1-beta"
                     /protein_id="AAA98665.1"

julia> @genes(chromosome, ismissing(:gene)) |> length
2

julia> @genes(chromosome, ismissing(:gene)).gene .= "Unknown";

julia> @genes(chromosome, ismissing(:gene)) |> length
0

All arguments have to evaluate to true for a gene to be included, so the following expressions are equivalent:

@genes(chr, CDS, length(gene) > 300)
@genes(chr, CDS && (length(gene) > 300))
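
The same equivalence can be seen with plain Julia filtering. The sketch below uses NamedTuples as stand-ins for genes (the fields feature and len are assumptions made for this illustration, not the Gene type's API) and compares "every condition must hold" with a single condition joined by &&.

```julia
# Stand-in data: plain NamedTuples, not GenomicAnnotations' Gene type.
genes = [
    (feature = "CDS",  len = 204),
    (feature = "CDS",  len = 1350),
    (feature = "rRNA", len = 1500),
]

# Several separate conditions, each of which must hold for a gene to be kept ...
conditions = [g -> g.feature == "CDS", g -> g.len > 300]
separate = filter(g -> all(c(g) for c in conditions), genes)

# ... versus one condition joined with &&.
combined = filter(g -> g.feature == "CDS" && g.len > 300, genes)

println(separate == combined)
```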

@genes returns a Vector{Gene}. Attributes can be accessed with dot syntax and can be assigned to:

@genes(chr, :locus_tag == "tag03")[1].pseudo = true
@genes(chr, CDS, ismissing(:gene)).gene .= "unknown"

Symbols and expressions escaped with $ are exempt from the gene.s substitution and are used as written:

d = Dict(:category1 => ["tag01", "tag02"], :category2 => ["tag03"])
@genes(chr, :locus_tag in d[$:category1])

gene = chr.genes[5]
@genes(chr, gene == $gene)
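
To see why the escape matters, the toy rewriter below compares the escaped and unescaped forms of the first example. It is a sketch only: rewrite is a hypothetical helper, not part of GenomicAnnotations, and the real macro may treat $ differently in its details.

```julia
# Toy sketch; NOT GenomicAnnotations' actual implementation.
# `rewrite` is a hypothetical helper name used only for illustration.

# Quoted symbols (:s) become gene.s.
rewrite(ex::QuoteNode) = ex.value isa Symbol ? Expr(:., :gene, ex) : ex
function rewrite(ex::Expr)
    # `$x` parses to an Expr with head `$`: drop the `$` and keep x untouched.
    ex.head === Symbol("\$") && return ex.args[1]
    return Expr(ex.head, map(rewrite, ex.args)...)
end
# Everything else (literals, bare identifiers) passes through unchanged.
rewrite(ex) = ex

escaped   = rewrite(Meta.parse(raw":locus_tag in d[$:category1]"))
unescaped = rewrite(Meta.parse(raw":locus_tag in d[:category1]"))

println(escaped)
println(unescaped)
```

Without the escape, :category1 would itself be rewritten to gene.category1, so the dictionary lookup would use a gene attribute rather than the intended key.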