Filtering: the @genes macro
A useful tool provided by GenomicAnnotations is the macro @genes. It filters annotations, for example to select only coding sequences or rRNAs, and the result can then be modified or iterated over:
# Print locus tags of all coding sequences longer than 1000 nt that are not pseudogenes
for gene in @genes(chr, CDS, length(gene) > 1000, ! :pseudo)
    println(gene.locus_tag)
end

GenomicAnnotations.@genes (macro)

@genes(chr, exs...)

Iterate over all genes in chr.genes and evaluate the expressions in exs for each one, returning the genes for which all expressions evaluate to true. Any symbol s in an expression (written :s) is replaced with gene.s. The gene itself can be accessed in the expression as gene. Accessing properties of the returned list of genes returns a view, which can be altered.
Some short-hand forms are available to make life easier: CDS, rRNA, and tRNA expand to feature(gene) == "...", and get(s::Symbol, default) expands to get(gene, s, default).
Examples
julia> chromosome = readgbk("example.gbk")
Chromosome 'example' (5028 bp) with 6 annotations
julia> @genes(chromosome, CDS) |> length
3
julia> @genes(chromosome, length(gene) < 500)
CDS 3..206
/db_xref="GI:1293614"
/locus_tag="tag01"
/codon_start="3"
/product="TCP1-beta"
/protein_id="AAA98665.1"
julia> @genes(chromosome, ismissing(:gene)) |> length
2
julia> @genes(chromosome, ismissing(:gene)).gene .= "Unknown";
julia> @genes(chromosome, ismissing(:gene)) |> length
0

All arguments have to evaluate to true for a gene to be included, so the following expressions are equivalent:
@genes(chr, CDS, length(gene) > 300)
@genes(chr, CDS && (length(gene) > 300))

@genes returns a Vector{Gene}. Attributes can be read and assigned with dot syntax:
@genes(chr, :locus_tag == "tag03")[1].pseudo = true
@genes(chr, CDS, ismissing(:gene)).gene .= "unknown"

Symbols and expressions escaped with $ are left unchanged rather than substituted:
d = Dict(:category1 => ["tag01", "tag02"], :category2 => ["tag03"])
@genes(chr, :locus_tag in d[$:category1])
gene = chr.genes[5]
@genes(chr, gene == $gene)
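
The substitution and escaping rules can be pictured with a small stand-alone toy. The sketch below is not the GenomicAnnotations implementation: `rewrite`, `@toygenes`, and the named tuples standing in for genes are hypothetical, and it assumes everything runs at top level in a single module. It only illustrates how a quoted symbol could be turned into a field access on `gene`, how several conditions could be joined with `&&`, and how a `$`-escaped part could be passed through untouched.

```julia
# Toy sketch only: this is NOT the GenomicAnnotations source code.
# `rewrite` and `@toygenes` are hypothetical names used for illustration.

rewrite(x) = x  # literals and bare names (including `gene`) are kept as they are

# A quoted symbol `:s` becomes the field access `gene.s`.
rewrite(q::QuoteNode) = q.value isa Symbol ? Expr(:., :gene, q) : q

function rewrite(ex::Expr)
    # `$x` is unwrapped and escaped, so `x` is looked up where the macro is called.
    ex.head === :$ && return esc(ex.args[1])
    # In an existing `a.b`, only rewrite `a`; the field name `b` must stay untouched.
    ex.head === :. && return Expr(:., rewrite(ex.args[1]), ex.args[2:end]...)
    return Expr(ex.head, map(rewrite, ex.args)...)
end

macro toygenes(collection, conds...)
    # Join all conditions with `&&`, so every one has to hold.
    body = foldl((a, b) -> :($a && $b), map(rewrite, conds))
    return :(filter(gene -> $body, $(esc(collection))))
end

# Named tuples stand in for annotated genes.
toy = [(locus_tag = "tag01", len = 900),
       (locus_tag = "tag02", len = 200),
       (locus_tag = "tag03", len = 1200)]
d = Dict(:category1 => ["tag01", "tag02"], :category2 => ["tag03"])

# `:locus_tag` and `:len` are substituted; `$:category1` stays the symbol :category1.
hits = @toygenes(toy, :locus_tag in d[$:category1], :len > 300)

# The first `gene` is the element being tested; `$gene` is the outer variable.
gene = toy[3]
same = @toygenes(toy, gene == $gene)
```

In this toy, `hits` keeps only the entry tagged tag01, and `same` keeps only the entry bound to the outer `gene`. The real macro additionally handles the short-hand forms and returns genes whose properties can be assigned to, which this sketch does not attempt.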