Filtering: the @genes macro
A useful tool provided by GenomicAnnotations is the macro @genes. It filters annotations, for example to select only coding sequences or rRNAs, which can then be modified or iterated over:
# Print locus tags of all coding sequences longer than 1000 nt that are not pseudogenes
for gene in @genes(chr, CDS, length(gene) > 1000, ! :pseudo)
    println(gene.locus_tag)
end
GenomicAnnotations.@genes (Macro)
@genes(chr, exs...)
Iterate over all genes in chr.genes, evaluate the expressions in exs for each one, and return the genes for which all expressions evaluate to true. Any symbol s given in an expression is replaced with gene.s. The gene itself can be accessed in the expression as gene. Accessing properties of the returned list of genes returns a view, which can be altered.
Some short-hand forms are available to make life easier: CDS, rRNA, and tRNA expand to feature(gene) == "...", and get(s::Symbol, default) expands to get(gene, s, default).
Examples
julia> chromosome = readgbk("example.gbk")
Chromosome 'example' (5028 bp) with 6 annotations
julia> @genes(chromosome, CDS) |> length
3
julia> @genes(chromosome, length(gene) < 500)
     CDS             3..206
                     /db_xref="GI:1293614"
                     /locus_tag="tag01"
                     /codon_start="3"
                     /product="TCP1-beta"
                     /protein_id="AAA98665.1"
julia> @genes(chromosome, ismissing(:gene)) |> length
2
julia> @genes(chromosome, ismissing(:gene)).gene .= "Unknown";
julia> @genes(chromosome, ismissing(:gene)) |> length
0
All arguments have to evaluate to true for a gene to be included, so the following expressions are equivalent:
@genes(chr, CDS, length(gene) > 300)
@genes(chr, CDS && (length(gene) > 300))
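The equivalence above can be illustrated without the package. This is a minimal sketch in plain Julia that uses NamedTuples as toy stand-ins for annotations (the field names feature and len are assumptions made for this example, not part of GenomicAnnotations): filtering on several separate conditions gives the same result as filtering on one condition joined with &&.

```julia
# Toy stand-ins for annotations: NamedTuples, not the package's Gene type.
genes = [
    (feature = "CDS",  len = 1200),
    (feature = "CDS",  len = 204),
    (feature = "rRNA", len = 1500),
]

# One predicate per condition, all of which must hold,
# mirroring separate arguments to the macro.
conditions = [g -> g.feature == "CDS", g -> g.len > 300]
separate = filter(g -> all(c(g) for c in conditions), genes)

# The same conditions joined into a single expression with &&.
combined = filter(g -> g.feature == "CDS" && g.len > 300, genes)

println(separate == combined)  # true
```

Only the first toy entry passes both conditions, whichever way they are written.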
@genes returns a Vector{Gene}. Attributes can be accessed with dot-syntax, and can be assigned to:
@genes(chr, :locus_tag == "tag03")[1].pseudo = true
@genes(chr, CDS, ismissing(:gene)).gene .= "unknown"
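Assigning through a filtered selection works because the selection refers back to the original data. As a rough analogy using only Base Julia (a plain vector of names stands in for the gene attribute here, and how the package builds its own view is not shown in this section), a broadcast assignment into a view changes the parent array:

```julia
# Toy stand-in for the gene attribute of four annotations.
names = Union{Missing,String}["dnaA", missing, missing, "recA"]

# A view over the entries that are missing; no data is copied.
unnamed = view(names, findall(ismissing, names))

# Broadcast assignment writes through the view into `names`.
unnamed .= "unknown"

println(names)
println(any(ismissing, names))  # false
```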
Symbols and expressions escaped with $ are left as they are and not substituted:
d = Dict(:category1 => ["tag01", "tag02"], :category2 => ["tag03"])
@genes(chr, :locus_tag in d[$:category1])
gene = chr.genes[5]
@genes(chr, gene == $gene)
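To make the substitution and escaping rules concrete, here is a toy rewrite function in plain Julia. It is a sketch only, not the GenomicAnnotations implementation: the name rewrite is hypothetical, and it ignores the short-hand forms and variable scoping that a real macro has to handle. It replaces each quoted symbol with a field access on gene and unwraps anything escaped with $ without touching it.

```julia
# Toy rewrite rule for illustration; `rewrite` is a hypothetical helper.
function rewrite(ex)
    if ex isa QuoteNode && ex.value isa Symbol
        # A quoted symbol such as :locus_tag becomes gene.locus_tag.
        return Expr(:., :gene, ex)
    elseif ex isa Expr && ex.head == :$
        # Anything escaped with $ is unwrapped and otherwise left alone.
        return ex.args[1]
    elseif ex isa Expr
        # Recurse into every sub-expression.
        return Expr(ex.head, map(rewrite, ex.args)...)
    else
        return ex
    end
end

# Meta.parse is used so that $ reaches `rewrite` as written
# instead of being interpolated by a surrounding quote.
plain   = rewrite(Meta.parse(":locus_tag == \"tag03\""))
escaped = rewrite(Meta.parse(":locus_tag in d[\$:category1]"))

println(plain)
println(escaped)
```

In the second expression only :locus_tag is rewritten; the escaped :category1 stays a symbol and can still be used as a dictionary key.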