Accessing and modifying annotations

Features

The following functions can be used to read and modify the data associated with a gene:

GenomicAnnotations.locus! - Function
locus!(gene::AbstractGene, loc)
locus!(gene::AbstractGene, loc::AbstractLocus)

Replace gene with a new Gene with loc as its Locus. If loc is not an AbstractLocus, it is parsed with Locus(loc).

GenomicAnnotations.feature! - Function
feature!(g::Gene, f::Symbol)

Change the feature of g to f, returning a new instance of Gene. Since Genes are immutable, feature! only mutates the parent of g and not g itself. Thus, in the first example below the original unmodified g is printed, not the updated version:

# This will not work as expected:
for source in @genes(chr, source)
    feature!(source, :region)
    println(source)
end

# But this will:
for source in @genes(chr, source)
    source = feature!(source, :region)
    println(source)
end

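The rebinding pattern above can be illustrated without the package. The sketch below uses hypothetical stand-in types (`ToyRecord`, `ToyGene`, `toy_feature!`), not the real GenomicAnnotations types, and assumes only the contract described above: the parent is mutated and a new immutable handle is returned.

```julia
# Hypothetical stand-ins for illustration only; these are NOT the GenomicAnnotations types.
struct ToyRecord
    features::Vector{Symbol}   # mutable storage owned by the parent
end

struct ToyGene                 # immutable handle, analogous to Gene in the text
    parent::ToyRecord
    index::Int
    feature::Symbol            # value captured when the handle was created
end

# Mimics the documented contract of feature!: update the parent, return a new handle.
function toy_feature!(g::ToyGene, f::Symbol)
    g.parent.features[g.index] = f
    return ToyGene(g.parent, g.index, f)
end

rec = ToyRecord([:source])
g = ToyGene(rec, 1, :source)

g2 = toy_feature!(g, :region)

println(g.feature)         # the old handle still reports the old feature
println(g2.feature)        # the returned handle reports the new feature
println(rec.features[1])   # the parent storage was updated in place
```

Because the old handle cannot change, the returned value has to be bound to a name (as in `source = feature!(source, :region)`) before it is printed or used further.
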
Base.parent - Method
parent(g::Gene)
parent(gs::AbstractVector{Gene})

Return the parent Record of g. For an AbstractVector{Gene}, an error is thrown if the genes do not all come from the same parent.

GenomicAnnotations.attributes - Function
attributes(g::Gene)

Return an immutable NamedTuple containing copies of all annotated attributes of g. Missing attributes are excluded. See genedata for a non-allocating way to access the gene data directly.


Features (genes) can be added using addgene!. A feature must have a feature name and a locus (position), and can have any number of additional qualifiers associated with it (see next section).

GenomicAnnotations.addgene! - Function
addgene!(chr::Record, feature, locus; kw...)

Add a gene to chr. locus can be an AbstractLocus, a String, a UnitRange, or a StepRange (for decreasing ranges, which are annotated on the complementary strand).

Example

addgene!(chr, "CDS", 1:756;
    locus_tag = "gene0001",
    product = "Chromosomal replication initiator protein dnaA")
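
The difference between an increasing UnitRange and a decreasing StepRange is plain Julia and can be checked without the package. The following sketch uses illustrative variable names only. Note that writing `756:1` does not give a decreasing range: it is an empty UnitRange, so a negative step has to be spelled out. How addgene! treats an empty range is not covered here.

```julia
forward = 1:756        # increasing range, a UnitRange
reverse = 756:-1:1     # decreasing range, a StepRange with step -1
empty_range = 756:1    # NOT a decreasing range: an empty UnitRange

println(forward isa UnitRange)     # true
println(reverse isa StepRange)     # true
println(step(reverse))             # -1
println(length(forward) == length(reverse))  # both cover 756 positions
println(isempty(empty_range))      # true
```
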

After adding a new feature, sort!(chr) can be used to make sure that the annotations are stored (and printed) in the order in which they occur on the chromosome.

Existing features can be removed using delete!:

Base.delete! - Method
delete!(gene::AbstractGene)

Delete gene from parent(gene). Warning: this does not work when broadcast. Use delete!(::AbstractVector{Gene}) instead.

Base.delete! - Method
delete!(genes::AbstractArray{Gene, 1})

Delete all genes in genes from parent(genes[1]).

Example

delete!(@genes(chr, length(gene) <= 60))

Qualifiers

Features can have multiple attributes/qualifiers, which can be modified using Julia's property syntax:

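# Replace newlines in gene product descriptions with spaces
# (assumes every CDS has a product qualifier)
for gene in @genes(chr, CDS)
    # Strings are immutable, so assign the result of replace back to the property
    gene.product = replace(gene.product, '\n' => ' ')
end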

Properties also work on views of genes, typically generated using @genes:

interestinggenes = readlines("/path/to/list/of/interesting/genes.txt")
@genes(chr, CDS, :locus_tag in interestinggenes).interesting .= true

Sometimes features have multiple instances of the same qualifier, such as genes having several EC numbers. Assigning a qualifier with property syntax overwrites any data previously stored for that feature, and assigning a vector of values to a qualifier that currently stores scalars results in an error. To safely assign qualifiers that might have more than one instance, use pushproperty!:

GenomicAnnotations.pushproperty! - Function
pushproperty!(gene::AbstractGene, qualifier::Symbol, value::T)

Add a property to gene, similarly to Base.setproperty!(::gene), but if the property is not missing in gene, it will be transformed to store a vector instead of overwriting existing data.

julia> eltype(chr.genedata[!, :EC_number])
Union{Missing,String}

julia> chr.genes[1].EC_number = "EC:1.2.3.4"
"EC:1.2.3.4"

julia> pushproperty!(chr.genes[1], :EC_number, "EC:4.3.2.1"); chr.genes[1].EC_number
2-element Array{String,1}:
 "EC:1.2.3.4"
 "EC:4.3.2.1"

julia> eltype(chr.genedata[!, :EC_number])
Union{Missing, Array{String,1}}
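
The promotion from a scalar to a vector can be modelled with a plain Dict. This is a toy sketch of the behavior described above, not the package's implementation: `toy_pushproperty!` and `store` are hypothetical names, and appending to a value that is already a vector is an assumption about how repeated calls behave.

```julia
# Toy model of the documented behavior; NOT the GenomicAnnotations implementation.
function toy_pushproperty!(store::Dict{Symbol,Any}, key::Symbol, value)
    current = get(store, key, missing)
    if ismissing(current)
        store[key] = value              # nothing stored yet: keep a scalar
    elseif current isa AbstractVector
        push!(current, value)           # assumed: append to an existing vector
    else
        store[key] = [current, value]   # scalar present: promote to a vector
    end
    return store
end

store = Dict{Symbol,Any}()
toy_pushproperty!(store, :EC_number, "EC:1.2.3.4")
println(store[:EC_number])   # a single String

toy_pushproperty!(store, :EC_number, "EC:4.3.2.1")
println(store[:EC_number])   # a Vector{String} holding both values
```
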

Accessing properties that haven't been stored will return missing. For this reason, it often makes more sense to use get() than to access the property directly.

# chr.genes[2].pseudo returns missing, so this will throw an error
if chr.genes[2].pseudo
    println("Gene 2 is a pseudogene")
end

# ... but this works:
if get(chr.genes[2], :pseudo, false)
    println("Gene 2 is a pseudogene")
end
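
The error in the first case comes from Julia itself: `missing` is not a Bool, so using it as an `if` condition throws a TypeError. The sketch below reproduces this with a plain Dict standing in for a gene's qualifiers (the `qualifiers` variable is illustrative, not part of the package), and shows `coalesce` as an alternative when the value has already been read.

```julia
# A Dict stands in for a gene's qualifiers; no Gene is needed to show the behavior.
qualifiers = Dict{Symbol,Any}(:locus_tag => "gene0002")

# Mimic property access on an unset qualifier, which yields `missing`
pseudo = get(qualifiers, :pseudo, missing)

# Using `missing` as a condition throws a TypeError
threw_type_error = try
    if pseudo
        println("Gene 2 is a pseudogene")
    end
    false
catch err
    err isa TypeError
end
println(threw_type_error)

# A Bool default avoids the problem
is_pseudo = get(qualifiers, :pseudo, false)

# coalesce replaces `missing` with a fallback after the value has been read
is_pseudo_too = coalesce(pseudo, false)
println(is_pseudo, " ", is_pseudo_too)
```
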

Sequences

The sequence of a Chromosome chr is stored in chr.sequence. Sequences of individual features can be read with sequence:

GenomicAnnotations.sequence - Method
sequence(gene::AbstractGene; translate = false, preserve_alternate_start = false)

Return genomic sequence for gene. If translate is true, the sequence will be translated to a LongAA, excluding the stop, otherwise it will be returned as a LongDNA{4} (including the stop codon). If preserve_alternate_start is set to false, alternate start codons will be assumed to code for methionine.
