GGGenome
GGGenome is an ultrafast DNA sequence search service hosted by the Database Center for Life Science (DBCLS). See "GGGenome Help" (https://gggenome.dbcls.jp/en/help.html) for more details.
BioServices.GGGenome is a Julia module that interfaces with the GGGenome REST API to search a DNA sequence against various databases programmatically.
Getting Started
Import module:
using BioServices.GGGenome
Method documentation
Retrieve results of gggenome search of a query sequence.
BioServices.GGGenome.gggsearch
— Method.
gggsearch(query::AbstractString; db="hg19", k=0, strand=nothing, format="html", timeout=5, output=nothing, show_url=false)
Retrieve results of gggenome search of a query sequence.
Arguments
Required
query::String: Nucleotide sequence, case insensitive.
Optional
db::String: Target database name. hg19 if not specified. Full list of databases: https://gggenome.dbcls.jp/en/help.html#db_list
k::Integer: Maximum number of mismatches/gaps. 0 if not specified.
strand::String: '+' ('plus') or '-' ('minus') to search the specified strand only.
format::String: [html|txt|csv|bed|gff|json]. html if not specified.
timeout::Real: Maximum time allowed for a query.
output::String: If "toString", a String object is returned. If "extractTopHit", a String object containing only the top hit is returned (currently works only with format="txt"). Otherwise, an HTTP.Messages.Response object is returned.
show_url::Bool: If true, print the URL of the REST API request.
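The optional arguments correspond to segments of the REST URL that show_url=true prints. The following is a rough sketch only, assuming the URL layout https://gggenome.dbcls.jp/db/k/[strand]/sequence[.format] described on the GGGenome help page. build_ggg_url is a hypothetical helper written for illustration; it is not part of BioServices.GGGenome, and the module's own URL construction may differ.

```julia
# Hypothetical helper (not part of BioServices.GGGenome): assemble a GGGenome
# REST URL from search parameters.
# Assumed layout: https://gggenome.dbcls.jp/db/k/[strand]/sequence[.format]
function build_ggg_url(query::AbstractString; db="hg19", k=0, strand=nothing, format="html")
    parts = ["https://gggenome.dbcls.jp", db, string(k)]
    # The strand segment is added only when a strand is requested.
    strand === nothing || push!(parts, strand)
    # The output format is appended to the sequence as a file-like extension.
    push!(parts, string(query, ".", format))
    return join(parts, "/")
end

println(build_ggg_url("TTCATTGACAACATT", format="bed"))
println(build_ggg_url("TTCATTGACAACATTGCGT", db="mm10", k=2, strand="+", format="txt"))
```

Comparing such a URL with the one printed by show_url=true is a quick way to check which parameters a call actually sent.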
Retrieve full list of available databases
BioServices.GGGenome.gggdbs
— Method.
gggdbs()
Retrieve full list of available databases. Full list of databases: https://gggenome.dbcls.jp/en/help.html#db_list.
Available Databases
Genome sequences (hg19, mm10, dm3, ce10, TAIR10, pombe, etc.) and other sequence databases (e.g., refseq) are available.
The full list of available databases is at https://gggenome.dbcls.jp/en/mm10/help.html#db_list.
Examples
Example 1
- Search TTCATTGACAACATT in
- human genome hg19 (default),
- with perfect matches (default),
- in bed format.
julia> res = gggsearch("TTCATTGACAACATT", format="bed", output="toString");
julia> print(res)
track name=GGGenome description="GGGenome matches"
chr1 83462475 83462490 . 0 +
chr2 161223114 161223129 . 0 +
chr3 15289789 15289804 . 0 +
chr3 84619844 84619859 . 0 +
....
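The BED text returned with output="toString" can be turned into structured records with a few lines of Julia. The sketch below is illustrative only: parse_bed is a hypothetical helper, not part of BioServices.GGGenome, and the sample text is copied from the first two hits of the output above. It splits on any whitespace, so it does not depend on whether fields are separated by tabs or spaces.

```julia
# Sample BED text: the track line and the first two hits from Example 1.
bed = """
track name=GGGenome description="GGGenome matches"
chr1 83462475 83462490 . 0 +
chr2 161223114 161223129 . 0 +
"""

# Hypothetical helper: parse BED records into named tuples, skipping the
# track line and blank lines. Columns used: 1 = chrom, 2 = start, 3 = end,
# 6 = strand.
function parse_bed(text::AbstractString)
    hits = NamedTuple{(:chrom, :start, :stop, :strand), Tuple{String, Int, Int, String}}[]
    for line in split(text, '\n')
        (isempty(strip(line)) || startswith(line, "track")) && continue
        f = split(line)  # split on any whitespace
        push!(hits, (chrom=String(f[1]), start=parse(Int, f[2]),
                     stop=parse(Int, f[3]), strand=String(f[6])))
    end
    return hits
end

hits = parse_bed(bed)
for h in hits
    println(h.chrom, ":", h.start, "-", h.stop, " (", h.strand, ")")
end
```

In this sample each hit spans stop - start = 15 positions, which equals the length of the 15-nt query.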
Example 2
- Search TTCATTGACAACATTGCGT in
- mouse genome mm10,
- allowing 2 mismatches/gaps,
- search for + strand only,
- in tab-delimited txt format.
julia> res = gggsearch("TTCATTGACAACATTGCGT", db="mm10", k=2, strand="+", format="txt", output="toString");
julia> print(res)
# [ GGGenome | 2018-07-01 22:59:01 ]
# database: Mouse genome, GRCm38/mm10 (Dec, 2011)
# query: TTCATTGACAACATTGCGT
# count: 41
# name strand start end snippet snippet_pos snippet_end
chr1 + 19461997 19462014 AGTTATTCAGCTTTCTATCACGATCAGAGAACAAGCTGAGAAAAGGATGTTTTTGCTTTTGCTTTTGTTTTTCTTCTTATTTTGGAGTTCTCATCCATGATTCATTGACACCATTGCTTTGGCCTCTGGGAAGGGCAGCATATCTGGGTAAAAGCAGATAGCAGAGCAAATCTGCTTACTGCAACCAGCCAGGAAGGAAGCAATGAAAGCACGTTCAC 19461897 19462114
chr1 + 98281503 98281520 TCTAGTGAGGAGAAATGTAAGCTAACGTGATAAACATTGTTTCTGATACACTAATTAAACTGACTTTTGAAAAGATGGCTTACATGTCTATCTAACATGTTTCATTGACACCATTGCTATAGTATGTAATTTTAATGTAAAATAGCCTTCTTTGCAGGGAATCCAGCCTGCTGCTGAATCTTTAAATTTTCAGTGTCTGTTGTCATAGTAACCAGAAT 98281403 98281620
...
Understanding output parameters
By default, gggsearch() returns an HTTP.Messages.Response object.
julia> query = "GTGCGGTAACGCGACCGATCCCGGAGAAGCCGGCGGGA";
julia> res = gggsearch(query, db="refseq", format="txt");
julia> typeof(res)
HTTP.Messages.Response
By setting output="toString", gggsearch() returns a String object.
julia> query = "GTGCGGTAACGCGACCGATCCCGGAGAAGCCGGCGGGA";
julia> res = gggsearch(query, db="refseq", format="txt", output="toString");
julia> typeof(res)
String
julia> println(res)
# [ GGGenome | 2018-07-01 22:25:16 ]
# database: RefSeq complete RNA release 88 (May, 2018)
# query: GTGCGGTAACGCGACCGATCCCGGAGAAGCCGGCGGGA
# count: 15
# query: TCCCGCCGGCTTCTCCGGGATCGGTCGCGTTACCGCAC
# count: 10
# name strand start end snippet snippet_pos snippet_end
NR_003279.1 Mus musculus 28S ribosomal RNA (Rn28s1), ribosomal RNA + 2326 2363 GAAGGGACGGGCGATGGCCTCCGTTGCCCTCGGCCGATCGAAAGGGAGTCGGGTTCAGATCCCCGAATCCGGAGTGGCGGAGATGGGCGCCGCGAGGCCAGTGCGGTAACGCGACCGATCCCGGAGAAGCCGGCGGGAGGCCTCGGGGAGAGTTCTCTTTTCTTTGTGAAGGGCAGGGCGCCCTGGAATGGGTTCGCCCCGAGAGAGGGGCCCGTGCCTTGGAAAGCGTCGCGGTTCC 2226 2463
NR_003287.4 Homo sapiens RNA, 28S ribosomal N5 (RNA28SN5), ribosomal RNA + 2574 2611 GGGACGGGCGATGGCCTCCGTTGCCCTCGGCCGATCGAAAGGGAGTCGGGTTCAGATCCCCGAATCCGGAGTGGCGGAGATGGGCGCCGCGAGGCGTCCAGTGCGGTAACGCGACCGATCCCGGAGAAGCCGGCGGGAGCCCCGGGGAGAGTTCTCTTTTCTTTGTGAAGGGCAGGGCGCCCTGGAATGGGTTCGCCCCGAGAGAGGGGCCCGTGCCTTGGAAAGCGTCGCGGTTCCG 2474 2711
...
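The header of the txt output carries one "# query:" / "# count:" pair per searched sequence. In the output above, the second query line is the reverse complement of the first, so the two counts are reported separately. As a minimal sketch, the pairs can be pulled out of the String with a regular expression; hit_counts is a hypothetical helper, not part of BioServices.GGGenome, and it assumes every "# count:" line follows the "# query:" line it belongs to, as in this sample.

```julia
# Header lines copied from the txt output above.
header = """
# [ GGGenome | 2018-07-01 22:25:16 ]
# database: RefSeq complete RNA release 88 (May, 2018)
# query: GTGCGGTAACGCGACCGATCCCGGAGAAGCCGGCGGGA
# count: 15
# query: TCCCGCCGGCTTCTCCGGGATCGGTCGCGTTACCGCAC
# count: 10
"""

# Hypothetical helper: collect (query => count) pairs from the comment header.
function hit_counts(text::AbstractString)
    counts = Pair{String,Int}[]
    query = ""
    for line in split(text, '\n')
        m = match(r"^#\s*(query|count):\s*(\S+)", line)
        m === nothing && continue
        if m.captures[1] == "query"
            query = String(m.captures[2])   # remember the most recent query
        else
            push!(counts, query => parse(Int, m.captures[2]))
        end
    end
    return counts
end

counts = hit_counts(header)
for (q, n) in counts
    println(q, " => ", n)
end
```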
By setting output="extractTopHit", gggsearch() returns a String object containing only the top hit (currently works only with format="txt").
julia> query = "GTGCGGTAACGCGACCGATCCCGGAGAAGCCGGCGGGA";
julia> res = gggsearch(query, db="refseq", format="txt", output="extractTopHit");
julia> typeof(res)
String
julia> println(res)
NR_003279.1 Mus musculus 28S ribosomal RNA (Rn28s1), ribosomal RNA + 2326 2363 GAAGGGACGGGCGATGGCCTCCGTTGCCCTCGGCCGATCGAAAGGGAGTCGGGTTCAGATCCCCGAATCCGGAGTGGCGGAGATGGGCGCCGCGAGGCCAGTGCGGTAACGCGACCGATCCCGGAGAAGCCGGCGGGAGGCCTCGGGGAGAGTTCTCTTTTCTTTGTGAAGGGCAGGGCGCCCTGGAATGGGTTCGCCCCGAGAGAGGGGCCCGTGCCTTGGAAAGCGTCGCGGTTCC 2226 2463
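The top-hit String is a single record with the seven columns named in the txt header (name, strand, start, end, snippet, snippet_pos, snippet_end). The sketch below splits such a record into named fields. It assumes the fields are tab-separated, as the txt format is described as tab-delimited; this matters because RefSeq names contain spaces. parse_hit is a hypothetical helper, not part of BioServices.GGGenome, and the snippet in the sample line is shortened for display.

```julia
# Hypothetical helper: split one tab-delimited txt record into named fields.
function parse_hit(line::AbstractString)
    f = split(chomp(line), '\t')
    length(f) == 7 || error("expected 7 tab-separated fields, got $(length(f))")
    return (name=String(f[1]), strand=String(f[2]),
            start=parse(Int, f[3]), stop=parse(Int, f[4]),
            snippet=String(f[5]),
            snippet_pos=parse(Int, f[6]), snippet_end=parse(Int, f[7]))
end

# Top hit from the output above; the snippet column is truncated here.
top = "NR_003279.1 Mus musculus 28S ribosomal RNA (Rn28s1), ribosomal RNA\t+\t2326\t2363\tGAAGGGACGGGCGATGGCC\t2226\t2463"

hit = parse_hit(top)
println(hit.name)
println(hit.strand, " ", hit.start, "-", hit.stop)
```

For this hit, stop - start + 1 is 38, the length of the query sequence.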