E-Utilities
E-Utilities provide a interface to Entrez databases at NCBI. The APIs are defined in the BioServices.EUtils
module, which exports nine functions to access its databases:
Function | Description |
---|---|
einfo |
Retrieve a list of databases or statistics for a database. |
esearch |
Retrieve a list of UIDs matching a text query. |
epost |
Upload or append a list of UIDs to the Entrez History server. |
esummary |
Retrieve document summaries for a list of UIDs. |
efetch |
Retrieve formatted data records for a list of UIDs. |
elink |
Retrieve UIDs linked to an input set of UIDs. |
egquery |
Retrieve the number of available records in all databases by a text query. |
espell |
Retrieve spelling suggestions. |
ecitmatch |
Retrieve PubMed IDs that correspond to a set of input citation strings. |
"The Nine E-utilities in Brief" summarizes all of the server-side programs corresponding to each function.
In this package, queries for databases are controlled by keyword parameters. For example, some functions take db
parameter to specify the target database. Functions listed above take these parameters as keyword arguments and return a Response
object as follows:
julia> using BioServices.EUtils # import the nine functions above julia> res = einfo(db="pubmed") # retrieve statistics of the PubMed database Response(200 OK, 18 headers, 27360 bytes in body) julia> write("pubmed.xml", res.data) # save retrieved data into a file 27360 shell> head result.xml <?xml version="1.0" encoding="UTF-8" ?> <!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD einfo 20130322//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20130322/einfo.dtd"> <eInfoResult> <DbInfo> <DbName>pubmed</DbName> <MenuName>PubMed</MenuName> <Description>PubMed bibliographic record</Description> <DbBuild>Build161024-2207m.1</DbBuild> <Count>26590895</Count> <LastUpdate>2016/10/25 02:06</LastUpdate>
Let's see a few more examples of parameters. The term
parameter specifies a search string (e.g. esearch(db="gene", term="tumor AND human[ORGN]")
). The id
parameter specifies a UID (or accession number) or a list of UIDs (e.g. efetch(db="protein", id="NP_000537.3", rettype="fasta")
, efetch(db="snp", id=["rs55863639", "rs587780067"])
). The complete list of parameters can be found at "The E-utilities In-Depth: Parameters, Syntax and More".
When a request succeeds the response object has a data
field containing formatted data, which can be saved to a file as demonstrated above. However, users are often interested in a part of the response data and may want to extract some fields in it. In such a case, EzXML.jl is helpful because it offers lots of tools to handle XML documents. The first thing you need to do is converting the response data into an XML document by parsexml
:
julia> res = efetch(db="nuccore", id="NM_001126.3", retmode="xml") Response(200 OK, 19 headers, 41536 bytes in body) julia> using EzXML julia> doc = parsexml(res.data) EzXML.Document(EzXML.Node(<DOCUMENT_NODE@0x00007fdd4cc43770>))
After that, you can query fields you want using XPath:
julia> seq = findfirst(doc, "/GBSet/GBSeq") EzXML.Node(<ELEMENT_NODE@0x00007fdd49f34b10>) julia> content(findfirst(seq, "GBSeq_definition")) "Homo sapiens adenylosuccinate synthase (ADSS), mRNA" julia> content(findfirst(seq, "GBSeq_accession-version")) "NM_001126.3" julia> length(find(seq, "//GBReference")) 10 julia> using Bio.Seq julia> DNASequence(content(findfirst(seq, "GBSeq_sequence"))) 2791nt DNA Sequence: ACGGGAGTGGCGCGCCAGGCCGCGGAAGGGGCGTGGCCT…TGATTAAAAGAACCAAATATTTCTAGTATGAAAAAAAAA
Every function can take a context dictionary as its first argument to set parameters into a query. Key-value pairs in a context are appended to a query in addition to other parameters passed by keyword arguments. The default context is an empty dictionary that sets no parameters. This context dictionary is especially useful when temporarily caching query UIDs into the Entrez History server. A request to the Entrez system can be associated with cached data using "WebEnv" and "query_key" parameters. In the following example, the search results of esearch
is saved in the Entrez History server (note usehistory=true
, which makes the server cache its search results) and then their summaries are retrieved in the next call of esummary
:
julia> context = Dict() # create an empty context Dict{Any,Any} with 0 entries julia> res = esearch(context, db="pubmed", term="asthma[mesh] AND leukotrienes[mesh] AND 2009[pdat]", usehistory=true) Response(200 OK, 18 headers, 1574 bytes in body) julia> context # the context dictionary has been updated Dict{Any,Any} with 2 entries: :query_key => "1" :WebEnv => "NCID_1_9251987_130.14.22.215_9001_1477389145_1960133… julia> res = esummary(context, db="pubmed") # retrieve summaries in context Response(200 OK, 18 headers, 135463 bytes in body) julia> write("asthma_leukotrienes_2009.xml", res.data) # save data into a file 135463 shell> head asthma_leukotrienes_2009.xml <?xml version="1.0" encoding="UTF-8" ?> <!DOCTYPE eSummaryResult PUBLIC "-//NLM//DTD esummary v1 20041029//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20041029/esummary-v1.dtd"> <eSummaryResult> <DocSum> <Id>20113659</Id> <Item Name="PubDate" Type="Date">2009 Nov</Item> <Item Name="EPubDate" Type="Date"></Item> <Item Name="Source" Type="String">Zhongguo Dang Dai Er Ke Za Zhi</Item> <Item Name="AuthorList" Type="List"> <Item Name="Author" Type="String">He MJ</Item>