
Complex XPath query with getNodeSet in R

I have the following XML file, downloaded from the UniProt protein database:

protein <- xmlRoot(xmlTreeParse("http://www.uniprot.org/uniprot/Q01974.xml"))

Of the many annotated features, I am interested in the start and end positions of the kinase domain, which are stored in the following XML node:

<feature type="domain" description="Protein kinase">
<location>
<begin position="288"/>
<end position="539"/>
</location>
</feature>

With getNodeSet I can locate this tag without trouble:

getNodeSet(protein, "//uniprot:feature[@type=\"domain\" and @description=\"Protein kinase\"]", c(uniprot="http://uniprot.org/uniprot"))

Unfortunately, I cannot narrow the query down any further: appending any additional step returns an empty list. For example:

getNodeSet(protein, "//uniprot:feature[@type=\"domain\" and @description=\"Protein kinase\"]/location", c(uniprot="http://uniprot.org/uniprot"))

According to an online XPath tester, this should be a valid XPath query, but it returns an empty node set:

list()
attr(,"class")
[1] "XMLNodeSet"

Could anyone help me with this query? I assume this is normal behavior for getNodeSet, but I don't know the rationale behind it. More generally, what is the most appropriate way to phrase such relatively complicated queries in R? Should I store the first result and then narrow it down further?

Thank you very much!

Use the same prefix for the subsequent elements as well:

//uniprot:feature[...]/uniprot:location

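Each element is identified by its namespace together with its local name. Your XML declares a default namespace, so every element written without a prefix in the document belongs to that namespace. In XPath 1.0, however, an unprefixed name such as location only matches elements that are in no namespace at all, which is why the query returns an empty list. You therefore need to put the prefix on every element step in the XPath, not only the first one. The prefix name itself is arbitrary; what matters is that it is mapped to the default namespace URI (here http://uniprot.org/uniprot).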
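As a minimal sketch of the advice above, the following uses a small inline XML string as a stand-in for the UniProt entry (the real file has far more content; the `<entry>` wrapper, `xml_text`, and the other variable names are assumptions made for illustration). It shows the prefixed query in full, and also the "store the result, then narrow down" approach asked about in the question.

```r
library(XML)

# Hypothetical, trimmed-down stand-in for the downloaded UniProt XML.
# Note the default namespace declared on the root element.
xml_text <- '<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <feature type="domain" description="Protein kinase">
      <location>
        <begin position="288"/>
        <end position="539"/>
      </location>
    </feature>
  </entry>
</uniprot>'

doc <- xmlParse(xml_text, asText = TRUE)

# Map a prefix of our choosing to the default namespace URI.
ns <- c(uniprot = "http://uniprot.org/uniprot")

kinase <- "//uniprot:feature[@type='domain' and @description='Protein kinase']"

# Option 1: one XPath expression, with the prefix on every element step.
begin_pos <- as.integer(xpathSApply(
  doc, paste0(kinase, "/uniprot:location/uniprot:begin"),
  xmlGetAttr, "position", namespaces = ns))

# Option 2: store the feature node first, then query relative to it.
# The prefix is still required on each step of the relative path.
feature <- getNodeSet(doc, kinase, namespaces = ns)[[1]]
end_pos <- as.integer(xpathSApply(
  feature, "./uniprot:location/uniprot:end",
  xmlGetAttr, "position", namespaces = ns))

print(c(begin = begin_pos, end = end_pos))
```

Both options should give the same positions; a single expression is usually enough, while storing the node is convenient when several values are read from the same feature.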
