
XML R How to retrieve values (could this be a namespace issue?)

Just when I thought I understood XPath! I must be missing something really simple, but I can't select the value of the node "citedby-count" in the following:

library(XML)

xml <- "<?xml version='1.0' encoding='UTF-8'?>
        <search-results xmlns='http://www.w3.org/2005/Atom' xmlns:cto='http://www.elsevier.com/xml/cto/dtd' xmlns:atom='http://www.w3.org/2005/Atom' xmlns:prism='http://prismstandard.org/namespaces/basic/2.0/' xmlns:opensearch='http://a9.com/-/spec/opensearch/1.1/' xmlns:dc='http://purl.org/dc/elements/1.1/'>

            <entry>
                 <prism:url>http://api.elsevier.com/content/abstract/scopus_id/111111</prism:url>
                 <dc:title>Paper Title</dc:title>
                 <citedby-count>1</citedby-count>
            </entry> 
        </search-results>"

doc <- xmlParse(xml)

I've tried

doc["//citedby-count"]

and

doc["//{'citedby-count'}"]

and

doc["//entry"]

but all return

list()
attr(,"class")
[1] "XMLNodeSet"

however,

doc["//dc:title"] 

works just fine.

Have I just been looking at this too long? Please help!

**Edit:** I thought this was because of the hyphen, but it can't be, because

doc["//entry"] 

doesn't work either.

An ordinary namespace prefix is declared as xmlns:foo="...", where foo is the prefix, and it is used explicitly in element names, as in <foo:bar>, where bar is the element's local name. There is also the default namespace: a namespace declared without a prefix, as in xmlns="...". It applies implicitly to the element on which it is declared and to all of its descendant elements, unless something overrides the inheritance, i.e. a descendant declares its own default namespace or uses an explicit prefix in its name. In your XML, entry and citedby-count have no prefix, so they are in the default namespace, http://www.w3.org/2005/Atom.

That's the first part of the story, which is about namespaces in XML. XPath, on the other hand, has no notion of a default namespace: in XPath 1.0, an element name without a prefix always refers to an element in no namespace. To bridge this difference, when you need to query an element in the default namespace, you define a prefix that maps to the XML's default namespace URI and use that prefix in the XPath expression. That's basically what @hrbrmstr suggested in the first comment, something like the following (the prefix can be anything, as long as it is mapped to the correct namespace URI):

doc["//d:citedby-count", namespaces=c(d="http://www.w3.org/2005/Atom")]

It turns out, though, that your XML already declares an explicit prefix, atom, which maps to the same namespace URI and can be used directly.
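As a minimal sketch of that last point, the query below uses the atom prefix on a trimmed-down copy of the XML from the question. The names atom_ns and counts are just illustrative, and the prefix mapping is passed explicitly so the example does not depend on how the document's own declarations are picked up by default:

```r
library(XML)

# Trimmed-down copy of the XML from the question: the same URI is declared
# both as the default namespace and under the explicit prefix "atom".
xml <- "<search-results xmlns='http://www.w3.org/2005/Atom' xmlns:atom='http://www.w3.org/2005/Atom'>
  <entry>
    <citedby-count>1</citedby-count>
  </entry>
</search-results>"

doc <- xmlParse(xml, asText = TRUE)

# Map the "atom" prefix to the namespace URI (illustrative variable name).
atom_ns <- c(atom = "http://www.w3.org/2005/Atom")

# citedby-count has no prefix in the XML, but it lives in the Atom namespace,
# so it can be addressed through any prefix bound to that URI.
counts <- xpathSApply(doc, "//atom:citedby-count", xmlValue, namespaces = atom_ns)
print(counts)
```

The unprefixed query "//citedby-count" still matches nothing here, because it asks for an element in no namespace.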

You can also use doc["//x:citedby-count", namespace = "x"] to handle the default namespace (taken from the examples for xpathApply).
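A hedged sketch of this shortcut, written with xpathSApply rather than the [ operator: as far as I understand the xpathApply help page, a single unnamed string passed as namespaces is treated as a prefix for the document's default namespace. The prefix "x" is arbitrary and the variable names are only illustrative:

```r
library(XML)

# Trimmed-down copy of the XML from the question, with only the default namespace.
xml <- "<search-results xmlns='http://www.w3.org/2005/Atom'>
  <entry>
    <citedby-count>1</citedby-count>
  </entry>
</search-results>"

doc <- xmlParse(xml, asText = TRUE)

# "x" is not declared anywhere in the XML; passing it as a single unnamed
# string asks the XML package to bind it to the default namespace.
counts <- xpathSApply(doc, "//x:citedby-count", xmlValue, namespaces = "x")
print(counts)
```

This avoids spelling out the namespace URI, at the cost of relying on the package's convenience behaviour rather than an explicit mapping.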
