
XPath and namespace specification for XML documents with an explicit default namespace

I'm struggling to find the correct combination of an XPath expression and the namespace specification required by package XML (argument namespaces) for an XML document that has an explicit xmlns namespace defined on the top element.

UPDATE

Thanks to har07 I was able to put it together:

When you query the namespaces, the first entry of ns has no name, and that's the problem:

nsDefs <- xmlNamespaceDefinitions(doc)
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))

> ns
                                             omegahat                          r 
    "http://something.org"  "http://www.omegahat.org" "http://www.r-project.org" 

So we just assign a name that serves as a prefix (this can be any valid R name):

names(ns)[1] <- "xmlns"

Now all we have to do is use that default-namespace prefix on every element in our XPath expressions:

getNodeSet(doc, "/xmlns:doc//xmlns:b[@omegahat:status='foo']", ns)
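The steps above can be put together in one self-contained sketch. It builds a shortened version of the document from a string instead of a file; the variable name `xml_text` and the prefix `d` are arbitrary choices for illustration, and the unnamed entry is located by its empty name rather than by position, which is only an assumption about what is more robust.

```r
library(XML)

# Shortened version of the question's document, parsed from a string
xml_text <- paste(
  '<doc xmlns="http://something.org"',
  '     xmlns:omegahat="http://www.omegahat.org"',
  '     xmlns:r="http://www.r-project.org">',
  '  <a>',
  '    <b><c><b/></c></b>',
  '    <b omegahat:status="foo"><r:d><a status="xyz"/><a/></r:d></b>',
  '  </a>',
  '</doc>',
  sep = "\n"
)
doc <- xmlParse(xml_text, asText = TRUE)

# Collect the namespace definitions of the top element as prefix -> URI
nsDefs <- xmlNamespaceDefinitions(doc)
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))

# Give the unnamed (default) entry a prefix; "d" is an arbitrary choice
names(ns)[names(ns) == ""] <- "d"

# Every element in the default namespace needs the prefix in the XPath
hits <- getNodeSet(doc, "/d:doc//d:b[@omegahat:status='foo']", namespaces = ns)
length(hits)
```

The prefix used in the XPath expression only has to match the name in the namespaces vector; it does not have to appear anywhere in the document itself.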

Those interested in alternative solutions based on name() and namespace-uri() (amongst others) might find this post helpful.
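As a rough illustration of that kind of alternative (not taken from the linked post), the sketch below selects elements with a predicate on local-name() and namespace-uri() and compares the result with the prefix-based query. The small document, the variable names and the prefix `d` are made up for this example.

```r
library(XML)

# Two <b> elements: one in the default namespace, one in the r namespace
xml_text <- paste0(
  '<doc xmlns="http://something.org" xmlns:r="http://www.r-project.org">',
  '<a><b/><r:b/></a>',
  '</doc>'
)
doc <- xmlParse(xml_text, asText = TRUE)

# A fully named vector, so no entry is left without a prefix
ns <- c(d = "http://something.org", r = "http://www.r-project.org")

# Prefix-based query
by_prefix <- getNodeSet(doc, "//d:b", namespaces = ns)

# Same selection written without any prefix on the element name
by_uri <- getNodeSet(
  doc,
  "//*[local-name() = 'b' and namespace-uri() = 'http://something.org']",
  namespaces = ns
)

c(length(by_prefix), length(by_uri))
```

Both queries should select the same single element. The predicate form is more verbose but does not depend on which prefix was registered.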


Just for reference, this was the trial-and-error code before we arrived at the solution:

Consider the example from ?xmlParse:

require("XML")

doc <- xmlParse(system.file("exampleData", "tagnames.xml", package = "XML"))

> doc
<?xml version="1.0"?>
<doc>
  <!-- A comment -->
  <a xmlns:omegahat="http://www.omegahat.org" xmlns:r="http://www.r-project.org">
    <b>
      <c>
        <b/>
      </c>
    </b>
    <b omegahat:status="foo">
      <r:d>
        <a status="xyz"/>
        <a/>
        <a status="1"/>
      </r:d>
    </b>
  </a>
</doc>
nsDefs <- xmlNamespaceDefinitions(getNodeSet(doc, "/doc/a")[[1]])
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))
getNodeSet(doc, "/doc//b[@omegahat:status='foo']", ns)[[1]]

In my document, however, the namespaces are already defined in the <doc> tag, so I adapted the example XML code accordingly:

xml_source <- c(
  "<?xml version=\"1.0\"?>",
  "<doc xmlns:omegahat=\"http://www.omegahat.org\" xmlns:r=\"http://www.r-project.org\">",
  "<!-- A comment -->",
  "<a>",
  "<b>",
  "<c>",
  "<b/>",
  "</c>",
  "</b>",
  "<b omegahat:status=\"foo\">",
  "<r:d>",
  "<a status=\"xyz\"/>",
  "<a/>",
  "<a status=\"1\"/>",
  "</r:d>",
  "</b>",
  "</a>",
  "</doc>"
)
write(xml_source, file="exampleData_2.xml")  
doc <- xmlParse("exampleData_2.xml")
nsDefs <- xmlNamespaceDefinitions(doc)
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))    
getNodeSet(doc, "/doc", namespaces = ns)
getNodeSet(doc, "/doc//b[@omegahat:status='foo']", namespaces = ns)[[1]]  

Everything still works fine. However, my XML code additionally has an explicit definition of the default namespace (xmlns):

xml_source <- c(
  "<?xml version=\"1.0\"?>",
  "<doc xmlns=\"http://something.org\" xmlns:omegahat=\"http://www.omegahat.org\" xmlns:r=\"http://www.r-project.org\">",
  "<!-- A comment -->",
  "<a>",
  "<b>",
  "<c>",
  "<b/>",
  "</c>",
  "</b>",
  "<b omegahat:status=\"foo\">",
  "<r:d>",
  "<a status=\"xyz\"/>",
  "<a/>",
  "<a status=\"1\"/>",
  "</r:d>",
  "</b>",
  "</a>",
  "</doc>"  
)
write(xml_source, file="exampleData_3.xml")  
doc <- xmlParse("exampleData_3.xml")
nsDefs <- xmlNamespaceDefinitions(doc)
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))

What used to work now fails:

> getNodeSet(doc, "/doc", namespaces = ns)
list()
attr(,"class")
[1] "XMLNodeSet"
Warning message:
using http://something.org as prefix for default namespace http://something.org 

> getNodeSet(doc, "/xmlns:doc", namespaces = ns)
XPath error : Undefined namespace prefix
XPath error : Invalid expression
Error in xpathApply.XMLInternalDocument(doc, path, fun, ..., namespaces = namespaces,  : 
  error evaluating xpath expression /xmlns:doc
In addition: Warning message:
using http://something.org as prefix for default namespace http://something.org 
getNodeSet(doc, "/xmlns:doc", 
  namespaces = matchNamespaces(doc, namespaces="xmlns", nsDefs = nsDefs)
)

This seems to get me closer:

> getNodeSet(doc, "/xmlns:doc",
+ namespaces = matchNamespaces(doc, namespaces="xmlns", nsDefs = nsDefs)
+ )[[1]]
<doc xmlns="http://something.org" xmlns:omegahat="http://www.omegahat.org" xmlns:r="http://www.r-project.org">
  <!-- A comment -->
  <a>
    <b>
      <c>
        <b/>
      </c>
    </b>
    <b omegahat:status="foo">
      <r:d>
        <a status="xyz"/>
        <a/>
        <a status="1"/>
      </r:d>
    </b>
  </a>
</doc> 

attr(,"class")
[1] "XMLNodeSet"

Yet now I don't know how to proceed in order to get to the child nodes:

> getNodeSet(doc, "/xmlns:doc//b[@omegahat:status='foo']", ns)[[1]]
XPath error : Undefined namespace prefix
XPath error : Invalid expression
Error in xpathApply.XMLInternalDocument(doc, path, fun, ..., namespaces = namespaces,  : 
  error evaluating xpath expression /xmlns:doc//b[@omegahat:status='foo']
In addition: Warning message:
using http://something.org as prefix for default namespace http://something.org 

> getNodeSet(doc, "/xmlns:doc//b[@omegahat:status='foo']",
+ namespaces = c(
+ matchNamespaces(doc, namespaces="xmlns", nsDefs = nsDefs),
+ matchNamespaces(doc, namespaces="omegahat", nsDefs = nsDefs)
+ )
+ )
list()
attr(,"class")
[1] "XMLNodeSet"

A namespace definition without a prefix (xmlns="...") is the default namespace. In an XML document with a default namespace, the element on which the default namespace is declared and all of its descendants that have no prefix and no different default namespace declaration are considered to be in that default namespace.

Therefore, in your case you need to put the prefix registered for the default namespace in front of every element in the XPath, for example:
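A small sketch of this scoping rule in R, assuming the XML package and a reduced document made up for this example (the prefix `d` and the variable names are arbitrary). In XPath 1.0 an unprefixed element name only matches elements in no namespace, which is why the unprefixed query comes back empty.

```r
library(XML)

# <doc>, the outer <a> and the two inner <a> carry no prefix,
# so all of them are in the default namespace
xml_text <- paste(
  '<doc xmlns="http://something.org" xmlns:r="http://www.r-project.org">',
  '  <a><r:d><a/><a/></r:d></a>',
  '</doc>',
  sep = "\n"
)
doc <- xmlParse(xml_text, asText = TRUE)
ns <- c(d = "http://something.org", r = "http://www.r-project.org")

# Unprefixed name in XPath: looks for <doc> in no namespace
no_prefix <- getNodeSet(doc, "/doc", namespaces = ns)

# Prefixed name: looks for <doc> in http://something.org
with_prefix <- getNodeSet(doc, "/d:doc", namespaces = ns)

# Unprefixed children of <r:d> are still in the default namespace
nested <- getNodeSet(doc, "//r:d/d:a", namespaces = ns)

c(length(no_prefix), length(with_prefix), length(nested))
```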

/xmlns:doc//xmlns:b[@omegahat:status='foo']

UPDATE:

Actually I'm not an R user, but judging from some references on the net, something like this may work:

getNodeSet(doc, "/ns:doc//ns:b[@omegahat:status='foo']", c(ns="http://something.org"))

I think @HansHarhoff provides a very good solution.

For anybody else still searching for a solution: in my experience the following works more generally, since a single XML document can have multiple namespaces.

doc <- xmlInternalTreeParse(xml_source)

ns <- getDefaultNamespace(doc)[[1]]$uri
names(ns)[1] <- "xmlns"

getNodeSet(doc, "//xmlns:Parameter", namespaces = ns)

I had a similar issue, but in my case I did not care about the namespace and wanted a solution that ignored it.

Assume that we have the following XML in the variable myxml:

<root xmlns="uri:someuri.com:schema">
<Parameter>Test
</Parameter>
</root>

In R we want to read this, so we run:

myxml <- '
<root xmlns="uri:someuri.com:schema">
  <Parameter>Test
</Parameter>
</root>
'
myxmltop <- xmlParse(myxml)
ns <- xmlNamespaceDefinitions(myxmltop, simplify =  TRUE)

Here I have simplified Rappster's code by using the simplify = TRUE parameter. Now we can add the name/prefix of the namespace as in Rappster's code:

names(ns)[1] <- "xmlns"

Now we can refer to this namespace by:

getNodeSet(myxmltop, "//xmlns:Parameter", namespaces =ns)

Simpler solution (ignoring namespaces)

We can also be more flexible and match on any namespace by doing:

myxmltop <- xmlParse(myxml)
getNodeSet(myxmltop, "//*[local-name() = 'Parameter']")

This solution was inspired by this SO answer.
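One thing to keep in mind with local-name(): it matches elements of that name in every namespace. The sketch below uses a made-up variant of the XML above, with a second Parameter element in a hypothetical namespace uri:other.example:schema, to show the difference from the prefix-based query. The prefixes `d` and `o` are arbitrary.

```r
library(XML)

# Two <Parameter> elements in two different namespaces
xml_text <- paste0(
  '<root xmlns="uri:someuri.com:schema" xmlns:o="uri:other.example:schema">',
  '<Parameter>Test</Parameter>',
  '<o:Parameter>Other</o:Parameter>',
  '</root>'
)
doc <- xmlParse(xml_text, asText = TRUE)
ns <- c(d = "uri:someuri.com:schema", o = "uri:other.example:schema")

# Ignores namespaces: picks up both elements
any_ns <- xpathSApply(doc, "//*[local-name() = 'Parameter']", xmlValue,
                      namespaces = ns)

# Restricted to the default namespace: picks up only the first
default_only <- xpathSApply(doc, "//d:Parameter", xmlValue, namespaces = ns)

length(any_ns)
default_only
```

If the document only ever uses one namespace, the two approaches select the same nodes and the shorter local-name() form is a reasonable shortcut.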
