PHP xpath query on XML with default namespace binding

I have one solution to the problem in the title, but it's a hack, and I'm wondering if there's a better way to do this.

Below is a sample XML file and a PHP CLI script that executes an xpath query given as an argument. For this test case, the command line is:

./xpeg "//MainType[@ID=123]"

What seems most strange is this line, without which my approach doesn't work:

$domdoc->loadXML($domdoc->saveXML());

As far as I know, this simply re-parses the modified XML, and it seems to me that this shouldn't be necessary.

Is there a better way to perform xpath queries on this XML in PHP?

XML (note the binding of the default namespace):

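<?xml version="1.0" encoding="utf-8"?>
<MyRoot
 xmlns="http://www.example.com/data"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.example.com/data http://www.example.com/data/MyRoot.xsd"
 >
  <MainType ID="192" comment="Bob's site">
    ...
  </MainType>
  <MainType ID="123" comment="Test site">
    ...
  </MainType>
  <MainType ID="922" comment="Health Insurance">
    ...
  </MainType>
  <MainType ID="389" comment="Used Cars">
    ...
  </MainType>
</MyRoot>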

PHP CLI Script:

#!/usr/bin/env php
<?php

$xml = file_get_contents("xpeg.xml");

$domdoc = new DOMDocument();
$domdoc->loadXML($xml);

// remove the default namespace binding
$e = $domdoc->documentElement;
$e->removeAttributeNS($e->getAttributeNode("xmlns")->nodeValue, "");

// hack hack, cough cough, hack hack
$domdoc->loadXML($domdoc->saveXML());

$xpath = new DOMXpath($domdoc);

$str = trim($argv[1]);
$result = $xpath->query($str);
if ($result !== FALSE) {
  dump_dom_levels($result);
}
else {
  echo "error\n";
}

// The following function isn't really part of the
// question. It simply provides a concise summary of
// the result.
function dump_dom_levels($node, $level = 0) {
  $class = get_class($node);
  if ($class == "DOMNodeList") {
    echo "Level $level ($class): $node->length items\n";
    foreach ($node as $child_node) {
      dump_dom_levels($child_node, $level+1);
    }
  }
  else {
    // count the children that have children of their own
    $nChildren = 0;
    foreach ($node->childNodes as $child_node) {
      if ($child_node->hasChildNodes()) {
        $nChildren++;
      }
    }
    if ($nChildren) {
      echo "Level $level ($class): $nChildren children\n";
    }
    // recurse into those children
    foreach ($node->childNodes as $child_node) {
      if ($child_node->hasChildNodes()) {
        dump_dom_levels($child_node, $level+1);
      }
    }
  }
}

The solution is to use the namespace, not to get rid of it.

$domdoc = new DOMDocument();
$domdoc->loadXML($xml);

$xpath = new DOMXpath($domdoc);
// bind the prefix "x" to the namespace URI given as the second argument
$xpath->registerNamespace("x", trim($argv[2]));

$str = trim($argv[1]);
$result = $xpath->query($str);

Call it like this on the command line (note the x: prefix in the XPath expression):

./xpeg "//x:MainType[@ID=123]" "http://www.example.com/data"

You can make this more polished by

  • finding out the default namespace yourself (by looking at the namespaceURI property of the document element)
  • supporting more than one namespace on the command line and registering them all before $xpath->query()
  • supporting arguments of the form xyz=http://namespace.uri/ to create custom namespace prefixes
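
The three refinements above could be sketched roughly as follows. This is only an illustration, not part of the original script: the helper name make_xpath, its parameters, the fallback prefix "x", and the inline sample XML are all assumptions made for the example.

```php
<?php
// Hypothetical helper (an assumption for this sketch, not from the original
// script): builds a DOMXPath with namespaces registered from "prefix=uri" strings.
function make_xpath(DOMDocument $doc, array $bindings, string $defaultPrefix = "x"): DOMXPath {
  $xpath = new DOMXPath($doc);

  // Bind the document element's namespace, if it has one, to the default prefix.
  $rootNs = $doc->documentElement->namespaceURI;
  if ($rootNs !== null && $rootNs !== "") {
    $xpath->registerNamespace($defaultPrefix, $rootNs);
  }

  // Register each explicit "prefix=uri" binding; split on the first "=" only,
  // so a URI that itself contains "=" stays intact.
  foreach ($bindings as $binding) {
    $parts = explode("=", $binding, 2);
    if (count($parts) === 2 && $parts[0] !== "") {
      $xpath->registerNamespace($parts[0], $parts[1]);
    }
  }
  return $xpath;
}

// Inline stand-in for xpeg.xml, reduced to what the sketch needs.
$domdoc = new DOMDocument();
$domdoc->loadXML('<MyRoot xmlns="http://www.example.com/data"><MainType ID="123"/></MyRoot>');

$xpath = make_xpath($domdoc, ["d=http://www.example.com/data"]);
$viaDefault = $xpath->query("//x:MainType[@ID=123]")->length;
$viaCustom  = $xpath->query("//d:MainType[@ID=123]")->length;
echo "x: $viaDefault, d: $viaCustom\n";
```

In the CLI script, the bindings would presumably come from the remaining arguments, for example array_slice($argv, 2).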

Bottom line is: In XPath you can't query //foo when you really mean //namespace:foo. These are fundamentally different and therefore select different nodes. The fact that XML can have a default namespace defined (and thus can drop explicit namespace usage in the document) does not mean you can drop namespace usage in XPath.
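
To make the difference concrete, here is a minimal sketch that runs both forms of the query against the same document. The inline XML is a cut-down stand-in for xpeg.xml (its self-closing MainType elements are an assumption of the example), and the prefix x is arbitrary.

```php
<?php
// Inline stand-in for xpeg.xml; the element content is simplified for the sketch.
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<MyRoot xmlns="http://www.example.com/data">
  <MainType ID="192" comment="Bob's site"/>
  <MainType ID="123" comment="Test site"/>
</MyRoot>
XML;

$domdoc = new DOMDocument();
$domdoc->loadXML($xml);

$xpath = new DOMXPath($domdoc);
$xpath->registerNamespace("x", "http://www.example.com/data");

// Unprefixed name test: matches only elements that are in no namespace.
$plain = $xpath->query("//MainType[@ID=123]");
// Prefixed name test: matches elements in the namespace bound to "x".
$prefixed = $xpath->query("//x:MainType[@ID=123]");

echo "unprefixed: {$plain->length}\n"; // unprefixed: 0
echo "prefixed: {$prefixed->length}\n"; // prefixed: 1
```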

Just out of curiosity, what happens if you remove this line?

$e->removeAttributeNS($e->getAttributeNode("xmlns")->nodeValue, "");

That strikes me as the most likely to cause the need for your hack. You're basically removing the xmlns="http://www.example.com/data" part and then re-building the DOMDocument. Have you considered simply using string functions to remove that namespace?

$pieces = explode('xmlns="', $xml);
$xml = $pieces[0] . substr($pieces[1], strpos($pieces[1], '"') + 1);

Then continue on your way? It might even end up being faster.


//*[local-name(.) = 'MainType'][@ID='123']
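
As a rough sketch of how the expression above behaves in PHP (the inline XML is a simplified stand-in for xpeg.xml, assumed for the example): no namespace needs to be registered, because local-name() compares only the local part of the element name. The trade-off is that it would also match a MainType element from any other namespace.

```php
<?php
// Simplified stand-in for xpeg.xml (an assumption of this sketch).
$domdoc = new DOMDocument();
$domdoc->loadXML('<MyRoot xmlns="http://www.example.com/data"><MainType ID="192"/><MainType ID="123"/></MyRoot>');

$xpath = new DOMXPath($domdoc);

// No registerNamespace() call: the predicate ignores the namespace entirely.
$result = $xpath->query("//*[local-name(.) = 'MainType'][@ID='123']");

echo $result->length, "\n";                      // 1
echo $result->item(0)->getAttribute("ID"), "\n"; // 123
```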

Given the current state of the XPath language, I feel that the best answer is provided by Tomalek: to associate a prefix with the default namespace and to prefix all tag names. That's the solution I intend to use in my current application.

When that's not possible or practical, a better solution than my hack is to invoke a method that does the same thing as re-parsing (hopefully more efficiently): DOMDocument::normalizeDocument(). The method behaves “as if you saved and then loaded the document, putting the document in a 'normal' form.”
