In DOMDocument, is reusing DOMXpath stable?

I am using the function below, but I am not sure whether it is always stable and safe. Is it?

When, and under what conditions, is it stable and safe to "reuse parts of the DOMXpath preparation procedure"?

To simplify the use of the XPath query() method, we can adopt a function that remembers its last call in static variables:

   function DOMXpath_reuser($file) {
      static $doc=NULL;
      static $docName='';
      static $xp=NULL;
      if (!$doc)
                $doc = new DOMDocument();
      if ($file!=$docName) {
                $doc->loadHTMLFile($file);
                $docName = $file;  // remember the loaded file, so the next call can reuse it
                $xp = NULL;
      }
      if (!$xp) 
                $xp = new DOMXpath($doc);
      return $xp;  // ?? ARE THE RETURNED VALUES ALWAYS STABLE ??
   }

This question is similar to another one about XSLTProcessor reuse. In both questions the problem can be generalized to any language or framework that uses LibXML2 as its DOMDocument implementation.

There is another related question: How to "refresh" DOMDocument instances of LibXML2?


Illustration

This kind of reuse is very common, for example:

   $f = "my_XML_file.xml";
   $elements = DOMXpath_reuser($f)->query("//*[@id]");
   // use elements to get information
   $elements = DOMXpath_reuser($f)->query("/html/body/div[1]");
   // use elements to get information

But if you then call something like removeChild, replaceChild, etc., for example:

   $div = DOMXpath_reuser($f)->query("/html/body/div[1]")->item(0);  //STABLE
   $div->parentNode->removeChild($div);                // CHANGES DOM
   $elements = DOMXpath_reuser($f)->query("//div[@id]"); // UNSTABLE!!

strange things can occur, and the queries do not work as expected!

  • When? (Which DOMDocument methods affect XPath?)
  • Why can we not use something like normalizeDocument to "refresh the DOM" (does such a thing exist)?
  • Is only "new DOMXpath($doc);" always safe? Does $doc also need to be reloaded?

DOMXpath is affected by the load*() methods on DOMDocument. After loading new XML or HTML, you need to recreate the DOMXpath instance:

$xml = '<xml/>';    
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);

var_dump($xpath->document === $dom); // bool(true)

$dom->loadXml($xml);

var_dump($xpath->document === $dom); // bool(false)
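
The identity check above can be wrapped in a small guard. This is only a minimal sketch, assuming PHP 7.1 or later (for the nullable type hint); xpath_for is a hypothetical name and is not part of the original answer. It keeps the instance it is given only while that instance still reports $dom as its document, and builds a new one otherwise.

```php
<?php
// Hypothetical helper (an assumption, not from the answer): return $xp if it
// is still bound to $dom, otherwise build a new DOMXPath for $dom.
function xpath_for(DOMDocument $dom, ?DOMXPath $xp = null): DOMXPath
{
    if ($xp === null || $xp->document !== $dom) {
        $xp = new DOMXPath($dom);
    }
    return $xp;
}

$dom = new DOMDocument();
$dom->loadXML('<root><item/></root>');

$first = xpath_for($dom);           // nothing cached yet: an instance is created
$same  = xpath_for($dom, $first);   // no reload in between: the instance is kept

$dom->loadXML('<root><item/><item/></root>');
$current = xpath_for($dom, $first); // after a reload the check runs again
```

Whatever the guard decides after the reload, the instance it returns reports $dom as its document, so callers do not have to track reloads themselves.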

In DOMXpath_reuser() you store static variables and recreate the XPath object depending on the file name. If you want to reuse an Xpath object, I suggest extending DOMDocument. This way you only need to pass the $dom variable around. It works with a stored XML file as well as with an XML string or a document you are creating.

The following class extends DOMDocument with a method xpath() that always returns a valid DOMXpath instance for it. It stores and registers the namespaces, too:

class MyDOMDocument
  extends DOMDocument {

  private $_xpath = NULL;
  private $_namespaces = array();

  public function xpath() {
    // if the xpath instance is missing or not attached to the document
    if (is_null($this->_xpath) || $this->_xpath->document != $this) {
      // create a new one
      $this->_xpath = new DOMXpath($this);
      // and register the namespaces for it
      foreach ($this->_namespaces as $prefix => $namespace) {
        $this->_xpath->registerNamespace($prefix, $namespace);
      }
    }
    return $this->_xpath;
  }

  public function registerNamespaces(array $namespaces) {
    $this->_namespaces = array_merge($this->_namespaces, $namespaces);
    if (isset($this->_xpath)) {
      foreach ($namespaces as $prefix => $namespace) {
        $this->_xpath->registerNamespace($prefix, $namespace);
      }
    }
  }
}

$xml = <<<'ATOM'
  <feed xmlns="http://www.w3.org/2005/Atom">
    <title>Test</title>
  </feed>
ATOM;


$dom = new MyDOMDocument();
$dom->registerNamespaces(
  array(
    'atom' => 'http://www.w3.org/2005/Atom'
  )
);
$dom->loadXml($xml);
// created, first access
var_dump($dom->xpath()->evaluate('string(/atom:feed/atom:title)', NULL, FALSE));
$dom->loadXml($xml);
// recreated, connection was lost
var_dump($dom->xpath()->evaluate('string(/atom:feed/atom:title)', NULL, FALSE));

The DOMXpath class (unlike XSLTProcessor in your other question) uses a reference to the DOMDocument object given in its constructor. DOMXpath creates a libxml context object based on the given DOMDocument and saves it in its internal class data. Besides the libxml context, it also saves a reference to the original DOMDocument given in the constructor arguments.

What that means:

Part of the sample from ThomasWeinert's answer:

var_dump($xpath->document === $dom); // bool(true)  
$dom->loadXml($xml);    
var_dump($xpath->document === $dom); // bool(false)

gives false after the load because $dom already holds a pointer to the new libxml data, while DOMXpath holds the libxml context created for $dom before the load, and a pointer to the real document after the load.

Now, about how query works

If it should return XPATH_NODESET (as in your case), it makes a node copy, iterating node by node through the detected node set (\\ext\\dom\\xpath.c, from line 468). It is a copy, but with the original document node as parent. This means that you can modify the result, but doing so breaks the connection between your XPath and DOMDocument.
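
How query results relate to the document can be checked from PHP code without reading the C source. The snippet below is a small sketch with made-up sample XML: it asks a result node for its owner document and then changes the node through the result.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root><div id="a"/></root>');

$xp  = new DOMXPath($doc);
$div = $xp->query('//div[@id]')->item(0);

// The result node reports the original document as its owner.
$belongs = $div->ownerDocument->isSameNode($doc);

// A change made through the result shows up when the document is serialized.
$div->setAttribute('class', 'seen');
$xml = $doc->saveXML();
```

Here $belongs tells whether the node still belongs to $doc, and $xml can be inspected for the new attribute.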

XPath results provide a parentNode member that knows their origin:

  • for attribute values, parentNode returns the element that carries them. An example is //foo/@attribute, where the parent would be a foo element.
  • for the text() function (as in //text()), it returns the element that contains the text or tail that was returned.
  • note that parentNode may not always return an element. For example, the XPath functions string() and concat() construct strings that do not have an origin. For them, parentNode will return None.

So:

  1. There is no reason to cache XPath. It does nothing besides call xmlXPathNewContext (which just allocates a lightweight internal struct).
  2. Each time you modify your DOMDocument (removeChild, replaceChild, etc.), you should recreate XPath.
  3. We cannot use something like normalizeDocument to "refresh the DOM", because it changes the internal document structure and invalidates the context that xmlXPathNewContext created in the Xpath constructor.
  4. Is only "new DOMXpath($doc);" always safe? Yes, if you do not change $doc between Xpath uses. Does $doc also need to be reloaded? No, because reloading invalidates the context previously created by xmlXPathNewContext.
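
Points 1 and 2 together suggest simply building a new DOMXpath for every query. A minimal sketch of that idea follows; xpath_query is a hypothetical helper name, the sample XML is made up, and the declared return type assumes the expression is well formed (query() returns false for a malformed one).

```php
<?php
// Hypothetical helper (an assumption, not from the answers): build a
// throwaway DOMXPath for every query, so no XPath object outlives a change
// to the document.
function xpath_query(DOMDocument $doc, string $expr): DOMNodeList
{
    return (new DOMXPath($doc))->query($expr);
}

$doc = new DOMDocument();
$doc->loadXML('<root><div id="a"/><div id="b"/><div/></root>');

$before = xpath_query($doc, '//div[@id]')->length;   // two divs carry an id

$first = xpath_query($doc, '/root/div[1]')->item(0);
$first->parentNode->removeChild($first);             // mutate the DOM

$after = xpath_query($doc, '//div[@id]')->length;    // one div with an id is left
```

The cost is one context allocation per call, which point 1 describes as lightweight.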

(This is not a real answer, but a consolidation of the comments and answers posted here and in related questions.)


This new version of the question's DOMXpath_reuser function incorporates @ThomasWeinert's suggestion (to guard against DOM changes caused by an external re-load) and adds an option, $enforceRefresh, to work around the instability problem (as the related question shows, the programmer must detect when to use it).

   function DOMXpath_reuser_v2($file, $enforceRefresh=0) {  //changed here
      static $doc=NULL;
      static $docName='';
      static $xp=NULL;
      if (!$doc)
                $doc = new DOMDocument();
      if ( $file!=$docName || ($xp && $doc !== $xp->document) ) { // changed here
                $doc->load($file);
                $docName = $file;  // remember the loaded file, so the next call can reuse it
                $xp = NULL;
      } elseif ($enforceRefresh==2) {  // add this new refresh mode
                $doc->loadXML($doc->saveXML());
                $xp = NULL;
      }
      if (!$xp || $enforceRefresh==1)  //changed here
                $xp = new DOMXpath($doc);
      return $xp;
   }

When must $enforceRefresh=1 be used?

... perhaps an open problem; only a few tips and clues so far:

  • when the DOM has been changed by setAttribute, removeChild, replaceChild, etc.
  • ... more cases?

When must $enforceRefresh=2 be used?

... perhaps an open problem; only a few tips and clues so far.
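
One way to collect evidence for these open questions is to compare a reused DOMXpath with a freshly built one after each kind of DOM change. The sketch below is only an illustration: xpath_agrees is a hypothetical helper and the sample XML is made up. It reports whether both instances select the same nodes, in the same order, for a given expression.

```php
<?php
// Hypothetical check (an assumption, not from the thread): run one expression
// through a long-lived DOMXPath and through a fresh one, then compare results.
function xpath_agrees(DOMXPath $reused, DOMDocument $doc, string $expr): bool
{
    $a = $reused->query($expr);
    $b = (new DOMXPath($doc))->query($expr);
    if ($a === false || $b === false || $a->length !== $b->length) {
        return false;
    }
    for ($i = 0; $i < $a->length; $i++) {
        if (!$a->item($i)->isSameNode($b->item($i))) {
            return false;
        }
    }
    return true;
}

$doc = new DOMDocument();
$doc->loadXML('<root><div id="a"/><div id="b"/></root>');
$reused = new DOMXPath($doc);

$div = $reused->query('/root/div[1]')->item(0);
$div->parentNode->removeChild($div);   // the kind of change under suspicion

$agrees = xpath_agrees($reused, $doc, '//div[@id]');
var_dump($agrees);
```

Repeating the check with setAttribute, replaceChild, and a reload in place of removeChild shows which operations, if any, make the two instances disagree on a given PHP and libxml version.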

Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please cite this site's URL or the original address.