
XPath / PHP - Return index of specific tag that matches a regex

I'm trying to get the index of a tag whose href matches a certain regex, but whatever I try throws a warning saying that the expression is invalid. Here's an example.

$dom = new DOMDocument();
$dom->loadHTML($html);
$url_check = testurl.com
$finder = new DomXPath($dom);

$finder->registerNamespace("php", "http://php.net/xpath");
$finder->registerPhpFunctions('preg_match');

//Updated to fix some errors, still invalid expression
$index = $finder->evaluate("count((/ol[@id='rso']/li[not(@id) and @class = 'g' and h3[@class='r']/a[php:function('preg_match','/^(http://|https://|ftp://)?(www(\d+)?.)?($url_check)\/?$/', string(@href) > 0)]])/preceding-sibling::*)");

$html is a string that stores the HTML of a web page, which contains something like this:

<ol id="wrap">
  <li class="list">
    <h3 class="j">
      <a href="http://xxxxxx.com">Not the one I'm trying to match</a>    
    </h3>
  </li>
  .
  .
  .
  <li class="list">
    <h3 class="j">
      <a href="http://testurl.com">Click here</a>    
    </h3>
  </li>
</ol>

Any suggestion is appreciated, and if you know a better/faster way to do this, feel free to share :)

I found at least three problems in your code:

  • preceding-siblings should be singular, not plural (the XPath axis is preceding-sibling)
  • the count() function has no closing parenthesis
  • $url_check = testurl.com has no quotes around the value and no terminating semicolon (this should trigger a syntax error).

Fixed code:

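// The regex uses '#' delimiters so the slashes in the URL schemes need no escaping,
// and php:function() is closed before the "> 0" comparison.
$index = $finder->evaluate("count(/ol[@id='wrap']/li[@class = 'list']/h3[@class='j']/a[php:function('preg_match', '#^(http://|https://|ftp://)?(www(\d+)?\.)?($url_check)/?$#', string(@href)) > 0]/preceding-sibling::li[@class='list'])");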

Moreover, the example HTML you gave doesn't produce any result for the expression: each <a> element has no siblings at all. So even with these fixes, the expression still returns 0 for your test case, which is expected.
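
If the goal is the zero-based position of the matching <li> inside the <ol>, one option is to put the regex test in a predicate on the <li> itself and count that element's preceding siblings. The following is only a sketch under a few assumptions: the middle <li> in the sample markup is invented for illustration, the variable names $host, $pattern and $query are mine, and the path starts with //ol because loadHTML() wraps a fragment in <html><body>.

```php
<?php
// Sample markup modeled on the question's HTML; the middle <li> is made up.
$html = <<<'HTML'
<ol id="wrap">
  <li class="list"><h3 class="j"><a href="http://xxxxxx.com">Not the one</a></h3></li>
  <li class="list"><h3 class="j"><a href="http://example.org">Another one</a></h3></li>
  <li class="list"><h3 class="j"><a href="http://testurl.com">Click here</a></h3></li>
</ol>
HTML;

$url_check = 'testurl.com';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);
$finder->registerNamespace('php', 'http://php.net/xpath');
$finder->registerPhpFunctions('preg_match');

// '#' as the delimiter avoids escaping the slashes in "http://";
// preg_quote() escapes the dot in the host name.
$host = preg_quote($url_check, '#');
$pattern = "#^(https?://|ftp://)?(www\\d*\\.)?{$host}/?$#";

// Select the <li> whose link matches, then count the <li> elements before it.
$query = "count(//ol[@id='wrap']/li[@class='list']"
       . "[h3[@class='j']/a[php:function('preg_match', '$pattern', string(@href)) > 0]]"
       . "/preceding-sibling::li)";

$index = (int) $finder->evaluate($query);
echo $index, PHP_EOL; // prints 2 for the sample markup above
```

Because the pattern is embedded in a single-quoted XPath string literal, this only works as long as the pattern itself contains no single quote. If more than one <li> matches, count() returns the size of the union of their preceding siblings, so the result is only a meaningful index when there is a single match.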
