PHP XPath query: extracting specific characters from the href of a tag

Question

Tags

<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>

How can I use an XPath query on the href of these tags to get the ID numbers?

I want a result like this:

5809, 124, 196, 24/11/1859

Code

$url = 'http://www.example.com/Books/Default.aspx';
libxml_use_internal_errors(true); 
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXpath($doc);

$elements1 = $xpath->query('//a[contains(@href, "www.example.com/Book/")]');  
$elements2 = $xpath->query('//a[contains(@href,  "www.example.com/author/id=")]');  
$elements3 = $xpath->query('//a[contains(@href, "www.example.com/genres/")]');  
$elements4 = $xpath->query('//span[contains(@class, "")]');

if (!is_null($elements)) {
  foreach ($elements as $element) {
    echo "<br/>". "";

    $nodes = $element->childNodes;
    foreach ($nodes as $node) {
      echo $node->nodeValue. "\n";
    }
  }
}

Answer 1

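XPath 1.0 has some limited string operations, but at some point it becomes much easier to just read the attribute and extract the value with a regular expression.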
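For comparison, the regular-expression route might look like the sketch below. It is not part of the original answer: the pattern assumes the ID is always the first run of digits after the host name, and `$ids` is just an illustrative variable name.

```php
<?php
$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$ids = [];
// Select the href attribute nodes directly, then let the regex do the string work.
foreach ($xpath->query('//a/@href') as $href) {
    // Assumed pattern: skip any non-digits after the host, capture the first digit run.
    if (preg_match('~example\.com/\D*(\d+)~', $href->value, $match)) {
        $ids[] = $match[1];
    }
}

echo implode(', ', $ids), "\n";
```

If the URL layout is less regular than in the question, the pattern would need to be tightened per link type.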

That said, here is an example that uses only XPath:

$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>  
<a href="http://www.example.com/author/id=124">Darwin</a>  
<a href="http://www.example.com/196/genres">Science, Biology</a>  
<span class="Xbkznofv">24/11/1859</span>
HTML;

$document = new DOMDocument();
$document->loadHtml($html);
$xpath = new DOMXpath($document);

$data = [
  'book_title' => $xpath->evaluate(
    'string(//a[contains(@href,  "www.example.com") and contains(@href, "/book")])'
  ),
  'book_id' => $xpath->evaluate(
    'substring-before(
      substring-after(
        //a[contains(@href,  "www.example.com") and contains(@href, "/book")]/@href,
        "www.example.com/"
      ),
      "/"
    )'
  ),
  'author_id' => $xpath->evaluate(
    'substring-after(
      //a[contains(@href,  "www.example.com/author/id=")]/@href,
      "/id="
    )'
  )
];

var_dump($data);

Output:

array(3) {
  ["book_title"]=>
  string(17) "Origin of Species"
  ["book_id"]=>
  string(4) "5809"
  ["author_id"]=>
  string(3) "124"
}

These expressions only work with DOMXpath::evaluate(); DOMXpath::query() can only return node lists.
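A small sketch of that difference, using a single link from the question as assumed input: query() hands back a DOMNodeList whose nodes still have to be read, while evaluate() can return the string produced by an XPath function directly.

```php
<?php
$document = new DOMDocument();
$document->loadHTML('<a href="http://www.example.com/author/id=124">Darwin</a>');
$xpath = new DOMXPath($document);

// query() returns a node list; the value is read from the attribute node afterwards.
$nodes = $xpath->query('//a/@href');
var_dump($nodes instanceof DOMNodeList);
var_dump($nodes->item(0)->nodeValue);

// evaluate() returns the scalar result of the XPath string function itself.
$authorId = $xpath->evaluate('substring-after(//a/@href, "/id=")');
var_dump($authorId);
```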

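Most of the time, you will use one expression to fetch a node list, iterate over it, and then use several expressions to fetch the values. Here is a simplified example: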

$html = <<<'HTML'
<div class="book">
  <a href="#1">Origin of Species</a>
</div>
<div class="book">
  <a href="#2">On the Shoulders of Giants</a>
</div>
HTML;

$document = new DOMDocument();
$document->loadHtml($html);
$xpath = new DOMXpath($document);

foreach ($xpath->evaluate('//div[@class="book"]') as $book) {
  var_dump(
    $xpath->evaluate('string(.//a)', $book),
    $xpath->evaluate('string(.//a/@href)', $book)
  );
}

Output:

string(17) "Origin of Species"
string(2) "#1"
string(26) "On the Shoulders of Giants"
string(2) "#2"
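Returning to the markup in the question, the two remaining values (the genre ID and the date) could be fetched with the same string functions. This is only a sketch: it assumes the genre link is the one whose href contains "/genres" and that the date sits in the span with the class shown in the question; `$genreId` and `$date` are illustrative names.

```php
<?php
$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Same pattern as book_id: cut off the host, then keep everything before the next slash.
$genreId = $xpath->evaluate(
    'substring-before(
       substring-after(//a[contains(@href, "/genres")]/@href, "www.example.com/"),
       "/"
     )'
);

// The date is plain text content, so string() on the span is enough.
$date = $xpath->evaluate('string(//span[@class="Xbkznofv"])');

echo $genreId, ', ', $date, "\n";
```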
