
PHP XPath query: get specific characters from href on tags

Tags

<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>

How do I get the ID numbers from the href attributes of these tags using an XPath query?

I want a result like this:

5809, 124, 196, 24/11/1859

PHP code

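$url = 'http://www.example.com/Books/Default.aspx';
libxml_use_internal_errors(true); // suppress warnings about sloppy real-world HTML
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXPath($doc);

// One node list per kind of element, matching the href shapes shown above
$elements1 = $xpath->query('//a[contains(@href, "www.example.com/") and contains(@href, "/book")]');
$elements2 = $xpath->query('//a[contains(@href, "www.example.com/author/id=")]');
$elements3 = $xpath->query('//a[contains(@href, "www.example.com/") and contains(@href, "/genres")]');
$elements4 = $xpath->query('//span[@class="Xbkznofv"]');

foreach ([$elements1, $elements2, $elements3, $elements4] as $elements) {
  if ($elements === false) {
    continue; // query() returns false for a malformed expression
  }
  foreach ($elements as $element) {
    echo "<br/>";
    // This prints the link/span text; the IDs inside @href are not extracted yet
    foreach ($element->childNodes as $node) {
      echo $node->nodeValue . "\n";
    }
  }
}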

XPath 1.0 has only limited string functions, so beyond a certain point it is far easier to read the attribute and extract the values with regular expressions.
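To illustrate that alternative, here is a minimal sketch in which XPath only reads the raw `href` values and `preg_match()` does the extraction. The three URL patterns are assumptions derived from the sample markup alone, and the array keys are illustrative names:

```php
<?php
// Sketch: XPath reads the raw href values, regular expressions pull the IDs out.
// The URL patterns below are assumptions based on the sample markup only.
$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// One regular expression per kind of link; group 1 captures the numeric ID
$patterns = [
  'book_id'   => '#/(\d+)/book$#',
  'author_id' => '#/author/id=(\d+)$#',
  'genre_id'  => '#/(\d+)/genres$#',
];

$data = [];
foreach ($xpath->evaluate('//a/@href') as $href) {
  foreach ($patterns as $key => $pattern) {
    if (preg_match($pattern, $href->value, $match)) {
      $data[$key] = $match[1];
    }
  }
}
// The date is plain element text, so XPath alone is enough for it
$data['date'] = $xpath->evaluate('string(//span[@class="Xbkznofv"])');

echo implode(', ', $data), "\n"; // 5809, 124, 196, 24/11/1859
```

Keeping one pattern per URL shape means an unexpected link simply produces no entry instead of a wrong ID.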

However, here is an example that uses XPath only:

$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>  
<a href="http://www.example.com/author/id=124">Darwin</a>  
<a href="http://www.example.com/196/genres">Science, Biology</a>  
<span class="Xbkznofv">24/11/1859</span>
HTML;

$document = new DOMDocument();
$document->loadHtml($html);
$xpath = new DOMXpath($document);

$data = [
  'book_title' => $xpath->evaluate(
    'string(//a[contains(@href,  "www.example.com") and contains(@href, "/book")])'
  ),
  'book_id' => $xpath->evaluate(
    'substring-before(
      substring-after(
        //a[contains(@href,  "www.example.com") and contains(@href, "/book")]/@href,
        "www.example.com/"
      ),
      "/"
    )'
  ),
  'author_id' => $xpath->evaluate(
    'substring-after(
      //a[contains(@href,  "www.example.com/author/id=")]/@href,
      "/id="
    )'
  )
];

var_dump($data);

Output:

array(3) {
  ["book_title"]=>
  string(17) "Origin of Species"
  ["book_id"]=>
  string(4) "5809"
  ["author_id"]=>
  string(3) "124"
}
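The array above stops at the author ID, while the question also asks for the genre ID and the date. A minimal sketch that extends the same `substring-before()`/`substring-after()` pattern to those two values, assuming every URL has exactly the shape shown in the sample markup (the `genre_id` and `date` keys are illustrative names):

```php
<?php
// Sketch: the XPath-only approach extended to the genre ID and the date.
// The expressions assume the exact URL shapes from the question's sample markup.
$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$data = [
  // Text between "www.example.com/" and the next "/" in the book link
  'book_id' => $xpath->evaluate(
    'substring-before(substring-after(//a[contains(@href, "/book")]/@href, "www.example.com/"), "/")'
  ),
  // Everything after "/id=" in the author link
  'author_id' => $xpath->evaluate(
    'substring-after(//a[contains(@href, "/author/id=")]/@href, "/id=")'
  ),
  // Same pattern as book_id, applied to the genres link
  'genre_id' => $xpath->evaluate(
    'substring-before(substring-after(//a[contains(@href, "/genres")]/@href, "www.example.com/"), "/")'
  ),
  // The date is the text content of the span
  'date' => $xpath->evaluate('string(//span[@class="Xbkznofv"])'),
];

echo implode(', ', $data), "\n"; // 5809, 124, 196, 24/11/1859
```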

These expressions only work with DOMXPath::evaluate(); DOMXPath::query() can only return a node list.
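A small sketch of that difference, using the author link from the question: the PHP manual states that `query()` returns an empty `DOMNodeList` for an expression that does not return nodes, so the same `string()` expression gives nothing useful there but the attribute value from `evaluate()`:

```php
<?php
// Sketch: the same string() expression passed to query() and to evaluate().
$document = new DOMDocument();
$document->loadHTML('<a href="http://www.example.com/author/id=124">Darwin</a>');
$xpath = new DOMXPath($document);

// query() always returns a node list; a scalar expression yields an empty one
$list = $xpath->query('string(//a/@href)');
// evaluate() returns the typed result, here a PHP string
$href = $xpath->evaluate('string(//a/@href)');

var_dump($list->length); // int(0)
var_dump($href);         // string(36) "http://www.example.com/author/id=124"
```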

Most of the time you will use one expression to fetch a list of nodes, iterate over them, and use several expressions to fetch the values from each node. Here is a simplified example:

$html = <<<'HTML'
<div class="book">
  <a href="#1">Origin of Species</a>
</div>
<div class="book">
  <a href="#2">On the Shoulders of Giants</a>
</div>
HTML;

$document = new DOMDocument();
$document->loadHtml($html);
$xpath = new DOMXpath($document);

foreach ($xpath->evaluate('//div[@class="book"]') as $book) {
  var_dump(
    $xpath->evaluate('string(.//a)', $book),
    $xpath->evaluate('string(.//a/@href)', $book)
  );
}

Output:

string(17) "Origin of Species"
string(2) "#1"
string(26) "On the Shoulders of Giants"
string(2) "#2"
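Applied to the question's links, the same pattern might look like the sketch below. It assumes each ID is the first run of digits in its `href`, which holds for the sample URLs but not necessarily for real ones; the `$rows` structure is an illustrative choice:

```php
<?php
// Sketch: the per-node pattern applied to the question's links.
// Assumption: each ID is the first run of digits in the href.
$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$rows = [];
// One expression fetches the list of links...
foreach ($xpath->evaluate('//a[contains(@href, "www.example.com/")]') as $link) {
  // ...and relative expressions read values with the link as context node
  $text = $xpath->evaluate('string(.)', $link);
  $href = $xpath->evaluate('string(@href)', $link);
  $id = preg_match('/\d+/', $href, $match) ? $match[0] : null;
  $rows[] = ['text' => $text, 'id' => $id];
}

foreach ($rows as $row) {
  echo $row['id'], ' => ', $row['text'], "\n";
}
```

For the sample markup this prints `5809 => Origin of Species`, `124 => Darwin` and `196 => Science, Biology`, one per line.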

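Notice: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.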

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM