PHP XPath query: get specific characters from the href attribute of tags

Tags

<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>

How do I get the ID numbers from the href attributes of these tags using an XPath query?

I want a result like this:

5809, 124, 196, 24/11/1859

PHP code

$url = 'http://www.example.com/Books/Default.aspx';
libxml_use_internal_errors(true); 
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXpath($doc);

// Select the book, author, and genre links and the date span from the sample markup.
$elements1 = $xpath->query('//a[contains(@href, "www.example.com/") and contains(@href, "/book")]');
$elements2 = $xpath->query('//a[contains(@href, "www.example.com/author/id=")]');
$elements3 = $xpath->query('//a[contains(@href, "www.example.com/") and contains(@href, "/genres")]');
$elements4 = $xpath->query('//span[@class="Xbkznofv"]');

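// Print the text content of every node matched by the four queries.
foreach ([$elements1, $elements2, $elements3, $elements4] as $elements) {
  if ($elements === false) {
    continue; // query() returns false for a malformed expression
  }
  foreach ($elements as $element) {
    echo "<br/>";
    foreach ($element->childNodes as $node) {
      echo $node->nodeValue . "\n";
    }
  }
}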

XPath 1.0 has only limited string functions, so at some point it becomes far easier to read the attribute value and extract the IDs with regular expressions.
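
As a rough sketch of that regular-expression route: XPath selects the href attributes and preg_match() pulls out the digits. The pattern is an assumption based only on the three sample URLs (an ID is a run of digits preceded by "/" or "id=" and followed by "/" or the end of the URL), and the names $ids and $result are illustrative.

```php
<?php
// Sample markup from the question.
$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>
HTML;

libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$ids = [];
// XPath only selects the attribute nodes; the regex extracts the number.
foreach ($xpath->query('//a/@href') as $href) {
    // Assumed URL shapes: ".../<id>/<word>" or ".../id=<id>".
    if (preg_match('~(?:/|id=)(\d+)(?:/|$)~', $href->value, $match)) {
        $ids[] = $match[1];
    }
}

// The date needs no extraction, it is the text content of the span.
$ids[] = $xpath->evaluate('string(//span[@class="Xbkznofv"])');

$result = implode(', ', $ids);
echo $result, "\n";
```

If the real URLs carry other digit runs between slashes, the pattern would need tightening, for example by anchoring it on the known path segments.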

However, here is an example using XPath only:

$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>  
<a href="http://www.example.com/author/id=124">Darwin</a>  
<a href="http://www.example.com/196/genres">Science, Biology</a>  
<span class="Xbkznofv">24/11/1859</span>
HTML;

$document = new DOMDocument();
$document->loadHtml($html);
$xpath = new DOMXpath($document);

$data = [
  // string() returns the text content of the first matching link.
  'book_title' => $xpath->evaluate(
    'string(//a[contains(@href, "www.example.com") and contains(@href, "/book")])'
  ),
  // Keep what follows the host name, then cut at the next slash.
  'book_id' => $xpath->evaluate(
    'substring-before(
      substring-after(
        //a[contains(@href, "www.example.com") and contains(@href, "/book")]/@href,
        "www.example.com/"
      ),
      "/"
    )'
  ),
  // Keep everything after "/id=" in the author link.
  'author_id' => $xpath->evaluate(
    'substring-after(
      //a[contains(@href, "www.example.com/author/id=")]/@href,
      "/id="
    )'
  )
];

var_dump($data);

Output:

array(3) {
  ["book_title"]=>
  string(17) "Origin of Species"
  ["book_id"]=>
  string(4) "5809"
  ["author_id"]=>
  string(3) "124"
}
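
The question also asks for the genre ID and the date. A sketch of how the same XPath-only technique might cover them, assuming the genre link always has the shape ".../<id>/genres" and the date span keeps the class shown in the sample markup (the array name $rest is illustrative):

```php
<?php
$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$rest = [
  // Same substring-after/substring-before pair as for the book ID.
  'genre_id' => $xpath->evaluate(
    'substring-before(
      substring-after(
        //a[contains(@href, "www.example.com") and contains(@href, "/genres")]/@href,
        "www.example.com/"
      ),
      "/"
    )'
  ),
  // The date is plain text content, so string() is enough.
  'date' => $xpath->evaluate('string(//span[@class="Xbkznofv"])'),
];

var_dump($rest);
```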

These expressions only work with DOMXPath::evaluate(); DOMXPath::query() can only return a node list.
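
A small sketch of that difference, using a single link from the sample markup. It relies on the behaviour described in the PHP manual, where query() gives an empty DOMNodeList for an expression that does not return nodes; the variable names are illustrative.

```php
<?php
$document = new DOMDocument();
$document->loadHTML('<a href="http://www.example.com/5809/book">Origin of Species</a>');
$xpath = new DOMXPath($document);

// A location path yields nodes, so query() returns a populated DOMNodeList.
$nodes = $xpath->query('//a');
var_dump($nodes->length);

// string() yields a scalar, which evaluate() hands back as a PHP string.
$title = $xpath->evaluate('string(//a)');
var_dump($title);

// count() yields a number, which evaluate() hands back as a float.
$count = $xpath->evaluate('count(//a)');
var_dump($count);

// query() cannot carry a scalar, so the same string() expression
// produces a node list with no entries.
$empty = $xpath->query('string(//a)');
var_dump($empty->length);
```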

Most of the time you will use one expression to fetch a list of nodes, iterate over them, and use several further expressions to fetch the values from each node. Here is a simplified example:

$html = <<<'HTML'
<div class="book">
  <a href="#1">Origin of Species</a>
</div>
<div class="book">
  <a href="#2">On the Shoulders of Giants</a>
</div>
HTML;

$document = new DOMDocument();
$document->loadHtml($html);
$xpath = new DOMXpath($document);

foreach ($xpath->evaluate('//div[@class="book"]') as $book) {
  var_dump(
    $xpath->evaluate('string(.//a)', $book),
    $xpath->evaluate('string(.//a/@href)', $book)
  );
}

Output:

string(17) "Origin of Species"
string(2) "#1"
string(26) "On the Shoulders of Giants"
string(2) "#2"
