Get H2 text and href values from inside all H2 tags on the page using xpath?

I know nothing, ZERO, about xpath or DOM.

In the end I need the href value and the content of the span from 12 H2 tags on the page. I have figured out how to get each item individually but getting them all in one shot isn't clicking, no matter how much I read. A little help?

<h2 class="make-it-pretty">
    <a class="more-pretty" href="some-file-somewhere">
        <span class="another-class">Product Name</span>
    </a>
</h2>

Here is what I use to get them individually.

    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $htext = $xpath->query('//h2[contains(@class, "make-it-pretty")]')->item(0);
    echo $htext->textContent;

I would probably load the page with $doc->loadHTMLFile() instead, but here is a version that works on an HTML string:

<?php
$html = '<html lang="en"><head><meta charset="UTF-8" /><title>Title Here</title></head>
  <body>
    <h2 class="make-it-pretty"><a class="more-pretty" href="some-file-somewhere"><span class="another-class">Product Name</span></a></h2>
  </body></html>';
$doc = new DOMDocument();
@$doc->loadHTML($html); // the @ belongs here: loadHTML() is the call that emits parser warnings
function getElementsByClassName($className, $withinNode = null){
  global $doc;
  $d = $withinNode ?? $doc;
  $r = []; $a = $d->getElementsByTagName('*');
  foreach($a as $n){
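    // split on whitespace so elements with several classes still match
    if(in_array($className, preg_split('/\s+/', trim($n->getAttribute('class'))), true))$r[] = $n;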
  }
  return $r;
}
$anotherClass = getElementsByClassName('another-class');
// getElementsByClassName('make-it-pretty'); works as well, in this case
echo $anotherClass[0]->textContent;
?>

Try this without XPath:

<?php
$html ='<h2 class="make-it-pretty"> <a class="more-pretty" href="some-file-somewhere"> <span class="another-class">Product Name</span> </a> </h2><h2 class="make-it-pretty"> <a class="more-pretty" href="some-file-somewhere"> <span class="another-class">Product Name</span> </a> </h2><h2 class="make-it-pretty"> <a class="more-pretty" href="some-file-somewhere"> <span class="another-class">Product Name</span> </a> </h2>';
$dom = new DOMDocument("1.0", "utf-8");
if($dom->loadHTML($html, LIBXML_NOWARNING)){
    $h2s = $dom->getElementsByTagName('h2');
    echo '<pre>';
    foreach ($h2s as $h2) {
        foreach ($h2->getElementsByTagName('a') as $a) {
            print_r('link :'.$a->getAttribute('href')."\n");
            // read the spans inside the <a> loop so each link is paired with its own text
            foreach ($a->getElementsByTagName('span') as $span) {
                print_r('content :'.trim($span->nodeValue)."\n");
            }
        }
    }
    echo '</pre>';
}
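
Since the question asked for XPath specifically, here is a minimal sketch of that route. It assumes every target link is a direct child of an h2 carrying the make-it-pretty class, as in the snippet from the question. The sample hrefs, the product names and the $results array are made up for illustration. The idea is to loop over the whole node list returned by query() instead of taking item(0), and to read the span text relative to each link.

```php
<?php
// Sample markup modeled on the question; the hrefs and names are invented.
$html = '<html><body>
<h2 class="make-it-pretty"><a class="more-pretty" href="file-one"><span class="another-class">Product One</span></a></h2>
<h2 class="make-it-pretty"><a class="more-pretty" href="file-two"><span class="another-class">Product Two</span></a></h2>
</body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$results = [];
// Select every <a> that is a direct child of a matching <h2>.
foreach ($xpath->query('//h2[contains(@class, "make-it-pretty")]/a') as $a) {
    $results[] = [
        'href' => $a->getAttribute('href'),
        // Evaluated relative to this <a>; string() yields the text of its first <span>.
        'text' => trim($xpath->evaluate('string(.//span)', $a)),
    ];
}

foreach ($results as $row) {
    echo $row['href'], ' => ', $row['text'], "\n";
}
```

Note that contains(@class, "make-it-pretty") is a substring test, so it would also match a class such as make-it-prettier. If that matters on the real page, a stricter predicate may be needed.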
