
PHP DOMXPath: can't get related label and info content parsed from generic HTML classes

I'm parsing an HTML page with PHP's DOMXPath, and I'm trying to get the nodeValue of each element with class label together with the nodeValue(s) of the matching element with class info.

<h3>
    <div class="metadata">
        <span class="label">Another Label</span>
        <span class="info">
            <a href="some-link.com">Link Name</a>
        </span>
    </div>
</h3>
<h3>
    <div class="metadata">
        <span class="label">Some Label</span>
        <span class="info">
            <a href="some-link.com">Link Name</a>, 
            <a href="another-link.com">Another Link Name</a>, 
            <a href="yet-another-link.com">Yet Another Link Name</a>
        </span>
    </div>
</h3>

I'm accessing the content with:

// Two independent, document-wide queries
$labels = $xpathLabel->query("//h3/div/span[@class='label']");
$infos = $xpathInfo->query("//h3/div/span[@class='info']/a");

and outputting it with:

foreach ($labels as $label) {
    print "{$label->nodeValue}\n";
    foreach($infos as $info){
        print "\t{$info->nodeValue}\n";
    }
}

Which outputs:

Another Label
    Link Name
    Link Name
    Another Link Name
    Yet Another Link Name
Some Label
    Link Name
    Link Name
    Another Link Name
    Yet Another Link Name

I understand why this happens: the two queries are independent, so one returns every label in the document and the other returns every info link in the document, and nothing ties a link back to its label.
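A minimal self-contained sketch makes that independence visible. It assumes the page is loaded with DOMDocument::loadHTML() and that $xpathLabel and $xpathInfo are DOMXPath objects over the same document (their construction isn't shown in the question), so it uses a single $xpath for both queries:

```php
<?php
// The markup from the question, embedded as a nowdoc string.
$html = <<<'HTML'
<h3>
    <div class="metadata">
        <span class="label">Another Label</span>
        <span class="info">
            <a href="some-link.com">Link Name</a>
        </span>
    </div>
</h3>
<h3>
    <div class="metadata">
        <span class="label">Some Label</span>
        <span class="info">
            <a href="some-link.com">Link Name</a>,
            <a href="another-link.com">Another Link Name</a>,
            <a href="yet-another-link.com">Yet Another Link Name</a>
        </span>
    </div>
</h3>
HTML;

// Collect libxml warnings about the fragment instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Assumption: one DOMXPath object is enough for every query on this document.
$xpath = new DOMXPath($doc);

// Both expressions start at the document root (//), so each returns one flat,
// document-wide node list; neither knows which metadata div a node came from.
$labels = $xpath->query("//h3/div/span[@class='label']");
$infos  = $xpath->query("//h3/div/span[@class='info']/a");

echo $labels->length, " labels, ", $infos->length, " links\n";
```

With the sample markup this should report 2 labels and 4 links: two unrelated lists, which is why the nested loop prints all four links under every label.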

Is there a better way to write the queries, or to output the content, so that each label is printed only with its own links?

You need to use the outer metadata divs as the anchor for your loop, then list out the labels and info links within just that element:

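// One DOMXPath object is enough for every query; this reuses $xpathLabel
$metadata = $xpathLabel->query("//h3/div[@class='metadata']");

foreach ($metadata as $group) {
    // "./" makes the path relative to $group, so only this div's label is matched
    $labels = $xpathLabel->query("./span[@class='label']", $group);

    foreach ($labels as $label) {
        print "{$label->nodeValue}\n";
    }

    // Likewise, only the links inside this div's info span
    $infos = $xpathLabel->query("./span[@class='info']/a", $group);

    foreach ($infos as $info) {
        print "\t{$info->nodeValue}\n";
    }
}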

Each <div> is passed as the second (context node) argument to DOMXPath::query(), so the relative expressions starting with ./ only search the children of that element.
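The same context-node idea can also keep the pairing as data instead of printing it immediately. This is a minimal sketch, assuming the markup is loaded with DOMDocument::loadHTML(); groupByMetadata() is a hypothetical helper name written for this example, not part of PHP's DOM API:

```php
<?php
// The markup from the question, embedded as a nowdoc string.
$html = <<<'HTML'
<h3>
    <div class="metadata">
        <span class="label">Another Label</span>
        <span class="info">
            <a href="some-link.com">Link Name</a>
        </span>
    </div>
</h3>
<h3>
    <div class="metadata">
        <span class="label">Some Label</span>
        <span class="info">
            <a href="some-link.com">Link Name</a>,
            <a href="another-link.com">Another Link Name</a>,
            <a href="yet-another-link.com">Yet Another Link Name</a>
        </span>
    </div>
</h3>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

/**
 * Hypothetical helper: map each label text to the link texts in the same
 * metadata div.
 *
 * @return array<string, string[]>
 */
function groupByMetadata(DOMXPath $xpath): array
{
    $result = [];
    foreach ($xpath->query("//h3/div[@class='metadata']") as $group) {
        // Relative paths are evaluated against $group (the context node).
        $label = $xpath->query("./span[@class='label']", $group)->item(0);
        if ($label === null) {
            continue; // skip metadata divs without a label
        }
        $links = [];
        foreach ($xpath->query("./span[@class='info']/a", $group) as $a) {
            $links[] = trim($a->nodeValue);
        }
        $result[trim($label->nodeValue)] = $links;
    }
    return $result;
}

foreach (groupByMetadata($xpath) as $label => $links) {
    print "{$label}\n";
    foreach ($links as $link) {
        print "\t{$link}\n";
    }
}
```

Keying the result by label text makes lookups easy, but two metadata blocks with the same label would be merged; a list of [label, links] pairs is safer if labels can repeat.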

See https://eval.in/955491 for a full example
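A different technique also works if each label span is always followed by its info span in the same div (an assumption about the page that holds for the sample markup): loop over the labels themselves and pass each label as the context node to a following-sibling query. A minimal sketch:

```php
<?php
// The markup from the question, embedded as a nowdoc string.
$html = <<<'HTML'
<h3>
    <div class="metadata">
        <span class="label">Another Label</span>
        <span class="info">
            <a href="some-link.com">Link Name</a>
        </span>
    </div>
</h3>
<h3>
    <div class="metadata">
        <span class="label">Some Label</span>
        <span class="info">
            <a href="some-link.com">Link Name</a>,
            <a href="another-link.com">Another Link Name</a>,
            <a href="yet-another-link.com">Yet Another Link Name</a>
        </span>
    </div>
</h3>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Assumes each label span is followed by its own info span inside the same div.
$pairs = [];
foreach ($xpath->query("//h3/div/span[@class='label']") as $label) {
    // [1] picks the nearest following sibling span with class "info".
    $links = $xpath->query("following-sibling::span[@class='info'][1]/a", $label);
    $texts = [];
    foreach ($links as $a) {
        $texts[] = $a->nodeValue;
    }
    $pairs[] = [$label->nodeValue, $texts];
}

foreach ($pairs as [$labelText, $texts]) {
    print "{$labelText}\n";
    foreach ($texts as $text) {
        print "\t{$text}\n";
    }
}
```

Anchoring on the metadata div is more robust if the span order can change; the sibling axis is handy when there is no reliable wrapper element.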
