
How do I parse HTML using PHP DOMDocument?

I have an HTML block here:

<div class="title">
    <a href="http://test.com/asus_rt-n53/p195257/">
        Asus RT-N53
    </a>
</div>
<table>
    <tbody>
        <tr>
            <td class="price-status">
                <div class="status">
                    <span class="available">Yes</span>
                </div>
                <div name="price" class="price">
                    <div class="uah">758<span> ua.</span></div>
                    <div class="usd">$&nbsp;62</div>
                </div>

How do I parse the link ( http://test.com/asus_rt-n53/p195257/ ), title ( Asus RT-N53 ) and price ( 758 )?

My parsing code so far ($content holds the fetched HTML):

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML($content);
$xpath = new DOMXPath($dom);
$models = $xpath->query('//div[@class="title"]/a');
foreach ($models as $model) {
    echo $model->nodeValue;
    $prices = $xpath->query('//div[@class="uah"]');
    foreach ($prices as $price) {
        echo $price->nodeValue;
    }
}

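One quick but ugly solution is to cast the price text to an integer. PHP's (int) cast reads the leading digits and ignores everything after them, so "758 ua." becomes 758: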

echo (int) $price->nodeValue;
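The cast only keeps the leading run of digits, which matters if a price ever contains a grouping space. A small sketch of the difference, where digitsOnly() is a hypothetical helper (not part of the original answer) and the "1 299 ua." string is an invented example:

```php
<?php
// Hypothetical helper: strips every non-digit, so grouped thousands survive.
function digitsOnly(string $text): int
{
    return (int) preg_replace('/\D+/', '', $text);
}

var_dump((int) '758 ua.');          // int(758)
var_dump((int) '1 299 ua.');        // int(1), the cast stops at the first space
var_dump(digitsOnly('1 299 ua.'));  // int(1299)
```

Note that digitsOnly() would also swallow a decimal separator, so it only suits whole-number prices like the one in the question.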

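Or you can find the span inside the current price div and remove it before reading the value (inside the prices foreach). Pass $price as the context node so the query is relative to it; a query starting with // always searches the whole document and would return the first product's span every time:

$span = $xpath->query('span', $price)->item(0); // relative to this price div
if ($span !== null) {
    $price->removeChild($span);
}
echo $price->nodeValue;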

Edit:

To retrieve the link, call getAttribute() on the anchor node and ask for its href attribute:

echo $model->getAttribute('href');
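
Putting the pieces together, here is a minimal sketch that collects link, title and price per product. It assumes the price table is the first <table> following each title div, as in the snippet from the question; the real page may be laid out differently. The closing tags on the sample markup and the $products array are added for illustration only.

```php
<?php
// Sample markup from the question, with the missing closing tags added.
$content = <<<'HTML'
<div class="title">
    <a href="http://test.com/asus_rt-n53/p195257/">
        Asus RT-N53
    </a>
</div>
<table>
    <tbody>
        <tr>
            <td class="price-status">
                <div class="status">
                    <span class="available">Yes</span>
                </div>
                <div name="price" class="price">
                    <div class="uah">758<span> ua.</span></div>
                    <div class="usd">$&nbsp;62</div>
                </div>
            </td>
        </tr>
    </tbody>
</table>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$products = [];

foreach ($xpath->query('//div[@class="title"]/a') as $model) {
    // Assumption: the matching prices sit in the first table after the title div.
    // text()[1] selects only the "758" text node, skipping the <span> with the unit.
    $priceText = $xpath->evaluate(
        'string(../following-sibling::table[1]//div[@class="uah"]/text()[1])',
        $model
    );

    $products[] = [
        'link'  => $model->getAttribute('href'),
        'title' => trim($model->nodeValue),
        'price' => (int) $priceText,
    ];
}

print_r($products);
```

Because the price query is evaluated relative to $model, each product gets its own price instead of every price in the document, and the DOM does not have to be modified to drop the currency suffix.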


 