How do I parse HTML using PHP DOMDocument?

Question

I have an HTML block here:

<div class="title">
    <a href="http://test.com/asus_rt-n53/p195257/">
        Asus RT-N53
    </a>
</div>
<table>
    <tbody>
        <tr>
            <td class="price-status">
                <div class="status">
                    <span class="available">Yes</span>
                </div>
                <div name="price" class="price">
                    <div class="uah">758<span> ua.</span></div>
                    <div class="usd">$&nbsp;62</div>
                </div>

How do I extract the link (http://test.com/asus_rt-n53/p195257/), the title (Asus RT-N53) and the price (758)?

My code so far ($content holds the HTML fetched with cURL):

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML($content);
$xpath = new DOMXPath($dom);
$models = $xpath->query('//div[@class="title"]/a');
foreach ($models as $model) {
    echo $model->nodeValue;
    $prices = $xpath->query('//div[@class="uah"]');
    foreach ($prices as $price) {
        echo $price->nodeValue;
    }
}

Answer 1

One quick but crude solution is to cast the price to an integer, which keeps only the leading digits of the text ("758 ua." becomes 758):

echo (int) $price->nodeValue;

Or you can find the span inside the current price div and remove it before reading the value (inside the prices foreach). Querying relative to $price keeps the lookup tied to the div being processed:

$span = $xpath->query('span', $price)->item(0);
if ($span !== null) {
    $price->removeChild($span);
}
echo $price->nodeValue;

Edit:

To retrieve the link, call getAttribute() on the anchor element and ask for href:

echo $model->getAttribute('href');
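
Putting the pieces together, here is a minimal sketch that collects the link, title and price for each product in one pass. It rests on assumptions that are not confirmed by the question: that each price table is the first table following its title div, and that the markup is complete (the sample below closes the tags that the snippet in the question leaves open). The $products array is a name chosen purely for illustration.

```php
<?php
// Sample markup modelled on the question; a real page would be fetched separately.
$content = <<<'HTML'
<div class="title">
    <a href="http://test.com/asus_rt-n53/p195257/">
        Asus RT-N53
    </a>
</div>
<table>
    <tbody>
        <tr>
            <td class="price-status">
                <div class="status"><span class="available">Yes</span></div>
                <div name="price" class="price">
                    <div class="uah">758<span> ua.</span></div>
                    <div class="usd">$&nbsp;62</div>
                </div>
            </td>
        </tr>
    </tbody>
</table>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$products = []; // hypothetical result container

foreach ($xpath->query('//div[@class="title"]/a') as $model) {
    // Assumption: the price table is the first <table> after the title <div>.
    $uah = $xpath->query(
        '../following-sibling::table[1]//div[@class="uah"]',
        $model
    )->item(0);

    $products[] = [
        'link'  => $model->getAttribute('href'),
        'title' => trim($model->nodeValue),
        // string(text()) reads only the first direct text node, so the <span> is skipped.
        'price' => $uah !== null ? (int) $xpath->evaluate('string(text())', $uah) : null,
    ];
}

print_r($products);
```

Passing $model as the second argument to query() makes the XPath expression relative to that node, so each title is paired with its own price rather than with every price on the page. If the real page nests things differently, only the relative expression should need adjusting.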

solution1
0 ACCEPTED 2013-01-10 21:37:22
