PHP Web Scraping

I have code that scrapes data from a website. The output looks something like this:
Agriculture
Food
Apparel
How do I output only the first (or nth) category, for example only Agriculture? I tried

echo $sub_title[1].'<br/>';

but it doesn't seem to work.

My code:

<?php
$ch = curl_init('http://www.alibaba.com/Products');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$html = curl_exec($ch);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$finder = new DOMXPath($dom);
$nodes = $finder->query('//h4[@class="sub-title"]');

foreach ($nodes as $node) {
    $sub_title = trim(explode("\n", trim($node->nodeValue))[0]);
    echo $sub_title.'<br/>';

}

?>

There are several ways to do this. One is to use the foreach key and add an if condition inside the loop:

// indices start at zero
$fifth = 4; // or 5 - 1

foreach ($nodes as $key => $node) {
    if($key == $fifth) {
        $sub_title = trim(explode("\n", trim($node->nodeValue))[0]);
        echo $sub_title.'<br/>';
    }
}
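If only one node is needed, the loop can also be skipped. The following is a minimal sketch, not part of the original answer, that uses DOMNodeList::item() with a zero-based index. The inline $html string is hypothetical sample markup standing in for the fetched page, and libxml_use_internal_errors() is used here in place of the @ operator to keep parser warnings quiet.

```php
<?php
// Hypothetical sample markup standing in for the page fetched with cURL.
$html = '<html><body>
<h4 class="sub-title">Agriculture
  <span>more</span></h4>
<h4 class="sub-title">Food</h4>
<h4 class="sub-title">Apparel</h4>
</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$finder = new DOMXPath($dom);
$nodes = $finder->query('//h4[@class="sub-title"]');

// item() takes a zero-based index and returns null when the index is out of range.
$n = 0;
$node = $nodes->item($n);
$first = $node === null
    ? null
    : trim(explode("\n", trim($node->nodeValue))[0]);

echo $first . "\n";
```

Checking for null before reading nodeValue avoids an error when the page has fewer matching headings than expected.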

Or add another query that points explicitly at the nth position (XPath positions start at 1, not 0):

$fifth = $finder->evaluate('
    string(
        (//h4[@class="sub-title"])[5]
    )
');
$fifth = explode("\n", trim($fifth));
echo $fifth[0];
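The parentheses around the path in that query matter. As a rough illustration, the sketch below compares the parenthesized form with the unparenthesized one on hypothetical sample markup in which each heading sits in its own div. Without the parentheses, the position is counted among the siblings of each parent rather than across the whole document.

```php
<?php
// Hypothetical sample markup: every heading has a different parent element.
$html = '<html><body>
<div><h4 class="sub-title">Agriculture</h4></div>
<div><h4 class="sub-title">Food</h4></div>
<div><h4 class="sub-title">Apparel</h4></div>
</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings quiet
$dom->loadHTML($html);
libxml_clear_errors();

$finder = new DOMXPath($dom);

$n = 2; // XPath positions are 1-based, so 2 means the second match

// Position applied to the whole result set: second matching h4 in the document.
$grouped = trim($finder->evaluate('string((//h4[@class="sub-title"])[' . $n . '])'));

// Position applied per parent: an h4 that is the second match under its own parent.
$perParent = trim($finder->evaluate('string(//h4[@class="sub-title"][' . $n . '])'));

echo "grouped: '" . $grouped . "'\n";
echo "per parent: '" . $perParent . "'\n";
```

With this markup the second query yields an empty string, because no div contains two matching headings.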

Or collect them in a container (an array), then access them explicitly by index:

$sub_title = array();
foreach ($nodes as $key => $node) {
    $sub_title[] = trim(explode("\n", trim($node->nodeValue))[0]);
}

echo $sub_title[4]; // call fifth
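This is also a likely reason the attempt in the question did not work, assuming the echo was placed inside the original loop: there $sub_title is a string, and indexing a string returns a single character rather than a whole title. A small sketch of the difference, with a hard-coded $titles array standing in for the scraped values:

```php
<?php
// Stand-in for the values scraped from the page.
$titles = ['Agriculture', 'Food', 'Apparel'];

// Inside the original loop, $sub_title holds one string at a time.
$sub_title = $titles[0];
$char = $sub_title[1]; // second character of the string

// Collected into an array, an index selects a whole title instead.
$sub_titles = [];
foreach ($titles as $title) {
    $sub_titles[] = $title;
}
$second = $sub_titles[1];

// The null coalescing operator guards against an index that does not exist.
$missing = $sub_titles[10] ?? null;

echo $char . "\n";   // g
echo $second . "\n"; // Food
```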

Try this in your loop:

$sub_titles = explode("\n", $node->nodeValue);
$first_sub_title = trim($sub_titles[0]);
echo $first_sub_title.'<br/>';
