
PHP Crawler not crawling all elements

I'm trying to make a PHP crawler (for personal use). The code is supposed to print "found" for each eBay auction item that ends in less than 1 hour, but there seems to be a problem: the crawler can't get all the span elements, and the "remaining time" element is a span.

The simple_html_dom.php file was downloaded and has not been edited.

<?php
    include_once('simple_html_dom.php');

    // URL to crawl (contains GET data)
    $url = 'http://www.ebay.de/sch/Apple-Notebooks/111422/i.html?LH_Auction=1&Produktfamilie=MacBook%7CMacBook%2520Air%7CMacBook%2520Pro%7C%21&LH_ItemCondition=1000%7C1500%7C2500%7C3000&_dcat=111422&rt=nc&_mPrRngCbx=1&_udlo&_udhi=20';

    $html = new simple_html_dom();
    $html->load_file($url);
    foreach ($html->find('span') as $part) {
        // Echoing $part displays many span elements, but not the remaining-time ones
        echo $part;
        $cur_class = $part->class;

        // The class attribute of an auction item that ends in less than an hour
        // is equal to "MINUTES timeMs alert60Red"
        if ($cur_class == 'MINUTES timeMs alert60Red') {
            echo 'found';
        }
    }
?>

Any answers would be useful, thanks in advance

Looking at the fetched HTML, it seems that the class alert60Red is set through JavaScript. The crawler never executes JavaScript, so that class is not present in the markup it receives and the exact comparison never matches.

Searching for just MINUTES timeMs looks stable enough instead.

<?php
    include_once('simple_html_dom.php');

    $url = 'http://www.ebay.de/sch/Apple-Notebooks/111422/i.html?LH_Auction=1&Produktfamilie=MacBook%7CMacBook%2520Air%7CMacBook%2520Pro%7C%21&LH_ItemCondition=1000%7C1500%7C2500%7C3000&_dcat=111422&rt=nc&_mPrRngCbx=1&_udlo&_udhi=20';

    $html = new simple_html_dom();
    $html->load_file($url);
    foreach ($html->find('span') as $part) {
        $cur_class = $part->class;

        if (strpos($cur_class, 'MINUTES timeMs') !== false) {
            echo 'found';
        }
    }
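
The substring check above can also be written as a match on individual class tokens, which keeps working if the class order changes or if extra classes such as alert60Red are present. The following is only a minimal sketch under some assumptions: it swaps in PHP's built-in DOM extension (DOMDocument) in place of simple_html_dom, the helper name findSpansWithClasses is hypothetical, and the sample markup is made up so that the example runs without fetching the eBay page.

```php
<?php
// Sketch only: uses PHP's built-in DOM extension, not simple_html_dom.
// findSpansWithClasses() is a hypothetical helper, not part of any library.

/**
 * Return the text of every <span> whose class attribute contains all of
 * the required class names as whole, whitespace-separated tokens.
 */
function findSpansWithClasses(string $html, array $required): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $matches = [];
    foreach ($doc->getElementsByTagName('span') as $span) {
        // "MINUTES timeMs" becomes ["MINUTES", "timeMs"]
        $tokens = preg_split('/\s+/', trim($span->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
        // Every required class must be among the element's tokens.
        if (array_diff($required, $tokens) === []) {
            $matches[] = trim($span->textContent);
        }
    }
    return $matches;
}

// Made-up markup standing in for the fetched page (an assumption).
$sampleHtml = <<<'HTML'
<html><body>
  <span class="MINUTES timeMs">42 Min</span>
  <span class="HOURS timeMs">3 Std</span>
  <span class="price">EUR 20,00</span>
</body></html>
HTML;

$found = findSpansWithClasses($sampleHtml, ['MINUTES', 'timeMs']);
foreach ($found as $text) {
    echo 'found: ' . $text . PHP_EOL;
}
```

With the sample markup, only the first span carries both class tokens, so a single "found" line is printed. Like the simple_html_dom version, this only sees the markup the server sends; anything added later by JavaScript stays invisible to it.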

If a snippet of code is included in another PHP file, or HTML is embedded in PHP, your browser cannot see it.

So no web-crawling API can detect it. I think your best bet is to find the location of simple_html_dom.php and try to crawl that file somehow. You may not even be able to get access to it. It's tricky.

You could also try finding the element by id, if your API has that function.


 