PHP Crawler not crawling all elements

Question

so i'm trying to make a PHP crawler (for personal use). What the code does is displaying "found" for each ebay auction item found that ends in less than 1 hour but there seems to be a problem. The crawler can't get all the span elements and the "remaining time" element is a .

the simple_html_dom.php is downloaded and not edited.

 <?php include_once('simple_html_dom.php');

//url which i want to crawl -contains GET DATA-

    $url = 'http://www.ebay.de/sch/Apple-Notebooks/111422/i.html?LH_Auction=1&Produktfamilie=MacBook%7CMacBook%2520Air%7CMacBook%2520Pro%7C%21&LH_ItemCondition=1000%7C1500%7C2500%7C3000&_dcat=111422&rt=nc&_mPrRngCbx=1&_udlo&_udhi=20';

    $html = new simple_html_dom();
    $html->load_file($url);
    foreach($html->find('span') as $part){
        echo $part;
//when i echo $part it does display many span elements but not the remaining time ones
        $cur_class = $part->class;

//the class attribute of an auction item that ends in less than an hour is equal with "MINUTES timeMs alert60Red"
        if($cur_class == 'MINUTES timeMs alert60Red'){
            echo 'found';
        }
    }
    ?>

Any answers would be useful, thanks in advance

Answer 1

Looking at the fetched HTML it seems as if the class alert60Red is set through JavaScript. So you couldn't find it as JavaScript is never executed.

So just searching for MINUTES timeMs looks stable as well.

<?php
    include_once('simple_html_dom.php');

    $url = 'http://www.ebay.de/sch/Apple-Notebooks/111422/i.html?LH_Auction=1&Produktfamilie=MacBook%7CMacBook%2520Air%7CMacBook%2520Pro%7C%21&LH_ItemCondition=1000%7C1500%7C2500%7C3000&_dcat=111422&rt=nc&_mPrRngCbx=1&_udlo&_udhi=20';

    $html = new simple_html_dom();
    $html->load_file($url);
    foreach ($html->find('span') as $part) {
        $cur_class = $part->class;

        if (strpos($cur_class, 'MINUTES timeMs') !== false) {
            echo 'found';
        }
    }

Answer 2

If a snippet of code is included in another php file, or html is embedded in php, your browser cannot see it.

So no webcrawl api can detect it. I think your best bet is to find the location of simple_html_Dom.php and try crawl that file somehow. You may not even be able to get access to it. It's tricky.

You could also try find by Id if your api has that function?

PHP Crawler not crawling all elements

Question

2 answers

solution1
0 ACCPTED 2016-07-24 13:34:37

solution2
0 2016-07-24 13:38:58

PHP Crawler not crawling all elements

Question

2 answers

solution1 0 ACCPTED 2016-07-24 13:34:37

solution2 0 2016-07-24 13:38:58

solution1
0 ACCPTED 2016-07-24 13:34:37

solution2
0 2016-07-24 13:38:58