PHP Crawler not crawling all elements

I'm trying to make a PHP crawler (for personal use). The code is supposed to display "found" for each eBay auction item that ends in less than one hour, but there seems to be a problem: the crawler can't get all the span elements, and the "remaining time" element is a span.

The simple_html_dom.php file was downloaded and has not been edited.

<?php
    include_once('simple_html_dom.php');

    // URL I want to crawl (contains GET data)
    $url = 'http://www.ebay.de/sch/Apple-Notebooks/111422/i.html?LH_Auction=1&Produktfamilie=MacBook%7CMacBook%2520Air%7CMacBook%2520Pro%7C%21&LH_ItemCondition=1000%7C1500%7C2500%7C3000&_dcat=111422&rt=nc&_mPrRngCbx=1&_udlo&_udhi=20';

    $html = new simple_html_dom();
    $html->load_file($url);
    foreach ($html->find('span') as $part) {
        // Echoing $part displays many span elements, but not the "remaining time" ones
        echo $part;
        $cur_class = $part->class;

        // The class attribute of an auction item that ends in less than an hour is "MINUTES timeMs alert60Red"
        if ($cur_class == 'MINUTES timeMs alert60Red') {
            echo 'found';
        }
    }
?>

Any answers would be useful. Thanks in advance.

Looking at the fetched HTML, it seems the class alert60Red is set through JavaScript. So you can't find it, because the JavaScript is never executed.
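One way to confirm this kind of thing is to look for the class name in the raw markup, before any parsing. The following is only a rough sketch: `classInStaticHtml` is a hypothetical helper, and `$raw` is a made-up fragment standing in for whatever the fetch actually returns (it is not real eBay HTML).

```php
<?php
// Hypothetical helper: does $className occur as a whole token in any
// class="..." attribute of the raw, unexecuted markup?
function classInStaticHtml(string $html, string $className): bool
{
    // Collect the value of every quoted class attribute.
    if (!preg_match_all('/\bclass\s*=\s*(["\'])(.*?)\1/is', $html, $matches)) {
        return false;
    }
    foreach ($matches[2] as $classAttr) {
        // Split on whitespace and compare whole class names only.
        $tokens = preg_split('/\s+/', $classAttr, -1, PREG_SPLIT_NO_EMPTY);
        if (in_array($className, $tokens, true)) {
            return true;
        }
    }
    return false;
}

// Made-up markup, standing in for the fetched page source.
$raw = '<span class="MINUTES timeMs">45m</span>'
     . '<script>/* a script would add alert60Red at runtime */</script>';

foreach (['MINUTES', 'timeMs', 'alert60Red'] as $name) {
    echo $name . ': ' . (classInStaticHtml($raw, $name) ? 'present' : 'absent') . PHP_EOL;
}
```

If a class shows up in the browser's inspector but is reported as absent here, it is most likely being added by a script after the page loads.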

Searching for just MINUTES timeMs looks stable enough:

<?php
    include_once('simple_html_dom.php');

    $url = 'http://www.ebay.de/sch/Apple-Notebooks/111422/i.html?LH_Auction=1&Produktfamilie=MacBook%7CMacBook%2520Air%7CMacBook%2520Pro%7C%21&LH_ItemCondition=1000%7C1500%7C2500%7C3000&_dcat=111422&rt=nc&_mPrRngCbx=1&_udlo&_udhi=20';

    $html = new simple_html_dom();
    $html->load_file($url);
    foreach ($html->find('span') as $part) {
        $cur_class = $part->class;

        if (strpos($cur_class, 'MINUTES timeMs') !== false) {
            echo 'found';
        }
    }
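The substring test above depends on the two class names appearing in exactly that order, separated by a single space. A slightly more tolerant variant compares whole class names instead. This is just a sketch under a few assumptions: it uses PHP's built-in DOMDocument rather than simple_html_dom so that it runs on its own, `hasAllClasses` is a hypothetical helper, and `$sample` is invented markup, not the real eBay page.

```php
<?php
// Hypothetical helper: returns true when every class name in $required
// appears in the class attribute, regardless of order or extra classes.
function hasAllClasses(string $classAttr, array $required): bool
{
    // Split on any whitespace; PREG_SPLIT_NO_EMPTY drops empty tokens.
    $tokens = preg_split('/\s+/', $classAttr, -1, PREG_SPLIT_NO_EMPTY);
    return count(array_diff($required, $tokens)) === 0;
}

// Made-up sample markup standing in for the fetched page.
$sample = '<div>'
    . '<span class="MINUTES timeMs">45m</span>'
    . '<span class="timeMs  MINUTES extra">12m</span>'
    . '<span class="HOURS timeMs">3h</span>'
    . '</div>';

// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($sample);
libxml_clear_errors();

$found = 0;
foreach ($doc->getElementsByTagName('span') as $span) {
    if (hasAllClasses($span->getAttribute('class'), ['MINUTES', 'timeMs'])) {
        $found++;
        echo 'found: ' . trim($span->textContent) . PHP_EOL;
    }
}
```

With this sample, the first two spans match and the third does not, because it lacks the MINUTES class.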

If a snippet of code is included in another PHP file, or HTML is embedded in PHP, your browser cannot see it.

So no web-crawling API can detect it. I think your best bet is to find the location of simple_html_dom.php and try to crawl that file somehow. You may not even be able to get access to it. It's tricky.

You could also try finding the element by ID, if your API has that function.
