Trouble with scraping data in PHP

I am scraping data from a website. You can view its source here:

view-source:http://www.pakdukaan.com/75-computer-cases

This is the code I use to scrape the data:

<?php
$html = file_get_contents('http://www.pakdukaan.com/75-computer-cases'); 

$pk_doc = new DOMDocument();
libxml_use_internal_errors(TRUE); 

if(!empty($html)){ 
$pk_doc->loadHTML($html);
libxml_clear_errors(); 
$pk_xpath = new DOMXPath($pk_doc);
$pk_list = array();
$pk_and_price = $pk_xpath->query('//div[@class="product_list list row "]');

if($pk_and_price->length > 0){  

foreach($pk_and_price as $pat){
    $name = $pk_xpath->query('//h5[@class="name"]', $pat)->item(0)->nodeValue;
    $pkmn_types = array();
    $price = $pk_xpath->query('//span[@class="price product-price"]', $pat);

    foreach($price as $type){
        $pkmn_types[] = $type->nodeValue;
    }
    $pk_list[] = array('name' => $name, 'price' => $pkmn_types);
}
}
}

//output what we have
echo "<pre>";
echo print_r($pk_list);
echo "</pre>";
?>

However, instead of getting the names of all the cases, I only get one name, and I also get the price of every case twice.

Here is the output:

Array
(
[0] => Array
    (
        [name] => 

                Thermaltake V2 Plus + 350W Power Supply


        [price] => Array
            (
                [0] => 
                        Rs.  4,099                      
                [1] => 
                        Rs.  4,099                      
                [2] => 
                        Rs.  5,899                      
                [3] => 
                        Rs.  5,899                      
                [4] => 
                        Rs.  8,499                      
                [5] => 
                        Rs.  8,499                      
                [6] => 
                        Rs.  9,499                      
                [7] => 
                        Rs.  9,499                      
                [8] => 
                        Rs.  10,350                     
                [9] => 
                        Rs.  10,350                     
                [10] => 
                        Rs.  12,999                     
                [11] => 
                        Rs.  12,999                     
                [12] => 
                        Rs.  17,799                     
                [13] => 
                        Rs.  17,799                     
                [14] => 
                        Rs.  16,199                     
                [15] => 
                        Rs.  16,199                     
                [16] => 
                        Rs.  17,299                     
                [17] => 
                        Rs.  17,299                     
                [18] => 
                        Rs.  16,500                     
                [19] => 
                        Rs.  16,500                     
                [20] => 
                        Rs.  5,899                      
                [21] => 
                        Rs.  5,899                      
                [22] => 
                        Rs.  8,399                      
                [23] => 
                        Rs.  8,399                      
                [24] => 
                        Rs.  4,999                      
                [25] => 
                        Rs.  4,999                      
                [26] => 
                        Rs.  7,599                      
                [27] => 
                        Rs.  7,599                      
                [28] => 
                        Rs.  9,999                      
                [29] => 
                        Rs.  9,999                      
           )
    )
)
1

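Can anyone solve this? I have tried many times to change the div class in the query, based on the site's source code, but I cannot get the right result.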

So, let's go through your mistakes:

First: you run $pk_xpath->query('//h5[@class="name"]', $pat) and then take only item(0).

This means you skip every other DOMNode returned by the XPath query. If you do this instead:

$names = $pk_xpath->query('//h5[@class="name"]', $pat);
foreach ($names as $n) {
    echo $n->nodeValue . PHP_EOL;
}

You will see all the names on the page.
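There is a second reason the loop only ever sees the first name: an XPath expression that begins with // is evaluated from the document root, so the $pat context node passed to DOMXPath::query() has no effect. Only relative expressions such as .//h5 are searched inside the context node. The minimal sketch below uses a small inline HTML string (an assumed stand-in for the real page, with a hypothetical "product" class) to show the difference.

```php
<?php
// Minimal sketch: this inline HTML is a hypothetical stand-in for the real page.
$html = <<<HTML
<div class="product"><h5 class="name">Case A</h5></div>
<div class="product"><h5 class="name">Case B</h5></div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about imperfect HTML
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$absolute = [];
$relative = [];
foreach ($xpath->query('//div[@class="product"]') as $product) {
    // Starts with "//": searched from the document root, so $product is ignored.
    $absolute[] = $xpath->query('//h5[@class="name"]', $product)->item(0)->nodeValue;
    // Starts with ".//": searched only inside the current $product node.
    $relative[] = $xpath->query('.//h5[@class="name"]', $product)->item(0)->nodeValue;
}

print_r($absolute); // both entries are "Case A"
print_r($relative); // "Case A", then "Case B"
```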

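Second: the prices. If you look at the HTML of the scraped page, you will see that span[@class="price product-price"] occurs twice for every item: one span is visible, and the second one belongs to a popup that is currently hidden.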

So you need a different XPath query. For example, you can find all .product-meta items and then search for price product-price inside each of them.
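A minimal sketch of the .product-meta approach, under an assumption not verified against the live site: each product's name and visible price sit inside a div whose class includes product-meta, while the duplicate price lives in a hidden popup outside it. The inline markup, including the product-container and quick-view classes, is hypothetical. The contains(concat(...)) predicate is the usual XPath 1.0 way to match a single class token, the way the CSS selector .product-meta would.

```php
<?php
// Sketch only: this markup is a hypothetical simplification of the real page.
// Each product's name and visible price sit in .product-meta; the duplicate
// price sits in a hidden popup (the "quick-view" class is an assumption).
$html = <<<HTML
<div class="product_list list row ">
  <div class="product-container">
    <div class="product-meta">
      <h5 class="name"> Case A </h5>
      <span class="price product-price"> Rs. 4,099 </span>
    </div>
    <div class="quick-view" style="display:none">
      <span class="price product-price"> Rs. 4,099 </span>
    </div>
  </div>
  <div class="product-container">
    <div class="product-meta">
      <h5 class="name"> Case B </h5>
      <span class="price product-price"> Rs. 5,899 </span>
    </div>
    <div class="quick-view" style="display:none">
      <span class="price product-price"> Rs. 5,899 </span>
    </div>
  </div>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// XPath 1.0 equivalent of the CSS selector ".product-meta" (matches one class token).
$metaQuery = '//div[contains(concat(" ", normalize-space(@class), " "), " product-meta ")]';

$pk_list = [];
foreach ($xpath->query($metaQuery) as $meta) {
    // Relative (".//") queries only search inside the current .product-meta node.
    $name  = $xpath->query('.//h5[@class="name"]', $meta)->item(0);
    $price = $xpath->query('.//span[@class="price product-price"]', $meta)->item(0);
    if ($name === null || $price === null) {
        continue; // skip blocks that do not match the assumed structure
    }
    // trim() removes the surrounding whitespace visible in the question's output.
    $pk_list[] = [
        'name'  => trim($name->nodeValue),
        'price' => trim($price->nodeValue),
    ];
}

print_r($pk_list);
```

Because every query is relative to one .product-meta node, each name is paired with exactly one price, so the result holds one entry per product instead of a single entry containing every price on the page.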

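Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.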

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM