Trouble with scraping data in PHP
I am scraping data from a website whose page source is at
view-source:http://www.pakdukaan.com/75-computer-cases
The code I use to scrape the data is this:
<?php
$html = file_get_contents('http://www.pakdukaan.com/75-computer-cases');
$pk_doc = new DOMDocument();
libxml_use_internal_errors(TRUE);
if(!empty($html)){
    $pk_doc->loadHTML($html);
    libxml_clear_errors();
    $pk_xpath = new DOMXPath($pk_doc);
    $pk_list = array();
    $pk_and_price = $pk_xpath->query('//div[@class="product_list list row "]');
    if($pk_and_price->length > 0){
        foreach($pk_and_price as $pat){
            $name = $pk_xpath->query('//h5[@class="name"]', $pat)->item(0)->nodeValue;
            $pkmn_types = array();
            $price = $pk_xpath->query('//span[@class="price product-price"]', $pat);
            foreach($price as $type){
                $pkmn_types[] = $type->nodeValue;
            }
            $pk_list[] = array('name' => $name, 'price' => $pkmn_types);
        }
    }
}
//output what we have
echo "<pre>";
echo print_r($pk_list);
echo "</pre>";
?>
However, instead of getting the names of all the cases, I get only one, and on top of that I get the price of every case twice.
Here is the output:
Array
(
[0] => Array
(
[name] =>
Thermaltake V2 Plus + 350W Power Supply
[price] => Array
(
[0] =>
Rs. 4,099
[1] =>
Rs. 4,099
[2] =>
Rs. 5,899
[3] =>
Rs. 5,899
[4] =>
Rs. 8,499
[5] =>
Rs. 8,499
[6] =>
Rs. 9,499
[7] =>
Rs. 9,499
[8] =>
Rs. 10,350
[9] =>
Rs. 10,350
[10] =>
Rs. 12,999
[11] =>
Rs. 12,999
[12] =>
Rs. 17,799
[13] =>
Rs. 17,799
[14] =>
Rs. 16,199
[15] =>
Rs. 16,199
[16] =>
Rs. 17,299
[17] =>
Rs. 17,299
[18] =>
Rs. 16,500
[19] =>
Rs. 16,500
[20] =>
Rs. 5,899
[21] =>
Rs. 5,899
[22] =>
Rs. 8,399
[23] =>
Rs. 8,399
[24] =>
Rs. 4,999
[25] =>
Rs. 4,999
[26] =>
Rs. 7,599
[27] =>
Rs. 7,599
[28] =>
Rs. 9,999
[29] =>
Rs. 9,999
)
)
)
1
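Can anyone solve this problem? I have tried changing the div class taken from the site's source many times, but I could not get a proper result.
So, let's go through your mistakes:
First: you query $pk_xpath->query('//h5[@class="name"]', $pat) and then take only item(0). This means you skip all the other DOMNodes returned by the XPath query. But if you do this: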
$names = $pk_xpath->query('//h5[@class="name"]', $pat);
foreach ($names as $n) {
echo $n->nodeValue . PHP_EOL;
}
you will see all the names on the page.
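A related detail worth checking: in XPath, an expression that starts with // is evaluated from the document root even when a context node is passed to query(), whereas .// searches only below the context node. The following is a minimal sketch of that difference; the two-product markup and the "item" class are hypothetical and are not taken from the real page.

```php
<?php
// Hypothetical markup standing in for the real page.
$html = '<div class="product_list">'
      . '<div class="item"><h5 class="name">Case A</h5></div>'
      . '<div class="item"><h5 class="name">Case B</h5></div>'
      . '</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // silence warnings about the bare fragment
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Use the second product as the context node.
$second = $xpath->query('//div[@class="item"]')->item(1);

// "//" ignores the context node and searches the whole document.
$absolute = $xpath->query('//h5[@class="name"]', $second);
// ".//" searches only inside the context node.
$relative = $xpath->query('.//h5[@class="name"]', $second);

echo $absolute->length, PHP_EOL;              // 2
echo $relative->length, PHP_EOL;              // 1
echo $relative->item(0)->nodeValue, PHP_EOL;  // Case B
```

With the relative form, item(0) returns the name that belongs to the current product instead of the first name on the page.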
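Second: the prices. If you look at the HTML of the scraped page, you will see that span[@class="price product-price"] appears twice for each item. One span is visible; the second is used for a popup box and is currently hidden.
So you need a different XPath query. For example, you can find all the .product-meta items and then search for price product-price inside each of them.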
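The .product-meta approach might look roughly like the sketch below. The markup is a hypothetical stand-in for the real page: it assumes that each .product-meta block holds both the name and the visible price, and that the duplicate price sits in a separate popup element (the "product-block" and "popup" class names are invented for illustration). Adjust the selectors after checking the actual source.

```php
<?php
// Hypothetical markup: each product has a visible price inside .product-meta
// and a hidden duplicate inside a popup block.
$html = '<div class="product_list list row">'
      . '<div class="product-block">'
      .   '<div class="product-meta"><h5 class="name"> Case A </h5>'
      .     '<span class="price product-price"> Rs. 4,099 </span></div>'
      .   '<div class="popup"><span class="price product-price">Rs. 4,099</span></div>'
      . '</div>'
      . '<div class="product-block">'
      .   '<div class="product-meta"><h5 class="name"> Case B </h5>'
      .     '<span class="price product-price"> Rs. 5,899 </span></div>'
      .   '<div class="popup"><span class="price product-price">Rs. 5,899</span></div>'
      . '</div>'
      . '</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Match "product-meta" as one class token, even if other classes are present.
$metas = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " product-meta ")]'
);

$pk_list = array();
foreach ($metas as $meta) {
    // Relative queries (".//") stay inside the current product-meta block.
    $name  = $xpath->query('.//h5[@class="name"]', $meta)->item(0);
    $price = $xpath->query('.//span[@class="price product-price"]', $meta)->item(0);
    if ($name === null || $price === null) {
        continue; // skip blocks that lack a name or a price
    }
    $pk_list[] = array(
        'name'  => trim($name->nodeValue),
        'price' => trim($price->nodeValue),
    );
}

print_r($pk_list);
```

Each entry then pairs one name with one price, and the hidden popup copies are never visited because they sit outside .product-meta in this assumed layout.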