Scrape site using Curl returning blank results

I'm trying to run a search on Amazon using a random keyword and then scrape roughly the first 10 results. The problem is that when I print the HTML, I get nothing; the output is blank. My code looks fine to me, and I have used cURL in the past without ever coming across this. My code:

<?php

include_once("classes/simple_html_dom.php");

function get_random_keyword() {
    $f_contents = file("keywords.txt"); 
    return $f_contents[rand(0, count($f_contents) - 1)];    
}

function getHtml($page) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $page);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5');
    $html = curl_exec($ch);
    print "html -> " . $html;
    curl_close($ch);    
    return $html;
}


$html = getHtml("https://www.amazon.co.uk/s?k=" . get_random_keyword());

?>

Ideally I would have preferred to use the API, but from what I understand you need 3 sales before you are granted access. Can anyone see any issues? I'm not sure what else to check; any help is appreciated.

Amazon is returning the response encoded in gzip. You need to decode it:

$html = getHtml("https://www.amazon.co.uk/s?k=" . get_random_keyword());
echo gzdecode($html);
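
As an alternative to calling gzdecode() on every response, cURL can be asked to handle compression itself through CURLOPT_ENCODING. The following is only a sketch under that assumption: fetchHtml() and decodeIfGzipped() are hypothetical names that do not appear in the question, and the gzip magic-byte check is a defensive fallback rather than something the original answer describes.

```php
<?php

// Hypothetical fallback: decode the body only if it starts with the
// gzip magic bytes (0x1f 0x8b); otherwise return it unchanged.
function decodeIfGzipped(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        return $decoded === false ? $body : $decoded;
    }
    return $body;
}

// Hypothetical variant of getHtml() from the question.
function fetchHtml(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5');
    // An empty string makes cURL advertise every encoding it supports
    // and decode the response body automatically.
    curl_setopt($ch, CURLOPT_ENCODING, '');

    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('cURL request failed: ' . $error);
    }
    curl_close($ch);

    // Usually a no-op here, since cURL has already decoded the body.
    return decodeIfGzipped($body);
}
```

It could be called as fetchHtml("https://www.amazon.co.uk/s?k=" . urlencode(trim(get_random_keyword()))). The trim() and urlencode() calls are an addition of this sketch: file() keeps the trailing newline on each line, so the raw keyword would otherwise end up in the URL with a line break attached.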
