
PHP scraping returns 403 Forbidden

I am trying to scrape this website, but when I echo the img tag, it returns 403 Forbidden (nginx/1.4.3).

Can anyone help?

This is my code:

$url = '1cak.com/trending-0-&ajax_seek=1396912798&seek_max_time=1396921201';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 15);
// Browser-like User-Agent (the stray trailing ")" has been removed)
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17');
curl_setopt($ch, CURLOPT_FRESH_CONNECT, TRUE);
$curl_scraped_page = curl_exec($ch);

$html = new simple_html_dom();
$html->load($curl_scraped_page);


foreach($html->find('div[style="border-bottom:1px solid #ccc;padding-bottom:10px;padding-top:10px"]') as $item){
    echo $item->find('img',0)->src ."<br/>";
    // Quote the src attribute so URLs with special characters stay intact
    echo '<img src="' . $item->find('img', 0)->src . '"><br/>';
}

A 403 error can mean a few things:

  1. Your IP has been blocked because you tried to scrape the data too many times, and there is nothing you can do about it (apart from using some sort of proxy, but that is a topic for another question). You can test this by opening the same page in a web browser on the server (Chrome/Chromium, or lynx if you only have SSH access).

  2. The page has some sort of control over who is visiting, based on the user agent, the referrer, or something similar. Since you are already trying to emulate a browser, I don't think this is the issue here.
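
The two cases above can be told apart roughly by checking the HTTP status code with and without a Referer header. This is only a minimal sketch: `fetch_with_status` and `diagnose_403` are hypothetical helper names that are not part of the original post, and the diagnosis strings are rough guesses rather than a definitive verdict.

```php
<?php
// Hypothetical helper (not from the original post): fetch a URL with cURL
// and return the HTTP status code together with the response body.
function fetch_with_status(string $url, ?string $referer = null): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 15,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17',
    ]);
    if ($referer !== null) {
        // Some sites reject requests that do not come from their own pages.
        curl_setopt($ch, CURLOPT_REFERER, $referer);
    }
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if no response was received

    return ['status' => $status, 'body' => $body === false ? '' : $body];
}

// Hypothetical helper: turn the two probe results into a rough diagnosis.
function diagnose_403(int $plainStatus, int $refererStatus): string
{
    if ($plainStatus !== 403) {
        return 'not blocked';
    }
    return $refererStatus === 403 ? 'likely IP or other block' : 'likely referrer check';
}
```

To use it, call `fetch_with_status($url)` once without a referrer and once with the site's own address as the referrer, then pass both `status` values to `diagnose_403`. If only the request without a referrer gets a 403, the site is probably checking the Referer header.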

I have run into the "forbidden" error many times when echoing a large amount of data. I tend to put in a lot of "diagnostic" echoes when I am developing a complex PHP script.

The only fix I have found is to remove as many diagnostic echo statements as possible. I have not established what the echo limit is, but I suspect it differs from one web host to another.
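
One way to keep the diagnostics while taking them out of the response is to send them to the server log instead of echoing them. This is a minimal sketch, assuming PHP's built-in `error_log` is usable on the host; `format_debug` and `debug_log` are hypothetical helper names that are not from the original post.

```php
<?php
// Hypothetical helper: build a single log line for a labelled value.
function format_debug(string $label, $value): string
{
    return '[debug] ' . $label . ': ' . var_export($value, true);
}

// Hypothetical helper: write the diagnostic to the PHP error log
// instead of the page output; pass false to silence it.
function debug_log(string $label, $value, bool $enabled = true): void
{
    if ($enabled) {
        error_log(format_debug($label, $value));
    }
}
```

For example, `debug_log('items found', 3);` writes `[debug] items found: 3` to the error log and sends nothing to the browser.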
