简体   繁体   English

PHP cURL请求未遵循重定向

[英]PHP cURL Request Not Following Redirects

We have a crawler built in PHP to pull vital information from our clients' pages. 我们有一个内置的PHP搜寻器,可以从客户页面中提取重要信息。 The issue is that most of our clients post custom shortened links that use a 302 to go to the final destination. 问题是大多数客户发布的自定义缩短链接使用302到达最终目的地。 Our crawler has been successful in following these (see the code below) up until this latest client. 我们的搜寻器已经成功地遵循了这些规则(请参见下面的代码),直到此最新客户端为止。 Here's a sample link: 这是一个示例链接:

http://www.dose.com/lists/26235/s http://www.dose.com/lists/26235/s

If you go there in a browser, you'll see the standard 302 behavior, but if you visit it with a crawler, it simply returns a 200 and doesn't redirect. 如果在浏览器中访问该目录,则会看到标准的302行为,但是如果您使用搜寻器访问它,它只会返回200,并且不会重定向。 This led me to believe that I had to make the request look as "natural" as possible, but I still haven't had any success. 这使我相信我必须使请求看起来尽可能“自然”,但是我仍然没有成功。 Finally, here's the cURL section of our code: 最后,这是我们代码的cURL部分:

function sendRequest($url)
{
    global $ch;
    $user_agent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5)".
                " Gecko/20041107 Firefox/1.0";
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.8',
        'Connection: keep-alive'
    ));
    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent );
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_ENCODING, '');

    $contents = curl_exec($ch);
    //curl_close($ch);

    return $contents;
}

Edited to include the advice from below, although the issue still persists. 经过编辑以包括以下建议,尽管问题仍然存在。

If you aren't already, you'll need to manually inflate the response of that function using gzdecode() 如果还没有,则需要使用gzdecode()手动填充该函数的响应

An even better way might be to tell Curl to handle the compression itself, rather than manually specifying it. 更好的方法可能是告诉Curl处理压缩本身,而不是手动指定压缩。 Try removing the Accept-Encoding header line and adding: 尝试删除Accept-Encoding标头行并添加:

curl_setopt($ch, CURLOPT_ENCODING, '');

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM