
PHP cURL Request Not Following Redirects

We have a crawler built in PHP to pull vital information from our clients' pages. The issue is that most of our clients post custom shortened links that use a 302 to go to the final destination. Our crawler has been successful in following these (see the code below) up until this latest client. Here's a sample link:

http://www.dose.com/lists/26235/s

If you go there in a browser, you'll see the standard 302 behavior, but if you visit it with a crawler, it simply returns a 200 and doesn't redirect. This led me to believe that I had to make the request look as "natural" as possible, but I still haven't had any success. Finally, here's the cURL section of our code:

function sendRequest($url)
{
    // Reuses a shared handle created elsewhere with curl_init().
    global $ch;
    $user_agent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5)".
                " Gecko/20041107 Firefox/1.0";
    // Browser-like headers, so the request looks as "natural" as possible.
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.8',
        'Connection: keep-alive'
    ));
    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
    // Follow Location: redirects, up to 10 hops.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    // Return the body as a string, without the response headers.
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    // Note: this disables TLS certificate verification.
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    // Empty string: accept every encoding cURL supports and decode automatically.
    curl_setopt($ch, CURLOPT_ENCODING, '');

    $contents = curl_exec($ch);
    //curl_close($ch);

    return $contents;
}

Edited to include the advice from below, although the issue still persists.

If you aren't already doing so, you'll need to manually inflate the response returned by that function using gzdecode().
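As a rough sketch of what manual inflation could look like: the helper below checks for the gzip magic bytes before calling gzdecode(), so an uncompressed body passes through untouched. The function name inflateIfGzipped is hypothetical and not part of the original code, and the sketch assumes the zlib extension is available.

```php
<?php
// Hypothetical helper (not from the original post): inflate a response body
// only when it looks like a gzip stream.
function inflateIfGzipped(string $body): string
{
    // A gzip stream starts with the two magic bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        // gzdecode() returns false (and raises a warning) on corrupt input,
        // so suppress the warning and fall back to the raw body.
        $decoded = @gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body;
}
```

It would be applied to the return value of sendRequest(), for example inflateIfGzipped(sendRequest($url)), assuming curl_exec() did not fail and return false.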

An even better approach might be to let cURL handle the compression itself rather than specifying it manually. Try removing the Accept-Encoding header line and adding:

curl_setopt($ch, CURLOPT_ENCODING, '');
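For context, here is a minimal sketch of how that option might sit in a complete request, together with a way to check whether any redirect was actually followed. The names buildCurlOptions and fetchWithInfo are hypothetical, and the sketch uses a fresh handle per call rather than the global $ch from the question. With CURLOPT_ENCODING set to an empty string, cURL sends an Accept-Encoding header listing every encoding it supports and decodes the body itself, so no manual Accept-Encoding header is set.

```php
<?php
// Hypothetical helper: the options for a redirect-following request that
// leaves compression handling to cURL.
function buildCurlOptions(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow Location: redirects
        CURLOPT_MAXREDIRS      => 10,
        CURLOPT_ENCODING       => '',    // all supported encodings, decoded by cURL
    ];
}

// Hypothetical helper: run the request and report what cURL saw, which helps
// tell "no redirect was sent" apart from "the redirect was not followed".
function fetchWithInfo(string $url): array
{
    $ch = curl_init();
    curl_setopt_array($ch, buildCurlOptions($url));
    $body = curl_exec($ch); // false on failure

    return [
        'body'      => $body,
        'status'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'redirects' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'error'     => curl_error($ch),
    ];
}
```

If 'redirects' comes back as 0 and 'status' as 200, the server answered the crawler directly without issuing a 302, which would point at the server treating the request differently rather than at cURL failing to follow the redirect.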
