
HTTP request getting partial response

I'm trying to get this CrunchBase API page as a string in PHP. When I visit that page in a browser, I get the full response (some 230K characters); however, when I try to get the page in a script, the response is much shorter (24341 characters on a server and 36629 characters locally, with exactly the same number of characters for other long CrunchBase pages). To get the page, I am using a function almost identical to drupal_http_request() although I'm not using Drupal. (I have also tried using cURL and file_get_contents() and got the same result. And now that I'm thinking about it I have experienced the same from CrunchBase in Python in the past.)

What could be causing this and how can I fix it? PHP 5.3.2, Apache 2.2.14, Ubuntu 10.04. Here are additional details on the response:

[protocol] => HTTP/1.1
[headers] => Array
    (
        [content-type] => text/javascript; charset=utf-8
        [connection] => close
        [status] => 200 OK
        [x-powered-by] =>
        [etag] => "d809fc56a529054e613cd13e48d75931"
        [x-runtime] => 0.00453
        [content-length] => 230310
        [cache-control] => private, max-age=0, must-revalidate
        [server] => nginx/1.0.10 + Phusion Passenger 3.0.11 (mod_rails/mod_rack)
    )

I don't think it's a user agent issue as I used User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6 in the request.

UPDATE

According to this thread, I needed to add the Accept-Encoding: gzip, deflate header to the request. That does return a longer response, but now I have to figure out how to inflate it. The gzinflate() function fails with Warning: Data error. Any thoughts on how to inflate the response?
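One way to avoid inflating the body by hand is to let cURL negotiate and decode the compression itself. The sketch below assumes the cURL extension is installed; fetch_decoded() is a hypothetical helper name and the URL is supplied by the caller. Setting CURLOPT_ENCODING to an empty string makes cURL send an Accept-Encoding header listing every encoding it supports and decode the response before returning it.

```php
<?php
// Hypothetical helper: fetch a URL and return the already-decoded body.
// Assumes the cURL extension is available.
function fetch_decoded($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_ENCODING, '');         // advertise all supported encodings and auto-decode
    $body = curl_exec($ch);                         // false on failure
    curl_close($ch);
    return $body;
}
```

With this option set, the string returned by curl_exec() should already be the plain response, so no call to gzinflate() is needed.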

See the comments in the PHP docs about gzinflate(), specifically the remarks about stripping the initial bytes. The last comment did the trick for me:

<?php $dec = gzinflate(substr($enc,10)); ?>

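The number of bytes to strip depends on the original encoder, though: 10 bytes is only the fixed part of the gzip header, and optional fields (extra data, file name, comment, header CRC) can follow it. Another comment has a more thorough solution, and a reference to RFC 1952 for further reading.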
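A more thorough version of the same idea reads the flag byte and skips whichever optional header fields RFC 1952 says are present. This is a minimal sketch rather than the code from that comment; gzip_body_decode() is a hypothetical name, and it does not verify the CRC32 or size stored in the trailer.

```php
<?php
// Hypothetical helper: strip an RFC 1952 gzip header and trailer,
// then inflate the raw deflate stream that sits between them.
function gzip_body_decode($data)
{
    $len = strlen($data);
    // Fixed header (10 bytes) plus trailer (8 bytes) must be present.
    if ($len < 18 || substr($data, 0, 2) !== "\x1f\x8b") {
        return false; // not a gzip stream
    }
    if (ord($data[2]) !== 8) {
        return false; // compression method other than deflate
    }
    $flags = ord($data[3]);
    $pos = 10; // ID1 ID2 CM FLG MTIME(4) XFL OS

    if ($flags & 0x04) { // FEXTRA: 2-byte little-endian length, then payload
        if ($pos + 2 > $len) {
            return false;
        }
        $extra = unpack('v', substr($data, $pos, 2));
        $pos += 2 + $extra[1];
    }
    foreach (array(0x08, 0x10) as $bit) { // FNAME, FCOMMENT: zero-terminated strings
        if ($flags & $bit) {
            if ($pos >= $len) {
                return false;
            }
            $end = strpos($data, "\0", $pos);
            if ($end === false) {
                return false;
            }
            $pos = $end + 1;
        }
    }
    if ($flags & 0x02) { // FHCRC: 2-byte header checksum
        $pos += 2;
    }
    if ($pos > $len - 8) {
        return false; // no room left for the trailer
    }
    // Drop the 8-byte trailer (CRC32 + ISIZE) before inflating.
    return gzinflate(substr($data, $pos, $len - $pos - 8));
}
```

For a response with no optional fields this behaves like gzinflate(substr($enc, 10)), but it also copes with encoders that store a file name or comment in the header.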

Evidently gzdecode() is meant to address this issue, but it is only available from PHP 5.4.0 onward, so it can't be used on PHP 5.3.2.
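If the same script has to run on both older and newer PHP versions, one option is to use gzdecode() when it exists and fall back to the byte-stripping trick otherwise. The helper name decode_gzip_response() is hypothetical, and the fallback assumes a gzip header with no optional fields.

```php
<?php
// Hypothetical helper: prefer the built-in gzdecode(), fall back on older PHP.
function decode_gzip_response($body)
{
    if (function_exists('gzdecode')) {
        return gzdecode($body); // handles the gzip header and trailer itself
    }
    // Fallback: skip the 10-byte fixed header and the 8-byte trailer.
    // Assumes the encoder wrote no optional header fields.
    return gzinflate(substr($body, 10, -8));
}
```

Either branch expects gzip-wrapped data, so it is worth checking that the response's Content-Encoding header actually says gzip before calling it.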

ps -- I deleted my comment about the returned data being plain text. I was wrong.


 