Curl works but not Python requests

I am trying to fetch a JSON response from http://erdos.sdslabs.co/users/shagun.json . Using a browser or Python's Requests library leads to an authentication error, but curl seems to work fine.

curl http://erdos.sdslabs.co/users/shagun.json 

returns the JSON response.

Why would the curl request work while a normal browser or a Requests-based request fails?

Using telnet to check:

$ telnet erdos.sdslabs.co 80
Trying 62.141.37.215...
Connected to erdos.sdslabs.co.
Escape character is '^]'.
GET http://erdos.sdslabs.co/users/shagun.json HTTP/1.0

HTTP/1.1 302 Found
Date: Sat, 26 Jul 2014 11:18:58 GMT
Server: Apache
Set-Cookie: PHPSESSID=juvg7vrg3vs4t00om3a95m4sc7; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: /login
Access-Control-Allow-Origin: http://erdos.sdslabs.co
X-Powered-By: PleskLin
Content-Length: 1449
Connection: close
Content-Type: application/json

{"email":"sshagun.sodhani@gmail.com","username":"shagun","name":"Shagun      
[...]

We see that the web server responds with a 302, a redirect to Location: /login. Requests and web browsers obey that redirect and end up at the login prompt. However, the same response also carries the JSON you're after in its body, and curl (and telnet) do not follow redirects by default, so they simply show that data.

Best practice would be to fix the web server so that it either doesn't require you to log in, or doesn't give out password-protected data at the same time as asking users to log in.

If you can't change the web server, you could tell the requests module to ignore redirects:

import requests
# allow_redirects=False keeps the 302 response itself, including its JSON body
result = requests.get('http://erdos.sdslabs.co/users/shagun.json', allow_redirects=False)
print(result.content)
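To check what the server actually sent before relying on the body, it can help to look at the status code and the Location header alongside the payload. The following is a minimal sketch, not a definitive implementation; the helper names fetch_without_redirect and describe are made up for illustration, and the timeout value is an arbitrary assumption.

```python
import requests


def describe(response):
    """Summarise a response: status, redirect target (if any), parsed JSON body (if any)."""
    try:
        payload = response.json()
    except ValueError:
        # The body was empty or not valid JSON.
        payload = None
    return {
        "status": response.status_code,
        # is_redirect is True only for a redirect status that carries a Location header.
        "redirect_to": response.headers.get("Location") if response.is_redirect else None,
        "json": payload,
    }


def fetch_without_redirect(url, timeout=10):
    """Fetch url once, without following any redirect the server sends."""
    response = requests.get(url, allow_redirects=False, timeout=timeout)
    return describe(response)
```

Calling fetch_without_redirect('http://erdos.sdslabs.co/users/shagun.json') against a server that behaves like the telnet session above should report status 302, the /login redirect target, and the JSON body together.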

If you have a proxy configured in your environment, define it in your session or request as well.

For example, with a session:

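    import requests

    # Placeholder proxy addresses; substitute your own.
    my_proxies = {
        'http': 'http://myproxy:8080',
        'https': 'https://myproxy:8080'
    }

    session = requests.Session()
    # requests.Request() does not accept a proxies argument; pass it to send() instead.
    # params_template and req_headers are your own payload and headers, defined elsewhere.
    request = requests.Request('POST', 'http://my.domain.com', data=params_template, headers=req_headers)
    prepped = session.prepare_request(request)
    response = session.send(prepped, proxies=my_proxies)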

See the documentation:
Requests quickstart: http://docs.python-requests.org/en/master/user/quickstart/
Sessions (advanced usage): http://docs.python-requests.org/en/master/user/advanced/
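An alternative is to set the proxies once on the session so that every request made through it uses them. This is only a sketch under assumptions: the proxy addresses are the same placeholders as above, and make_proxied_session is a hypothetical helper name. Setting trust_env to False makes Requests ignore proxy settings from environment variables (along with other environment-derived settings such as .netrc credentials), which can be useful when comparing its behaviour with curl.

```python
import requests

# Placeholder proxy addresses; substitute your own.
MY_PROXIES = {
    "http": "http://myproxy:8080",
    "https": "https://myproxy:8080",
}


def make_proxied_session(proxies, use_environment=True):
    """Return a Session that sends its requests through the given proxies."""
    session = requests.Session()
    # Session-level proxies apply to every request made through this session.
    session.proxies.update(proxies)
    # When False, environment settings such as HTTP_PROXY are not consulted.
    session.trust_env = use_environment
    return session


session = make_proxied_session(MY_PROXIES)
# response = session.get("http://my.domain.com")  # would be routed via the proxy
```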

For late googlers like myself:

In my case, the problem was that I passed the URL parameters using requests.get(url, data={...}), which sends them in the request body. After changing it to requests.get(url, params={...}), which puts them in the query string, the problem was solved.
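The difference between the two keyword arguments can be seen without sending anything, by preparing the request and inspecting it. This is a small illustration; the URL and the query dictionary are made-up placeholders.

```python
import requests

url = "http://example.com/search"      # placeholder URL
query = {"q": "erdos", "page": "2"}    # hypothetical parameters

# params= is encoded into the URL's query string.
with_params = requests.Request("GET", url, params=query).prepare()
# data= is form-encoded into the request body; the URL is left untouched.
with_data = requests.Request("GET", url, data=query).prepare()

print(with_params.url)   # http://example.com/search?q=erdos&page=2
print(with_params.body)  # None
print(with_data.url)     # http://example.com/search
print(with_data.body)    # q=erdos&page=2
```

curl only sends a body when asked to (for example with -d), so a plain curl call with the parameters in the URL matches the params= form, not the data= form.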

I had the experience that some python requests code that had worked previously one day didn't come back the next, while curl was still working. It wasn't the code, and it wasn't the server, and reading this discussion it dawned on me that something in the connection may have changed. I disabled and re-enabled my Wifi, and lo and behold, it worked again.

I didn't investigate further; requests may have cached something that was no longer valid. Sorry about this unqualified input, but maybe it will help someone out there.

