
python requests http response 500 (site can be reached in browser)

I am trying to figure out what I'm doing wrong here, but I keep getting lost...

In Python 2.7, I'm running the following code:

>>> import requests
>>> req = requests.request('GET', 'https://www.zomato.com/praha/caf%C3%A9-a-restaurant-z%C3%A1ti%C5%A1%C3%AD-kunratice-praha-4/daily-menu')
>>> req.content
'<html><body><h1>500 Server Error</h1>\nAn internal server error occured.\n</body></html>\n'

If I open this URL in a browser, it responds properly. While digging around I found a similar question about the urllib library (500 error with urllib.request.urlopen), but I was not able to adapt it, and I would prefer to use requests here anyway.

I might be running into a missing proxy setting, as suggested for example here (Perl File::Fetch Failed HTTP response: 500 Internal Server Error), but can someone explain what the proper workaround is?

One thing that differs from the browser request is the User-Agent header; you can change it with requests like this:

import requests

url = 'https://www.zomato.com/praha/caf%C3%A9-a-restaurant-z%C3%A1ti%C5%A1%C3%AD-kunratice-praha-4/daily-menu'
# Pretend to be a desktop Chrome browser instead of the default python-requests client
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.90 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response.status_code)  # should be 200
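
To see why the User-Agent matters, it can help to look at what requests would send by default. The following is a minimal sketch, not part of the original answer: the helper name headers_that_would_be_sent is made up for illustration, and it only prepares the request without sending anything over the network.

```python
import requests

URL = 'https://www.zomato.com/praha/caf%C3%A9-a-restaurant-z%C3%A1ti%C5%A1%C3%AD-kunratice-praha-4/daily-menu'
BROWSER_UA = ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/54.0.2840.90 Safari/537.36')


def headers_that_would_be_sent(url, headers=None):
    """Prepare a GET request (without sending it) and return its headers.

    Hypothetical helper for illustration only.
    """
    session = requests.Session()
    # prepare_request merges the session's default headers with ours
    prepared = session.prepare_request(
        requests.Request('GET', url, headers=headers))
    return dict(prepared.headers)


default_headers = headers_that_would_be_sent(URL)
custom_headers = headers_that_would_be_sent(URL, {'User-Agent': BROWSER_UA})

print(default_headers['User-Agent'])  # starts with "python-requests/"
print(custom_headers['User-Agent'])   # the browser-like string above
```

A server that filters on the User-Agent can easily tell the first value apart from a real browser, which would be consistent with the 500 response in the question.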

Edit

Some web applications will also check the Origin and/or the Referer headers (for example for AJAX requests); you can set these in the same way as User-Agent:

headers = {
    'Origin': 'http://example.com',
    'Referer': 'http://example.com/some_page'
}

Remember, you are setting these headers to basically bypass checks so please be a good netizen and don't abuse people's resources.

The User-Agent, and also other header elements, could be causing your problem.

When I came across this error, I used Wireshark to watch a regular request made by a browser, and it turned out that the server expected more than just the User-Agent in the headers.

After I emulated the headers sent by the browser in Python requests, the server stopped throwing errors.
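
One way to emulate a fuller set of browser headers is to put them on a requests.Session so that every request made through it carries them. This is only a sketch: the header values below are illustrative assumptions, not the ones the answerer captured, and make_browser_like_session is a hypothetical helper name. Copy the real values from your own browser.

```python
import requests

# Illustrative values only (assumptions); replace with what your browser sends.
BROWSER_LIKE_HEADERS = {
    'User-Agent': ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                   '(KHTML, like Gecko) Chrome/54.0.2840.90 Safari/537.36'),
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
}


def make_browser_like_session(extra_headers=None):
    """Return a Session whose requests all carry browser-like headers."""
    session = requests.Session()
    session.headers.update(BROWSER_LIKE_HEADERS)
    # Site-specific additions such as Referer or Origin go on top
    session.headers.update(extra_headers or {})
    return session


session = make_browser_like_session({'Referer': 'http://example.com/some_page'})
# No request is sent here; to use it you would call, for example:
# response = session.get(url, timeout=10)
```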

But Wait! There's More!

The above answers did help me on the path to a resolution, but I had to find still more things to add to my headers before certain sites would let me in using Python requests. Learning how to use Wireshark (suggested above) was a good new skill for me, but I found an easier way.

If you open the developer tools (in Chrome, right-click and choose Inspect), go to the Network tab, select one of the entries under Name on the left, and expand Request Headers under Headers, you'll get a complete list of what your browser is sending to the server. I started adding the elements I thought were most likely needed, one at a time, testing until my errors went away. Then I reduced that set to the smallest set that still worked. In my case, my headers already contained only User-Agent (to deal with other code issues), and I only needed to add the Accept-Language key to deal with a few other sites. See the picture below as a guide to the text above.

I hope this process helps others to find ways to eliminate undesirable python requests return codes where possible.

[Screenshot of my developer/Inspect window in Chrome]
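
The reduction step described above (start from headers that work, then drop whatever is not needed) can be sketched as a small loop. This is an assumption-laden illustration, not the answerer's actual code: minimal_working_headers and fake_server_accepts are hypothetical names, and the fake check stands in for a real request so the sketch runs offline.

```python
def minimal_working_headers(candidates, works):
    """Greedily drop headers that are not needed for `works` to stay true.

    candidates: dict of header name -> value copied from the browser.
    works: callable that takes a headers dict and returns True on success.
    """
    if not works(candidates):
        raise ValueError('even the full candidate set does not work')
    kept = dict(candidates)
    for name in list(candidates):
        trial = {k: v for k, v in kept.items() if k != name}
        if works(trial):
            kept = trial  # this header was not needed, leave it out
    return kept


def fake_server_accepts(headers):
    # Stand-in for a real request: pretend the server insists on these two
    return 'User-Agent' in headers and 'Accept-Language' in headers


candidates = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)',
    'Accept': 'text/html',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
}
print(sorted(minimal_working_headers(candidates, fake_server_accepts)))
# ['Accept-Language', 'User-Agent']
```

Against a real site, `works` could be something like `lambda h: requests.get(url, headers=h, timeout=10).status_code == 200`. Each check is a real request, so keep the candidate list short, and note that a single greedy pass is not guaranteed to find the smallest possible set if headers interact with each other.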

