
python requests http response 500 (site can be reached in browser)

I am trying to figure out what I'm doing wrong here, but I keep getting lost...

In Python 2.7, I'm running the following code:

>>> import requests
>>> req = requests.request('GET', 'https://www.zomato.com/praha/caf%C3%A9-a-restaurant-z%C3%A1ti%C5%A1%C3%AD-kunratice-praha-4/daily-menu')
>>> req.content
'<html><body><h1>500 Server Error</h1>\nAn internal server error occured.\n</body></html>\n'

If I open this URL in a browser, it responds properly. I dug around and found a similar question about the urllib library (500 error with urllib.request.urlopen), but I was not able to adapt it, and in any case I would prefer to use requests here.

I might be missing some proxy setting here, as suggested for example in (Perl File::Fetch Failed HTTP response: 500 Internal Server Error), but can someone explain what the proper workaround is?

One thing that differs from the browser request is the User-Agent; however, you can change it with requests like this:

import requests

url = 'https://www.zomato.com/praha/caf%C3%A9-a-restaurant-z%C3%A1ti%C5%A1%C3%AD-kunratice-praha-4/daily-menu'
# Send a browser-like User-Agent instead of the default python-requests one.
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.90 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response.status_code)  # should be 200
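To see why the server might treat the script differently from a browser, it can help to look at the User-Agent that requests sends by default. The sketch below prints it and then sets a browser-like value once on a `requests.Session`, so every request made through that session carries it. No request is actually sent here; the User-Agent string is simply the one from the answer above.

```python
import requests

# The User-Agent that requests sends when none is supplied.
# It looks like "python-requests/<version>"; the version depends on the install.
default_ua = requests.utils.default_user_agent()
print(default_ua)

# Headers set on a Session are applied to every request made through it.
session = requests.Session()
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/54.0.2840.90 Safari/537.36',
})
print(session.headers['User-Agent'])
```

After this, `session.get(url)` would send the browser-like User-Agent without passing `headers=` on each call.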

Edit

Some web applications also check the Origin and/or Referer headers (for example, for AJAX requests); you can set these in the same way as User-Agent.

headers = {
    'Origin': 'http://example.com',
    'Referer': 'http://example.com/some_page'
}
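One way to check what would actually go over the wire is to prepare the request without sending it and print its headers. This is a minimal sketch: the `example.com` URLs are the placeholders from the snippet above, and the User-Agent is the one used earlier. `Session.prepare_request` merges the session's default headers with the ones passed in.

```python
import requests

url = 'http://example.com/some_page'  # placeholder URL, not a real target
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/54.0.2840.90 Safari/537.36',
    'Origin': 'http://example.com',
    'Referer': 'http://example.com/some_page',
}

session = requests.Session()
# Build the request and merge in the session defaults, but do not send it.
prepared = session.prepare_request(requests.Request('GET', url, headers=headers))

for name, value in prepared.headers.items():
    print('{}: {}'.format(name, value))
```

Comparing this list with what the browser sends shows which headers are still missing.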

Remember, you are setting these headers basically to bypass checks, so please be a good netizen and don't abuse people's resources.

The User-Agent, and also other header fields, could be causing your problem.

When I came across this error, I watched a regular request made by a browser using Wireshark, and it turned out that the server expected more in the headers than just the User-Agent.

After emulating the headers sent by the browser in python requests, the server stopped throwing errors.
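If the browser's request has been captured as raw text (for example from Wireshark), one possible way to reuse it is to turn that text into a dict that can be passed as `headers=` to requests. The helper name `parse_raw_headers` and the sample capture below are hypothetical, made up for illustration; the choice to skip `Host` and `Content-Length` is an assumption, based on requests filling those in itself.

```python
# Headers that requests computes on its own, so copying them is unnecessary.
SKIPPED = {'host', 'content-length'}


def parse_raw_headers(raw):
    """Convert raw 'Name: value' header lines into a dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":authority".
        if not line or line.startswith(':'):
            continue
        name, sep, value = line.partition(':')
        # Skip the request line ("GET /path HTTP/1.1") and anything malformed.
        if not sep or ' ' in name:
            continue
        if name.lower() in SKIPPED:
            continue
        headers[name] = value.strip()
    return headers


# Hypothetical capture, shortened for the example.
raw_capture = """
GET /some_page HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64)
Accept: text/html,application/xhtml+xml
Accept-Language: en-US,en;q=0.5
"""

browser_headers = parse_raw_headers(raw_capture)
print(browser_headers)
```

The resulting dict could then be used as `requests.get(url, headers=browser_headers)`.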

But Wait! There's More!

The above answers did help me on the path to a resolution, but I had to find still more things to add to my headers before certain sites would let me in using python requests. Learning how to use Wireshark (suggested above) was a good new skill for me, but I found an easier way.

If you open your browser's developer view (in Chrome, right-click and then click Inspect), go to the Network tab, select one of the entries under Name on the left, and then look under Headers for Request Headers and expand it, you'll get a complete list of what your system is sending to the server. I started adding the elements I thought were most likely needed, one at a time, and testing until my errors went away. Then I reduced that set to the smallest set that still worked. In my case, my headers already contained only User-Agent (added to deal with other issues), and I only needed to add the Accept-Language key to get a few other sites working. See the picture below as a guide to the text above.
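The "reduce to the smallest set that works" step can be sketched as a simple greedy loop. This is only an illustration under assumptions: `minimal_headers` and `fake_server_accepts` are hypothetical names, and the stand-in predicate simulates a server that needs User-Agent and Accept-Language so the example runs without touching the network.

```python
def minimal_headers(candidates, works):
    """Greedily drop every header that `works` can do without.

    `candidates` is a dict of headers known to work together;
    `works` is a callable taking a headers dict and returning True or False.
    """
    if not works(candidates):
        raise ValueError('the full candidate set does not work')
    needed = dict(candidates)
    for name in list(candidates):
        # Try the current set without this one header.
        trial = {k: v for k, v in needed.items() if k != name}
        if works(trial):
            needed = trial
    return needed


def fake_server_accepts(headers):
    # Stand-in for a real check; pretends the server requires two headers.
    return 'User-Agent' in headers and 'Accept-Language' in headers


candidates = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)',
    'Accept': 'text/html',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': 'http://example.com/some_page',
}

result = minimal_headers(candidates, fake_server_accepts)
print(sorted(result))  # ['Accept-Language', 'User-Agent']
```

Against a real site, the predicate could be something like `lambda h: requests.get(url, headers=h).status_code == 200`. Each trial is a real request, so keep the candidate list short.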

I hope this process helps others find ways to eliminate undesirable python requests return codes where possible.

[Image: screenshot of my developer/Inspect window in Chrome]
