
Accessing TWiki page with Python http.client

I'm trying to access my local TWiki installation with Python http.client. For some reason I always end up with 403 Forbidden. I can access other subfolders on my server, but not twiki. I can access this TWiki page with curl. Is there something special you need to do when accessing /bin/ or /cgi-bin/ folders with Python http.client?

Here is an example using the twiki.org pages, since my localhost isn't accessible from outside:

>>> import httplib
>>> conn = httplib.HTTPConnection("twiki.org")
>>> conn.request("GET", "/cgi-bin/view/")
>>> r1 = conn.getresponse()
>>> print r1.status, r1.reason
403 Forbidden
>>> data1 = r1.read()
>>> data1
'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>403 Forbidden</title>\n</head><body>\n<h1>Forbidden</h1>\n<p>You don\'t have permission to access /cgi-bin/view/\non this server.</p>\n<hr>\n<address>Apache/2.2.3 (CentOS) Server at twiki.org Port 80</address>\n</body></html>\n'

I just tried this myself and found that setting a User-Agent header seemed to fix it. It didn't seem to matter what the value of the header was, only that it was set:

>>> import httplib
>>> conn = httplib.HTTPConnection("twiki.org")
>>> conn.request("GET", "/cgi-bin/view/", headers={"User-Agent": "foo"})
>>> r1 = conn.getresponse()
>>> print r1.status, r1.reason
200 OK

Unfortunately I can't shed any light on why TWiki returns a 403 without a User-Agent header; I just tried it on the basis that it's one of the likely differences between clients. I assume it's something like trying to decide whether to serve the mobile version of the site, but it's really poor not to handle the missing-header case gracefully.

Hopefully that at least provides a workaround for you, however.

EDIT

Apparently this is part of their default Apache config, which uses the BrowserMatchNoCase directive to set an environment variable blockAccess that is presumably picked up later to return the observed 403 Forbidden response.
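For reference, the shape of that kind of rule in an Apache config is roughly as follows. This is a sketch written from the description above, not TWiki's actual file; the patterns and the Location path are illustrative:

```apache
# Tag requests from unwanted clients with the blockAccess variable.
BrowserMatchNoCase ^$    blockAccess   # empty User-Agent value
BrowserMatchNoCase ^FAST blockAccess   # one of the "known bad" agents

<Location /cgi-bin/view>
    Order Allow,Deny
    Allow from all
    # Anything tagged above gets the 403 Forbidden response.
    Deny from env=blockAccess
</Location>
```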

They seem to think that this prevents DoS attacks somehow, although I'm really unconvinced by anything that can be worked around simply by setting a random User-Agent string. As you can tell from that config, they also have a list of "known bad" user agents they attempt to block. You can observe this by using one of them in a fetch from the command line:

$ GET -Ssed -H "User-Agent: some-random-name" http://twiki.org/cgi-bin/view/
GET http://twiki.org/cgi-bin/view/
200 OK
[...]
$ GET -Ssed -H "User-Agent: FAST" http://twiki.org/cgi-bin/view/
GET http://twiki.org/cgi-bin/view/
403 Forbidden
[...]

I'm sure they have their reasons for doing this, but I must say I'm not impressed.
