urllib and requests always return status code 200

Question

I'm working on a new project and I'd like to implement a "wait until the website is open" function: it would check whether http://switch-check.cf/index.php is open and then continue.

For now, with the help of .htaccess and PHP, I tried my best to make all .php files forbidden. So if you try to access the page I mentioned, you should get a

403 Access denied

So I'm using urllib (I tried requests too) to see whether the website is open or still forbidden:

print(urllib.request.urlopen("http://switch-check.cf/index.php").getcode())

However, whatever I try, I always get a 200 HTTP status code, not a 403. Even if I try nonexistent subdomains and files, the status code is always 200. Is there any way to fix this? Or to achieve the same result with a different approach?

Thank you :)

Answer 1

The way to debug this is to try it in a browser (where you get a 403) and in your code (where you get a 200), compare the request headers, and bisect on the differences.
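The header comparison can be sketched in a few lines. This is only an illustration: diff_headers is a hypothetical helper, not part of requests or urllib, and the sample values are shortened from the kind of headers a browser and a script typically send.

```python
def diff_headers(browser, client):
    """Compare two header mappings, ignoring the case of header names.

    Returns a tuple (only_in_browser, only_in_client, changed), where
    changed maps a header name to (browser_value, client_value).
    """
    b = {name.lower(): value for name, value in browser.items()}
    c = {name.lower(): value for name, value in client.items()}
    only_in_browser = {k: b[k] for k in b.keys() - c.keys()}
    only_in_client = {k: c[k] for k in c.keys() - b.keys()}
    changed = {k: (b[k], c[k]) for k in b.keys() & c.keys() if b[k] != c[k]}
    return only_in_browser, only_in_client, changed


# Illustrative values only (copied by hand from devtools and from the script).
browser_headers = {
    "Accept": "text/html",
    "Connection": "keep-alive",
    "Cookie": "__test=abc",
}
script_headers = {"accept": "*/*", "connection": "keep-alive"}

only_browser, only_script, changed = diff_headers(browser_headers, script_headers)
print("only in browser:", only_browser)
print("only in script: ", only_script)
print("different value:", changed)
```

Each header that shows up in one of the three groups is a candidate for bisection: add it to the script's request (or remove it) and see whether the status code changes.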

---

I did this using the "Network" panel in Chrome's devtools, and using requests so that I could just print(page.request.headers).

From Chrome:

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Cache-Control: max-age=0
Connection: keep-alive
Cookie: __test=9eea7a0d55374cb5b0673e2058581017
Host: switch-check.cf
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36

From requests:

User-Agent: python-requests/2.18.4
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive

Before even getting to those headers: Chrome requested index.php?i=1 instead of just index.php. So apparently there was a redirect while I wasn't paying attention. And that redirect isn't happening in requests, which implies that it's likely scripted.

And meanwhile, I know I said to bisect, but the fact that there's a cookie there is immediately suspicious.

So, let's look at the actual 200 response, run through a pretty-printer:

<html>

<body>
    <script type="text/javascript" src="/aes.js"></script>
    <script>
        function toNumbers(d) {
            var e = [];
            d.replace(/(..)/g, function(d) {
                e.push(parseInt(d, 16))
            });
            return e
        }

        function toHex() {
            for (var d = [], d = 1 == arguments.length && arguments[0].constructor == Array ? arguments[0] : arguments, e = "", f = 0; f < d.length; f++) e += (16 > d[f] ? "0" : "") + d[f].toString(16);
            return e.toLowerCase()
        }
        var a = toNumbers("f655ba9d09a112d4968c63579db590b4"),
            b = toNumbers("98344c2eee86c3994890592585b49f80"),
            c = toNumbers("c4ba932dbf1d8d33ca88410be4f79eb0");
        document.cookie = "__test=" + toHex(slowAES.decrypt(c, 2, a, b)) + "; expires=Thu, 31-Dec-37 23:55:55 GMT; path=/";
        location.href = "http://switch-check.cf/index.php?i=1";
    </script>
    <noscript>This site requires Javascript to work, please enable Javascript in your browser or use a browser with Javascript support</noscript>
</body>

</html>

Well, there's your problem. You're not actually rejecting access to index.php at all; you're returning a 200 with some JavaScript that sets a randomized cookie and then redirects to index.php?i=1. And that's where you reject them.

Is it the cookie, or the redirect, that triggers the 403? Let's try both with requests:

>>> r = requests.get('http://switch-check.cf/index.php', headers={'Cookie': '__test=9eea7a0d55374cb5b0673e2058581017'})
>>> r.status_code
403

>>> r = requests.get('http://switch-check.cf/index.php?i=1')
>>> r.status_code
200

So, you're only forbidding access based on a cookie that's generated by JavaScript.

What if we just send a nonsense cookie?

>>> r = requests.get('http://switch-check.cf/index.php', headers={'Cookie': '__test=' + '0'*32})
>>> r.status_code
403
>>> import uuid
>>> r = requests.get('http://switch-check.cf/index.php', headers={'Cookie': '__test=' + uuid.uuid4().hex})
>>> r.status_code
403

Wow. It actually has to be the right cookie, the one the server was expecting, or you don't get rejected? That's the opposite of the logic you'd normally want.

You could write some urllib or requests code to cooperate the way a browser does: either run a JS interpreter, or parse out the three numbers, AES-decrypt them, and build the cookie yourself. But that seems like a silly thing to do.
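For completeness, the "parse out the three numbers" half of that workaround might look roughly like the sketch below. It is not a recommendation. parse_challenge and build_cookie_header are hypothetical names, the regular expression assumes the page keeps the exact a/b/c = toNumbers("...") shape shown above, and the AES step is left as a caller-supplied decrypt(ciphertext, key, iv) function because the standard library has no AES. The argument order mirrors slowAES.decrypt(c, 2, a, b) from the page, where c is the ciphertext and the mode constant 2 appears to select CBC with a as the key and b as the IV.

```python
import re

# Matches assignments such as: a = toNumbers("f655ba9d...")
_TO_NUMBERS = re.compile(r'\b([abc])\s*=\s*toNumbers\("([0-9a-fA-F]+)"\)')


def parse_challenge(html):
    """Return {'a': bytes, 'b': bytes, 'c': bytes}, or None if the
    page does not look like the JavaScript cookie challenge."""
    found = dict(_TO_NUMBERS.findall(html))
    if set(found) != {"a", "b", "c"}:
        return None
    return {name: bytes.fromhex(value) for name, value in found.items()}


def build_cookie_header(html, decrypt):
    """Build the Cookie header value the script would have set.

    decrypt is supplied by the caller (assumption: it takes
    ciphertext, key, iv as bytes and returns the plaintext bytes).
    """
    parts = parse_challenge(html)
    if parts is None:
        return None
    plaintext = decrypt(parts["c"], parts["a"], parts["b"])
    # The page's toHex() emits lowercase hex, which bytes.hex() matches.
    return "__test=" + plaintext.hex()
```

With a third-party AES library installed, decrypt could be a thin wrapper around that library's CBC decryption. The result would then be sent as the Cookie header on the follow-up request, which is exactly the detour the next paragraph argues against.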

The right thing to do is to change the server to actually forbid access to index.php, instead of returning JS code that generates a special cookie that lets you get forbidden if you want to be.

How do you do that?

Well, you say:

with the help of .htaccess and php I tried my best for that all .php files are forbidden access

First, as far as I can tell, you think you're using Apache, and are following some guide on how to forbid access in Apache, but you're actually using nginx. (Look at the Server header in the responses.)

And meanwhile, I don't know what you're doing in PHP, but you probably got some code that's intended to require a valid cookie from a valid JS-running browser that's (a) wrong and gets it backward, (b) overly complicated, and (c) not what you wanted in the first place.

I don't know whether you have a PHP question here, or an nginx question on Server Fault, or something else. But that's the side you need to fix.
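Once the server really does answer 403 while the site is closed, the "wait until the website is open" loop from the question becomes straightforward. The following is a minimal sketch under that assumption; get_status and wait_until_open are hypothetical names, and the interval and retry limit are arbitrary. Note that urllib.request.urlopen does not return a 403 response: it raises urllib.error.HTTPError, so the status code has to be read from the exception.

```python
import time
import urllib.error
import urllib.request


def get_status(url, timeout=10):
    """Return the HTTP status code for url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.getcode()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; the code is on err.code.
        return err.code
    except urllib.error.URLError:
        # DNS failure, refused connection, and similar problems.
        return None


def wait_until_open(url, fetch=get_status, interval=5.0, max_tries=None,
                    sleep=time.sleep):
    """Poll url until fetch(url) returns 200.

    Returns True once the site is open, or False after max_tries
    unsuccessful attempts (max_tries=None polls indefinitely).
    """
    tries = 0
    while max_tries is None or tries < max_tries:
        if fetch(url) == 200:
            return True
        tries += 1
        sleep(interval)
    return False
```

A call such as wait_until_open("http://switch-check.cf/index.php", interval=30) would then block until the page stops being forbidden. The fetch and sleep parameters exist only so the loop can be exercised without a network connection.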

Answer 1: 2, accepted, 2018-07-18 18:11:51
