Python 3.3: Improving server security with BaseHTTPRequestHandler

Question

I have lately been improving security on my webserver, which I wrote myself using http.server and BaseHTTPRequestHandler. I have blocked (403'd) most essential server files, which I do not want users to be able to access. These include the Python server script and all databases, plus some HTML templates.

However, in this post on Stack Overflow I read that using open(curdir + sep + self.path) in a do_GET request might potentially make every file on your computer readable. Can someone explain this to me? If the self.path is ip:port/index.html every time, how can someone access files that are above the root / directory?

I understand that the user (obviously) can change the index.html to anything else, but I don't see how they can access directories above root.

Also, if you're wondering why I'm not using nginx or apache: I wanted to create my own web server and website for learning purposes. I have no intention of running an actual website myself, and if I ever do want to, I will probably rent a server or use existing server software.

class Handler(http.server.BaseHTTPRequestHandler):

    def do_GET(self):

        try:
            if "SOME BLOCKED FILE OR DIRECTORY" in self.path:
                self.send_error(403, "FORBIDDEN")
                return
            #I have about 6 more of these 403 parts, but I left them out for readability
            if self.path.endswith(".html"):
                if self.path.endswith("index.html"):
                    #template is the Template Engine that I created to create dynamic HTML content
                    parser = template.TemplateEngine()
                    content = parser.get_content("index", False, "None", False)
                    self.send_response(200)
                    self.send_header("Content-type", "text/html")
                    self.end_headers()
                    self.wfile.write(content.encode("utf-8"))
                    return
                elif self.path.endswith("auth.html"):
                    parser = template.TemplateEngine()
                    content = parser.get_content("auth", False, "None", False)
                    self.send_response(200)
                    self.send_header("Content-type", "text/html")
                    self.end_headers()
                    self.wfile.write(content.encode("utf-8"))
                    return
                elif self.path.endswith("about.html"):
                    parser = template.TemplateEngine()
                    content = parser.get_content("about", False, "None", False)
                    self.send_response(200)
                    self.send_header("Content-type", "text/html")
                    self.end_headers()
                    self.wfile.write(content.encode("utf-8"))
                    return
                else:
                    try:
                        f = open(curdir + sep + self.path, "rb")
                        self.send_response(200)
                        self.send_header("Content-type", "text/html")
                        self.end_headers()
                        self.wfile.write((f.read()))
                        f.close()
                        return
                    except IOError as e:
                        self.send_response(404)
                        self.send_header("Content-type", "text/html")
                        self.end_headers()
                        return
            else:
                if self.path.endswith(".css"):
                    h1 = "Content-type" 
                    h2 = "text/css"
                elif self.path.endswith(".gif"):
                    h1 = "Content-type"
                    h2 = "gif"
                elif self.path.endswith(".jpg"):
                    h1 = "Content-type"
                    h2 = "jpg"
                elif self.path.endswith(".png"):
                    h1 = "Content-type"
                    h2 = "png"
                elif self.path.endswith(".ico"):
                    h1 = "Content-type"
                    h2 = "ico"
                elif self.path.endswith(".py"):
                    h1 = "Content-type"
                    h2 = "text/py"
                elif self.path.endswith(".js"):
                    h1 = "Content-type"
                    h2 = "application/javascript"
                else:
                    h1 = "Content-type"
                    h2 = "text"
                f = open(curdir+ sep + self.path, "rb")
                self.send_response(200)
                self.send_header(h1, h2)
                self.end_headers()
                self.wfile.write(f.read())
                f.close()
                return
        except IOError:
            if "html_form_action.asp" in self.path:
                pass
            else:
                self.send_error(404, "File not found: %s" % self.path)
        except Exception as e:
            self.send_error(500)
            print("Unknown exception in do_GET: %s" % e)

Answer 1

You're making an invalid assumption:

If the self.path is ip:port/index.html every time, how can someone access files that are above the root / directory?

But self.path is never ip:port/index.html. Try logging it and see what you get.

For example, if I request http://example.com:8080/foo/bar/index.html, the self.path is not example.com:8080/foo/bar/index.html, but just /foo/bar/index.html. In fact, your code couldn't possibly work otherwise, because curdir + sep + self.path would give you a path starting with ./example.com:8080/, which won't exist.
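A quick way to check this is to log self.path from a throwaway handler. The sketch below is only an illustration: LoggingHandler, seen_paths and request_and_capture are hypothetical names, and it assumes a free localhost port is available. It serves a single request and returns whatever the handler saw.

```python
import http.client
import http.server
import threading

seen_paths = []  # hypothetical list, used only to capture what the handler sees


class LoggingHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # self.path is the request target from the request line, nothing more
        seen_paths.append(self.path)
        self.send_response(200)
        self.send_header("Content-type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, format, *args):
        pass  # silence the default stderr access log for this demo


def request_and_capture(target):
    """Serve one request on an ephemeral localhost port and return self.path."""
    server = http.server.HTTPServer(("127.0.0.1", 0), LoggingHandler)
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    conn.request("GET", target)
    conn.getresponse().read()
    conn.close()
    worker.join()
    server.server_close()
    return seen_paths[-1]


if __name__ == "__main__":
    print(request_and_capture("/foo/bar/index.html"))
```

The host and port never show up in self.path, while any query string and any ".." segments the client chose to send do, because http.client passes the target through without normalising it.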

And then ask yourself what happens if it's /../../../../../../../etc/passwd.
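To make that concrete, here is a small sketch of the path arithmetic. It assumes a hypothetical document root of /srv/www and uses posixpath.normpath as a purely lexical stand-in for how the operating system resolves ".." when open() is called (symlinks can make the real resolution differ).

```python
import posixpath


def naive_join(curdir, request_path):
    """Mimic curdir + sep + self.path, then collapse '..' lexically."""
    return posixpath.normpath(curdir + "/" + request_path)


if __name__ == "__main__":
    # "/srv/www" is a hypothetical document root used only for illustration
    print(naive_join("/srv/www", "/index.html"))
    print(naive_join("/srv/www", "/../../../../../../../etc/passwd"))
```

The first call stays under /srv/www; the second collapses to /etc/passwd, because each ".." climbs one directory and any extra ones are simply absorbed at the filesystem root.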

This is one of many reasons to use os.path instead of string manipulation for paths. For example, instead of this:

f = open(curdir + sep + self.path, "rb")

Do this:

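# lstrip("/") matters: os.path.join discards curdir if the second part is absolute
path = os.path.abspath(os.path.join(curdir, self.path.lstrip("/")))
# the trailing separator stops a sibling like curdir + "-private" from matching
if not path.startswith(os.path.join(curdir, "")):
    self.send_error(403, "FORBIDDEN")  # illegal!
    return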

I'm assuming that curdir here is an absolute path, not just from os import curdir or some other thing that's more likely to give you "." than anything else. If it's the latter, make sure to abspath it as well.

This can catch other ways of escaping the jail as well as passing in .. strings, but it's not going to catch everything. For example, if there's a symlink pointing out of the jail, there's no way abspath can tell that someone's gone through the symlink.
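One possible way to cover the symlink case as well is to compare real paths rather than absolute ones. The following is a minimal sketch, not a complete defence: resolve_request_path is a hypothetical helper, and stripping the query string and decoding %xx escapes are extra assumptions about what the handler should do before touching the filesystem.

```python
import os
import tempfile
from urllib.parse import unquote, urlsplit


def resolve_request_path(root, request_path):
    """Return the real filesystem path for request_path, or None if it leaves root."""
    real_root = os.path.realpath(root)
    # Drop any query string and decode %xx escapes before building the path.
    relative = unquote(urlsplit(request_path).path).lstrip("/")
    # realpath collapses ".." and follows symlinks, unlike abspath.
    candidate = os.path.realpath(os.path.join(real_root, relative))
    # The trailing separator keeps "/srv/www-private" from matching "/srv/www".
    if candidate == real_root or candidate.startswith(real_root + os.sep):
        return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "index.html"), "w").close()
        print(resolve_request_path(root, "/index.html"))
        print(resolve_request_path(root, "/../../../etc/passwd"))
```

In do_GET, a None result would map to the same 403 response as the other blocked paths. The check runs after every transformation, so whatever the joined path turns out to be, it is only accepted if it really lies under the root.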

Answer 2

self.path contains the request path. If I were to send a GET request and ask for the resource located at /../../../../../../../etc/passwd, I would break out of your application's current folder and be able to access any file on your filesystem (that you have permission to read).

Answer 1: score 3, accepted, 2013-08-02 01:42:37
Answer 2: score 2, 2013-08-02 01:28:40