Safely extracting zip or tar archives with Python

Question

I'm trying to extract user-submitted zip and tar files to a directory. The documentation for zipfile's extractall method (and similarly for tarfile's extractall) states that member paths may be absolute or may contain .. components that point outside the destination path. Instead, I could call extract myself, like this:

import zipfile

some_path = '/destination/path'
some_zip = '/some/file.zip'
with zipfile.ZipFile(some_zip, mode='r') as zipf:
    for subfile in zipf.namelist():
        zipf.extract(subfile, some_path)

Is this safe? Is it possible for a file in the archive to wind up outside of some_path in this case? If so, how can I ensure that files never wind up outside the destination directory?

Answer 1

Note: Starting with Python 2.7.4, this is a non-issue for ZIP archives. Details are at the bottom of the answer. This answer focuses on tar archives.

To figure out where a path really points to, use os.path.abspath() (but note the caveat about symlinks as path components). If you normalize a path from your archive with abspath and the result does not have the destination directory as a prefix, it points outside it.
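One pitfall with a plain string-prefix test: /srv/sandbox2/x starts with /srv/sandbox. A minimal sketch of a component-wise containment check using os.path.commonpath (Python 3.5+); the helper name is_within is hypothetical, and like abspath itself it does not follow symlinks.

```python
import os

def is_within(path, base):
    """Return True if `path` (relative to `base`, or absolute) stays inside `base`.

    abspath collapses ".." lexically; symlinks are NOT followed here.
    """
    base = os.path.abspath(base)
    # os.path.join ignores base when path is absolute
    target = os.path.abspath(os.path.join(base, path))
    try:
        # commonpath compares whole components, so /srv/sandbox2 is not
        # treated as a child of /srv/sandbox
        return os.path.commonpath([base, target]) == base
    except ValueError:
        # e.g. paths on different drives on Windows: certainly not inside base
        return False

print(is_within("docs/readme.txt", "/srv/sandbox"))  # True
print(is_within("../sandbox2/x", "/srv/sandbox"))    # False
```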

But you also need to check the value of any symlink extracted from your archive (both tarfiles and Unix zipfiles can store symlinks). This is important if you are worried about a proverbial "malicious user" who would intentionally bypass your security, rather than an application that simply installs itself in system libraries.

That's the aforementioned caveat: abspath will be misled if your sandbox already contains a symlink that points to a directory. Even a symlink that points within the sandbox can be dangerous: the symlink sandbox/subdir/foo -> .. points to sandbox, so the path sandbox/subdir/foo/../.bashrc actually resolves to the directory containing sandbox and should be disallowed. The easiest way to do so is to wait until the previous files have been extracted and then use os.path.realpath(), which follows symlinks that abspath ignores. Fortunately, extractall() accepts a generator for its members argument, so this is easy to do.
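The difference between abspath and realpath can be reproduced directly. This sketch assumes a POSIX system where os.symlink works without extra privileges; demo_symlink_escape is a hypothetical helper that builds the sandbox/subdir/foo -> .. layout in a temporary directory.

```python
import os
import tempfile

def demo_symlink_escape():
    """Build sandbox/subdir/foo -> .. and resolve sandbox/subdir/foo/../.bashrc.

    Returns (sandbox, via_abspath, via_realpath) as plain strings.
    """
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)  # avoid /tmp -> /private/tmp surprises
        sandbox = os.path.join(root, "sandbox")
        os.makedirs(os.path.join(sandbox, "subdir"))
        # Relative symlink: ".." is resolved from subdir, so foo means sandbox
        os.symlink("..", os.path.join(sandbox, "subdir", "foo"))

        candidate = os.path.join(sandbox, "subdir", "foo", "..", ".bashrc")
        via_abspath = os.path.abspath(candidate)    # lexical: "foo/.." cancels out
        via_realpath = os.path.realpath(candidate)  # follows foo, then applies ".."
        return sandbox, via_abspath, via_realpath

sandbox, via_abspath, via_realpath = demo_symlink_escape()
print("abspath :", via_abspath)   # stays under sandbox/subdir
print("realpath:", via_realpath)  # lands next to sandbox, i.e. outside it
```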

Since you asked for code, here's a bit that spells out the algorithm. It prohibits not only the extraction of files to locations outside the sandbox (which is what was requested), but also the creation of links inside the sandbox that point to locations outside the sandbox. I'm curious to hear if anyone can sneak any stray files or links past it.

import os
import sys
import tarfile
from os.path import abspath, realpath, dirname, join as joinpath


def resolved(x):
    return realpath(abspath(x))


def badpath(path, base):
    # joinpath will ignore base if path is absolute
    target = resolved(joinpath(base, path))
    # Compare whole components: "/sandbox2" must not pass as inside "/sandbox"
    return not (target == base or target.startswith(base + os.sep))


def badlink(info, base):
    if info.issym():
        # Symlink targets are interpreted relative to the directory containing the link
        tip = resolved(joinpath(base, dirname(info.name)))
        return badpath(joinpath(tip, info.linkname), base)
    # Hard link targets name another member, relative to the extraction root
    return badpath(info.linkname, base)


def safemembers(members, dest):
    # Check against the directory we actually extract into, not the cwd
    base = resolved(dest)

    for finfo in members:
        if badpath(finfo.name, base):
            print(finfo.name, "is blocked (illegal path)", file=sys.stderr)
        elif finfo.issym() and badlink(finfo, base):
            print(finfo.name, "is blocked: Symlink to", finfo.linkname, file=sys.stderr)
        elif finfo.islnk() and badlink(finfo, base):
            print(finfo.name, "is blocked: Hard link to", finfo.linkname, file=sys.stderr)
        else:
            yield finfo


with tarfile.open("testtar.tar") as ar:
    ar.extractall(path="./sandbox", members=safemembers(ar, "./sandbox"))

Edit: Starting with Python 2.7.4, this is a non-issue for ZIP archives: the method ZipFile.extract() prohibits the creation of files outside the sandbox:

Note: If a member filename is an absolute path, a drive/UNC sharepoint and leading (back)slashes will be stripped, e.g.: ///foo/bar becomes foo/bar on Unix, and C:\foo\bar becomes foo\bar on Windows. And all ".." components in a member filename will be removed, e.g.: ../../foo../../ba..r becomes foo../ba..r. On Windows, illegal characters (:, <, >, |, ", ?, and *) [are] replaced by underscore (_).
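A quick way to observe this sanitization on a Python 3 interpreter is to build an in-memory zip with a member named ../../evil.txt and extract it into a nested directory; the member should end up as dest/evil.txt rather than two levels up. The helper names build_malicious_zip and extract_members are illustrative, not part of zipfile.

```python
import io
import os
import tempfile
import zipfile

def build_malicious_zip():
    """Return an in-memory zip whose only member tries to climb out with '../'."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("../../evil.txt", b"payload")
    buf.seek(0)
    return buf

def extract_members(zip_buf, dest):
    """Extract every member with ZipFile.extract and return the paths it wrote."""
    with zipfile.ZipFile(zip_buf) as zf:
        return [zf.extract(info, dest) for info in zf.infolist()]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dest = os.path.join(tmp, "a", "b")
        # Without sanitization, ../../evil.txt would land in tmp itself
        print(extract_members(build_malicious_zip(), dest))
```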

The tarfile module has not been similarly sanitized, so the above answer still applies to tar archives.
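Newer Python releases do add tar sanitization: Python 3.12 introduced tarfile extraction filters (PEP 706), and they were also backported to several earlier 3.x maintenance releases. A minimal sketch, assuming an interpreter that provides tarfile.data_filter; the safe_extract_tar wrapper is hypothetical. It refuses to run on older interpreters instead of silently falling back to an unfiltered extractall().

```python
import tarfile

def safe_extract_tar(archive_path, dest):
    """Extract a tar archive using the standard library's 'data' filter.

    Raises RuntimeError on interpreters that predate extraction filters.
    """
    if not hasattr(tarfile, "data_filter"):
        raise RuntimeError("this Python has no tarfile extraction filters")
    with tarfile.open(archive_path) as tar:
        # 'data' strips leading slashes and refuses '..' escapes, links that
        # point outside dest, and special files; a violation raises a
        # tarfile.FilterError subclass (with the default errorlevel of 1)
        tar.extractall(dest, filter="data")
```

With the default errorlevel, the first offending member aborts the whole extraction, which is usually what you want for untrusted uploads.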

Answer 2

Contrary to the popular answer, unzipping files safely is not completely solved as of Python 2.7.4. The extractall method is still dangerous and can lead to path traversal, either directly or through the unzipping of symbolic links. Here was my final solution, which should prevent both attacks in all versions of Python, even versions prior to Python 2.7.4 where the extract method was vulnerable:

import os
import zipfile

def safe_unzip(zip_file, extract_path='.'):
    base = os.path.realpath(extract_path)
    with zipfile.ZipFile(zip_file, 'r') as zf:
        for member in zf.infolist():
            file_path = os.path.realpath(os.path.join(extract_path, member.filename))
            # Require a separator boundary so "/dest2/x" does not match "/dest"
            if file_path == base or file_path.startswith(base + os.sep):
                zf.extract(member, extract_path)

Edit 1: Fixed variable name clash. Thanks Juuso Ohtonen.

Edit 2: s/abspath/realpath/g. Thanks TheLizzard.

Answer 3

Use ZipFile.infolist() / TarFile.next() / TarFile.getmembers() to get information about each entry in the archive, normalize the path, open the output file yourself, use ZipFile.open() / TarFile.extractfile() to get a file-like object for the entry, and copy the entry data yourself.
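A minimal sketch of this approach for zip archives (a tar version would use TarFile.getmembers() and TarFile.extractfile() in the same way). The function name manual_unzip is hypothetical, and the containment test assumes all paths live on one drive. Entries whose normalized path leaves the destination are skipped rather than raising.

```python
import os
import shutil
import zipfile

def manual_unzip(zip_path, dest):
    """Copy each zip entry into dest by hand, skipping escaping paths.

    Returns the list of regular files written.
    """
    base = os.path.realpath(dest)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # Normalize the entry name relative to the destination
            target = os.path.realpath(os.path.join(base, info.filename))
            if target == base or os.path.commonpath([base, target]) != base:
                continue  # absolute path or ".." escape: skip it
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # Open the output file ourselves and stream the entry data into it
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```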

Answer 4

Copy the zip file to an empty directory. Then use os.chroot (which requires root privileges) to make that directory the root directory. Then unzip there.

Alternatively, you can call unzip itself with the -j ("junk paths") flag, which discards the stored directory names and extracts every file into a single directory:

import subprocess
filename = '/some/file.zip'
rv = subprocess.call(['unzip', '-j', filename])


Answers (4):

Answer 1: score 45, 2012-04-09 17:44:16
Answer 2: score 4, 2016-04-12 20:53:56
Answer 3: score 3, 2012-04-08 03:19:54
Answer 4: score 3, 2012-04-15 11:57:50
