python zipfile encoding for arcname

I'm trying to add several files to a zip archive with Python's zipfile library. The problem is the name stored for the zipped file, which contains non-ASCII characters (UTF-8).

Here is a minimal example:

#!/usr/bin/env python

import zipfile

infilename = "test_file"
outfilename = "test.zip"
filename = u'Conf\xe9d\xe9ration.txt'

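if __name__ == '__main__':
    # A zip archive is binary data, so open the output file in binary mode.
    f = open(outfilename, "wb")
    archive = zipfile.ZipFile(f, "w", zipfile.ZIP_DEFLATED)
    # Store the file under a CP437-encoded archive name.
    archive.write(infilename, filename.encode("CP437"))
    archive.close()
    f.close()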

The generated file is not read correctly by every zip extractor:

  • Ubuntu 10.04 & 11.10: Conf?d?ration.txt
    The file could not be extracted: "caution: filename not matched: Conf\\?d\\?ration.txt"

  • Windows XP & 7: Confédération.txt
    The file could be read

  • MacOSX (Lion): ConfÇdÇration.txt
    The file could be read

I also tried without encoding to CP437, changing just one line to:

    archive.write(infilename, filename)

This time Ubuntu still has the same problem, Windows shows "Conf+®d+®ration.txt", and MacOSX works perfectly.

Does anyone know a (pythonic) cross-platform solution?

Thanks!

It looks like the file name is written "as it is" (i.e. the first time as CP437-encoded bytes, the second time as UTF-8), while the archive handlers on each platform interpret names differently:

  • Windows: it uses the DOS/OEM encoding for file names inside the archive, which is why CP437 works. This behavior is described in the PKWARE standard;
  • Mac OS: it silently uses UTF-8, which violates the standard. That is why UTF-8 works on Mac OS.
  • Linux/Unix: they use the system code page for file names inside the archive. I don't know which one your Linux installation is configured with, but it is apparently neither the DOS code page nor UTF-8 :)
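
To make the byte-level mismatch concrete, here is a small Python 3 sketch (the question's code targets Python 2, so this is an illustration, not a drop-in fix). The helper names `misread` and `utf8_flag_set` are made up for this example. The choice of `mac_roman` and `cp850` as the "reading" code pages is an assumption: it is not confirmed by the question, but those code pages produce the "Ç" and "®" characters seen in the names reported above. The second helper checks general-purpose bit 11 (0x800), which marks a member name as UTF-8 and which extractors may or may not honor.

```python
import io
import zipfile

NAME = "Conf\xe9d\xe9ration.txt"
UTF8_FLAG = 0x800  # general-purpose bit 11: "member name is UTF-8"


def misread(name, written_as, read_as):
    """Encode a name with one code page and decode the bytes with another."""
    return name.encode(written_as).decode(read_as)


def utf8_flag_set(member_name):
    """Write one member to an in-memory zip and report whether bit 11 is set."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, b"data")
    with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as archive:
        return bool(archive.infolist()[0].flag_bits & UTF8_FLAG)


if __name__ == "__main__":
    # CP437 bytes read as Mac Roman (assumed reader encoding).
    print(misread(NAME, "cp437", "mac_roman"))
    # UTF-8 bytes read as CP850 (assumed reader encoding).
    print(misread(NAME, "utf-8", "cp850"))
    # Python 3's zipfile flags non-ASCII names as UTF-8, ASCII names stay unflagged.
    print(utf8_flag_set(NAME), utf8_flag_set("plain.txt"))
```

The same stored bytes yield different names depending on which code page the extractor assumes, so a name is only portable to tools that agree on the encoding or that respect the UTF-8 flag.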
