python zipfile encoding for arcname

I'm trying to add several files to a zip archive with Python's zipfile library. The problem is the name stored in the archive, which contains non-ASCII characters.

Here is a minimal example:

#!/usr/bin/env python

import zipfile

infilename = "test_file"
outfilename = "test.zip"
filename = u'Conf\xe9d\xe9ration.txt'

if __name__ == '__main__':
    # Open in binary mode; text mode can corrupt the archive on Windows.
    f = open(outfilename, "wb")
    archive = zipfile.ZipFile(f, "w", zipfile.ZIP_DEFLATED)
    # Store the member under a CP437-encoded name.
    archive.write(infilename, filename.encode("CP437"))
    archive.close()
    f.close()

The generated file is not read correctly by every zip extractor:

  • Ubuntu 10.04 & 11.10: Conf?d?ration.txt
    The file could not be extracted: "caution: filename not matched: Conf\\?d\\?ration.txt"

  • Windows XP & 7: Confédération.txt
    The file could be read

  • MacOSX (Lion): ConfÇdÇration.txt
    The file could be read

I also tried without encoding to CP437, changing just one line to:

    archive.write(infilename, filename)

This time Ubuntu still has the same problem, Windows gives "Conf+®d+®ration.txt", and MacOSX works perfectly.

Does anyone know a (Pythonic) cross-platform solution?

Thanks!

It looks like the file name is written as-is (i.e. the first time in CP437, the second time in UTF-8), while the archive tools on each platform decode it differently:

  • Windows: it uses the DOS/OEM encoding for file names inside the archive, which is why CP437 works. This behavior is described in the PKWARE standard.
  • Mac OS: it silently uses UTF-8, which violates the standard. That is why UTF-8 works on Mac OS.
  • Linux/Unix: they use the system code page for file names inside the archive. I don't know which one your Linux installation is configured for, but it is neither a DOS code page nor UTF-8 :)
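
To illustrate the point above, here is a minimal sketch, assuming Python 3 (where zipfile takes str names). The helper names build_archive and describe are made up for this example, and cp850 is only a guess at the Windows OEM code page involved. The first part replays the kind of mis-decoding seen in the question; the second part checks whether zipfile sets general-purpose flag bit 11 (0x800), which the ZIP format uses to mark a name as UTF-8.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: member name is UTF-8

name = "Conf\xe9d\xe9ration.txt"

# Part 1: the same stored bytes read back with a different decoder.
# CP437 bytes interpreted as Mac OS Roman:
as_mac = name.encode("cp437").decode("mac_roman")
# UTF-8 bytes interpreted as an OEM code page (cp850 is an assumption):
as_oem = name.encode("utf-8").decode("cp850")


def build_archive(arcname, payload=b"hello"):
    """Return the bytes of an in-memory zip holding one member (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(arcname, payload)
    return buf.getvalue()


def describe(data):
    """Return (name, utf8_flag_set) for each member (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [(info.filename, bool(info.flag_bits & UTF8_FLAG))
                for info in archive.infolist()]


# Part 2: pass the name as str and let zipfile pick the stored encoding.
data = build_archive(name)
members = describe(data)

# ascii() keeps the output printable on consoles with a limited encoding.
print(ascii(as_mac))
print(ascii(as_oem))
print(ascii(members))
```

Extractors that honor the flag should show the name correctly. Tools that ignore it may still fall back to their own code page, so this sketch does not guarantee a correct name in every reader.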
