Unicode issues with tarfile.extractall() (Python 2.7)

I'm using Python 2.7.6 on Windows, and I'm using the tarfile module to extract a gzip-compressed tar file. The mode option of tarfile.open() is set to "r:gz". After the open call, if I print the contents of the archive via tarfile.list(), I see the following directory in the list:

./静态分析 Part 1.v1/

However, after I call tarfile.extractall(), I don't see the above directory among the extracted files. Instead I see this:

é™æ€åˆ†æž Part 1.v1/

If I extract the archive with 7zip, I see a directory with the same name as the first item above. So, clearly, the extractall() method is screwing up, but I don't know how to fix this.

I learned that tar doesn't retain the encoding information as part of the archive and treats filenames as raw byte sequences. So, the output I saw from tarfile.extractall() was simply the raw byte sequence that made up the file's name prior to compression, not the decoded text. To get the extractall() method to recreate the original filenames, I discovered that you have to manually decode the names of the TarFile object's members with the appropriate encoding before calling extractall(). In my case, the following did the trick:

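  import tarfile

  # zippath (archive path) and dbpath (target directory) are defined elsewhere
  modeltar = tarfile.open(zippath, mode="r:gz")
  updatedMembers = []
  for m in modeltar.getmembers():
    # Python 2 keeps member names as byte strings; decode them as UTF-8
    m.name = unicode(m.name, 'utf-8')
    updatedMembers.append(m)
  modeltar.extractall(members=updatedMembers, path=dbpath)
  modeltar.close()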

The above code is based on this superuser answer: https://superuser.com/a/190786/354642
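
To illustrate why the name came out garbled, here is a minimal sketch, written in Python 3 rather than the Python 2.7 used above. It assumes the archive stored the name as UTF-8 bytes and that those bytes were then interpreted in the Windows-1252 code page; both are assumptions on my part, though the result is consistent with the garbled name shown in the question. The variable names are only illustrative.

```python
# Illustration only (Python 3): how UTF-8 bytes turn into mojibake.
original = "静态分析 Part 1.v1"

# tar keeps the name as raw bytes; assume the archive was written with UTF-8.
raw_name = original.encode("utf-8")

# Assumption: the bytes are read as Windows-1252 instead of UTF-8.
# Python's cp1252 codec has no mapping for a few byte values (0x9D among
# them), so errors="ignore" drops those bytes here; on screen such bytes
# would typically be invisible anyway.
garbled = raw_name.decode("cp1252", errors="ignore")

# Decoding the same bytes with the encoding they were written in
# recovers the original name, which is what the fix above does per member.
restored = raw_name.decode("utf-8")
```

Under those assumptions, `garbled` is the broken directory name from the question (minus the trailing slash), while `restored` equals the original name.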
