Unicode issues with tarfile.extractall() (Python 2.7)
I'm using Python 2.7.6 on Windows, and I'm using the tarfile module to extract a gzip-compressed tar archive. The mode option of tarfile.open() is set to "r:gz". After the open call, if I print the contents of the archive via TarFile.list(), I see the following directory in the list:
./静态分析 Part 1.v1/
However, after I call extractall(), I don't see that directory among the extracted files. Instead, I see this:
é™æ€åˆ†æž Part 1.v1/
If I extract the archive with 7-Zip, I get a directory with the same name as the first item above. So extractall() is clearly mangling the name, but I don't know how to fix this.
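The garbled name is consistent with the UTF-8 bytes of the original name being displayed in a Windows code page. The following is a minimal sketch of that effect, written for Python 3. It assumes the code page is Windows-1252 and that bytes with no mapping in it are silently dropped; the post itself does not say which code page was involved, so both are assumptions.

```python
# The Chinese part of the directory name from the archive listing.
original = "静态分析"

# tar stores the name as raw bytes; here they are assumed to be UTF-8.
raw = original.encode("utf-8")

# Interpreting those bytes as Windows-1252 (an assumption) yields mojibake.
# Some of the bytes have no cp1252 mapping, so they are dropped here.
garbled = raw.decode("cp1252", errors="ignore")

print(garbled)
```

Under those assumptions, the printed string should match the garbled prefix of the extracted directory name shown above.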
I learned that tar doesn't store encoding information in the archive and treats filenames as raw byte sequences. So the output I saw from extractall() was simply the raw byte sequence that made up the file's name when the archive was created. To get extractall() to recreate the original filenames, I found that you have to manually decode the name of each member of the TarFile object using the appropriate encoding before calling extractall(). In my case, the following did the trick:
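import tarfile

# zippath is the path to the .tar.gz archive; dbpath is the destination directory
modeltar = tarfile.open(zippath, mode="r:gz")
updatedMembers = []
for m in modeltar.getmembers():
    # Python 2: decode the raw UTF-8 bytes of the name into a unicode string
    m.name = unicode(m.name, 'utf-8')
    updatedMembers.append(m)
modeltar.extractall(members=updatedMembers, path=dbpath)
modeltar.close()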
The above code is based on this Super User answer: https://superuser.com/a/190786/354642
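For comparison, on Python 3 tarfile.open() accepts an encoding argument that controls how member names are decoded, which plays a role similar to the manual unicode() conversion. The sketch below is an illustration only, not part of the original fix: it assumes Python 3, builds a small GNU-format archive in memory (the in-memory buffer and the choice of GNU format are assumptions made so that the name is stored as plain UTF-8 bytes), and reads it back with two different encodings.

```python
import io
import tarfile

name = "静态分析 Part 1.v1"

# Build a tiny gzip-compressed archive in memory (illustrative only).
# GNU format is chosen so the name is stored as raw bytes in the header.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz",
                  format=tarfile.GNU_FORMAT, encoding="utf-8") as tar:
    info = tarfile.TarInfo(name)
    info.type = tarfile.DIRTYPE  # a directory entry, as in the post
    tar.addfile(info)

# Reading with the matching encoding recovers the original name.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz", encoding="utf-8") as tar:
    names_utf8 = tar.getnames()

# Reading the same bytes with a different encoding gives a different name.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz", encoding="cp1252") as tar:
    names_wrong = tar.getnames()

print(names_utf8 == [name])
print(names_wrong == [name])
```

The second read is only there to show that the stored bytes are the same and that the decoded name depends on the encoding passed to tarfile.open().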