Python not able to open file with non-English characters in path
I have a file with the following path: D:/bar/クレイジー・ヒッツ!/foo.abc
I am parsing the path from an XML file and storing it in a variable called path
in the form of file://localhost/D:/bar/クレイジー・ヒッツ!/foo.abc
Then, the following operations are being done:
path=path.strip()
path=path[17:] #to remove the file://localhost/ part
path=urllib.url2pathname(path)
path=urllib.unquote(path)
The error is:
IOError: [Errno 2] No such file or directory: 'D:\\bar\\\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81\\foo.abc'
Update 1: I am using Python 2.7 on Windows 7.
The path in your error is:
'\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'
I think this is the UTF-8 encoded version of your filename.
I've created a folder of the same name on Windows 7 and placed a file called 'abc.txt' in it:
>>> a = '\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'
>>> os.listdir('.')
['?????\xb7???!']
>>> os.listdir(u'.') # Pass unicode to have unicode returned to you
[u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01']
>>>
>>> a.decode('utf8') # UTF8 decoding your string matches the listdir output
u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01'
>>> os.listdir(a.decode('utf8'))
[u'abc.txt']
So it seems that Duncan's suggestion of path.decode('utf8') does the trick.
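A short sketch of why the decode helps, using the byte string from the error message. On Python 2 under Windows a byte-string path is interpreted in the ANSI ("mbcs") code page rather than as UTF-8, so UTF-8 bytes do not name the folder until they are decoded to unicode. The variable names raw_name and text_name are illustrative only.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# The folder name exactly as it appears (as UTF-8 bytes) in the IOError message.
raw_name = (b'\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc'
            b'\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81')

# Decoding turns the 30 UTF-8 bytes into the 10 characters of the real name.
text_name = raw_name.decode('utf8')

print(len(raw_name), len(text_name))
# Compare against the unicode name that os.listdir(u'.') reported.
print(text_name == u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01')
```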
Update
I can't test this for you, but I suggest that you try checking whether the path contains non-ASCII characters before doing the .decode('utf8'). This is a bit hacky...
import urllib

# 256-entry translation table: printable ASCII (32..126) maps to itself, every other byte to '_'
ASCII_TRANS = '_'*32 + ''.join([chr(x) for x in range(32,127)]) + '_'*129
path = path.strip()
path = path[17:]  # remove the file://localhost/ part
path = urllib.unquote(path)
if path.translate(ASCII_TRANS) != path:  # contains non-ASCII bytes
    path = path.decode('utf8')
path = urllib.url2pathname(path)
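The same steps can be sketched as a small function that avoids the translation-table check by decoding whenever the unquoted result is still a byte string. This is only a sketch: the name file_url_to_text_path is made up for illustration, it assumes the URL carries UTF-8 (optionally percent-encoded) and, on Python 2, that it arrives as a byte string. It leaves out url2pathname because that call behaves differently per platform.

```python
# -*- coding: utf-8 -*-
try:
    from urllib import unquote          # Python 2
except ImportError:
    from urllib.parse import unquote    # Python 3

PREFIX = "file://localhost/"


def file_url_to_text_path(url):
    """Hypothetical helper: strip the file:// prefix and return a unicode path."""
    url = url.strip()
    if url.startswith(PREFIX):
        url = url[len(PREFIX):]
    path = unquote(url)                 # resolves %XX escapes
    if isinstance(path, bytes):         # Python 2: still UTF-8 bytes at this point
        path = path.decode("utf8")      # assumption: the XML stores UTF-8
    return path


print(file_url_to_text_path("file://localhost/D:/bar/%E3%82%AF%E3%83%AC/foo.abc"))
```

On Windows the returned string could then be handed to url2pathname and open as in the snippet above.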
Provide the filename as a unicode string to the open call.
How do you produce the filename?
Add a line near the beginning of your script:
# -*- coding: utf8 -*-
Then, in a UTF-8 capable editor, set path to the unicode filename:
path = u"D:/bar/クレイジー・ヒッツ!/foo.abc"
Retrieve the contents of the directory using a unicode dirspec:
dir_files= os.listdir(u'.')
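To see the effect without touching real data, the following sketch creates a scratch folder with a short non-ASCII name, writes a file into it through a unicode path, and lists both levels with unicode arguments. The function name demo_unicode_listing and the folder name are invented for the example. It assumes the file system can store the name, which is the case on Windows and on typical UTF-8 systems.

```python
# -*- coding: utf-8 -*-
import io
import os
import shutil
import sys
import tempfile


def demo_unicode_listing():
    """Create a folder with a non-ASCII name and list it via unicode paths."""
    base = tempfile.mkdtemp()                   # scratch directory
    if isinstance(base, bytes):                 # Python 2 returns a byte string
        base = base.decode(sys.getfilesystemencoding())
    try:
        folder = os.path.join(base, u"\u30af\u30ec\u30a4")   # hypothetical folder name
        os.mkdir(folder)
        # open() accepts the unicode path directly
        with io.open(os.path.join(folder, u"abc.txt"), "w", encoding="utf-8") as f:
            f.write(u"hello")
        # unicode argument in, unicode names out
        return os.listdir(base), os.listdir(folder)
    finally:
        shutil.rmtree(base)


parent_names, child_names = demo_unicode_listing()
```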
Open the filename-containing file using codecs.open to read unicode data from it. You need to specify the encoding of the file (because you know what the “default Windows charset” for non-Unicode applications on your computer is).
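A minimal sketch of that approach follows. The helper name read_paths is made up, and 'cp932' (the ANSI code page of a Japanese Windows installation) is only an assumed example of the file's encoding; substitute whatever encoding the file really uses.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile


def read_paths(list_file, encoding):
    """Return the non-empty lines of list_file as unicode strings."""
    with codecs.open(list_file, "r", encoding=encoding) as f:
        return [line.strip() for line in f if line.strip()]


def demo():
    """Write one path in cp932 (assumed encoding) and read it back as unicode."""
    expected = u"D:/bar/\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01/foo.abc"
    fd, name = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(expected.encode("cp932") + b"\r\n")
        return read_paths(name, "cp932") == [expected]
    finally:
        os.remove(name)


round_trip_ok = demo()
```

io.open with an encoding argument behaves much the same way for reading text and is available on both 2.7 and 3.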
Do a:
path= path.decode("utf8")
before opening the file; substitute the correct encoding if not "utf8".
Here's some interesting stuff from the documentation:
sys.getfilesystemencoding()
Return the name of the encoding used to convert Unicode filenames into system file names, or None if the system default encoding is used. The result value depends on the operating system:
On Mac OS X, the encoding is 'utf-8'.
On Unix, the encoding is the user's preference according to the result of nl_langinfo(CODESET), or None if the nl_langinfo(CODESET) failed.
On Windows NT+, file names are Unicode natively, so no conversion is performed. getfilesystemencoding() still returns 'mbcs', as this is the encoding that applications should use when they explicitly want to convert Unicode strings to byte strings that are equivalent when used as file names.
On Windows 9x, the encoding is 'mbcs'.
New in version 2.3.
If I understand this correctly, you should pass the file name as unicode:
f = open(unicode(path, encoding))
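The unicode() built-in exists only on Python 2, so here is a hedged, version-neutral sketch of the same idea. The helper name to_text_path is invented for illustration, and falling back to sys.getfilesystemencoding() is an assumption; pass the real encoding explicitly (for example 'utf-8' for the path in the question) when you know it.

```python
# -*- coding: utf-8 -*-
import sys


def to_text_path(path, encoding=None):
    """Hypothetical helper: decode a byte-string path, leave unicode untouched."""
    if isinstance(path, bytes):
        # Fall back to the file system encoding, then to UTF-8 (assumption).
        return path.decode(encoding or sys.getfilesystemencoding() or "utf-8")
    return path


# The bytes from the question are UTF-8, so the encoding is given explicitly.
name = to_text_path(b"D:/bar/\xe3\x82\xaf\xe3\x83\xac/foo.abc", "utf-8")
```

The result can then be passed to open() in place of unicode(path, encoding).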