Python cannot open a file whose path contains non-English characters

Question

I have a file with the following path: D:/bar/クレイジー・ヒッツ！/foo.abc

I am parsing the path from an XML file and storing it in a variable called path, in the form file://localhost/D:/bar/クレイジー・ヒッツ！/foo.abc. Then the following operations are done:

path=path.strip()
path=path[17:] #to remove the file://localhost/  part
path=urllib.url2pathname(path)
path=urllib.unquote(path)

The error is:

IOError: [Errno 2] No such file or directory: 'D:\\bar\\\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81\\foo.abc'

Update 1: I am using Python 2.7 on Windows 7.

Answer 1

The path in your error is:

'\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'

I think this is the UTF-8 encoded version of your filename.

I've created a folder of the same name on Windows 7 and placed a file called 'abc.txt' in it:

>>> import os
>>> a = '\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'
>>> os.listdir('.')
['?????\xb7???!']
>>> os.listdir(u'.') # Pass unicode to have unicode returned to you
[u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01']
>>> 
>>> a.decode('utf8') # UTF8 decoding your string matches the listdir output
u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01'
>>> os.listdir(a.decode('utf8'))
[u'abc.txt']

So it seems that Duncan's suggestion of path.decode('utf8') does the trick.
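The same check can be reproduced outside the interactive session. The following is only a sketch: the scratch directory comes from tempfile, the file content is made up, and it assumes the filesystem accepts Unicode names. It is written to run on Python 2.7 and Python 3 alike.

```python
# -*- coding: utf-8 -*-
import io
import os
import shutil
import sys
import tempfile

# Byte string copied from the error message (the UTF-8 encoded folder name).
raw_name = b'\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'
folder_name = raw_name.decode('utf-8')  # text (unicode) form of the name

base = tempfile.mkdtemp()  # scratch directory used only for this demo
if isinstance(base, bytes):  # Python 2 returns a byte string here
    base = base.decode(sys.getfilesystemencoding() or 'utf-8')

try:
    folder = os.path.join(base, folder_name)
    os.mkdir(folder)
    with io.open(os.path.join(folder, u'abc.txt'), 'w', encoding='utf-8') as f:
        f.write(u'hello')
    # Passing a text path returns text names, as with os.listdir(u'.') above.
    listing = os.listdir(folder)
finally:
    shutil.rmtree(base)

print(listing)
```

The point is that the path handed to the filesystem calls is a text (unicode) string, not the raw UTF-8 bytes.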

Update

I can't test this for you, but I suggest checking whether the path contains non-ASCII characters before doing the .decode('utf8'). This is a bit hacky...

import urllib

# 256-entry table: printable ASCII (32-126) maps to itself, every other byte to '_'
ASCII_TRANS = '_'*32 + ''.join([chr(x) for x in range(32, 127)]) + '_'*129
path = path.strip()
path = path[17:]  # remove the file://localhost/ part
path = urllib.unquote(path)
if path.translate(ASCII_TRANS) != path:  # contains non-ASCII bytes
  path = path.decode('utf8')
path = urllib.url2pathname(path)
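Since ASCII is a subset of UTF-8, one alternative to the translate check is to decode unconditionally. The helper below is a hypothetical sketch, not part of the original answer: the name file_url_to_text_path is made up, it assumes the URL bytes are UTF-8, and it skips url2pathname so the result keeps forward slashes (which Windows file APIs generally accept). The import fallback lets it run on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
try:
    from urllib import unquote  # Python 2: byte string in, byte string out
except ImportError:
    from urllib.parse import unquote_to_bytes as unquote  # Python 3: returns bytes

PREFIX = b"file://localhost/"


def file_url_to_text_path(url):
    """Hypothetical helper: turn a file://localhost/... URL into a text path."""
    if not isinstance(url, bytes):
        url = url.encode("utf-8")  # work on bytes so %XX escapes decode cleanly
    url = url.strip()
    if url.startswith(PREFIX):
        url = url[len(PREFIX):]
    # Undo percent-escapes first, then interpret the raw bytes as UTF-8.
    return unquote(url).decode("utf-8")


print(repr(file_url_to_text_path("file://localhost/D:/bar/%E3%82%AF/foo.abc")))
```

Checking for the prefix instead of slicing a fixed 17 characters avoids silently mangling a path that arrives in a different form.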

Answer 2

Provide the filename as a unicode string to the open call.

How do you produce the filename?

if provided as a constant by you

Add a line near the beginning of your script:

# -*- coding: utf8 -*-

Then, in a UTF-8 capable editor, set path to the unicode filename:

path = u"D:/bar/クレイジー・ヒッツ！/foo.abc"

read from a list of directory contents

Retrieve the contents of the directory using a unicode dirspec:

dir_files= os.listdir(u'.')

read from a text file

Open the file that contains the filenames using codecs.open, so that you read unicode data from it. You need to specify the encoding of that file (you presumably know what the "default Windows charset" for non-Unicode applications is on your computer).
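As an illustration of reading filenames with an explicit encoding, here is a small sketch. It swaps in io.open, which takes the same encoding argument as codecs.open and is available on Python 2.6+ and Python 3. The encoding "cp932" (the Japanese Windows code page) and the temporary list file are assumptions made for the demo only.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

ENCODING = "cp932"  # assumption: the list file was saved in the Japanese Windows code page

fd, list_file = tempfile.mkstemp(suffix=".txt")  # stand-in for the real list file
os.close(fd)
try:
    # Write a sample list file in the assumed legacy encoding.
    with io.open(list_file, "w", encoding=ENCODING) as f:
        f.write(u"D:/bar/クレイジー・ヒッツ！/foo.abc\n")
    # Read it back: every line arrives as text (unicode), ready to pass to open().
    with io.open(list_file, "r", encoding=ENCODING) as f:
        paths = [line.strip() for line in f if line.strip()]
finally:
    os.remove(list_file)

print(repr(paths[0]))
```

If the wrong encoding is given, the read either fails with a UnicodeDecodeError or yields a different name, so the file will still not be found.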

in any case

Do a:

path= path.decode("utf8")

before opening the file; substitute the correct encoding if not "utf8".

Answer 3

Here's some interesting stuff from the documentation:

sys.getfilesystemencoding()

Return the name of the encoding used to convert Unicode filenames into system file names, or None if the system default encoding is used. The result value depends on the operating system: On Mac OS X, the encoding is 'utf-8'. On Unix, the encoding is the user's preference according to the result of nl_langinfo(CODESET), or None if the nl_langinfo(CODESET) failed. On Windows NT+, file names are Unicode natively, so no conversion is performed. getfilesystemencoding() still returns 'mbcs', as this is the encoding that applications should use when they explicitly want to convert Unicode strings to byte strings that are equivalent when used as file names. On Windows 9x, the encoding is 'mbcs'.

New in version 2.3.

If I understand this correctly, you should pass the file name as unicode:

f = open(unicode(path, encoding))
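One caveat: in Python 2, unicode(path, encoding) raises a TypeError when path is already a unicode string. A guarded variant might look like the sketch below; to_text_path is a hypothetical name, and the encoding still has to be supplied by the caller. It runs on Python 2.7 and Python 3.

```python
import sys


def to_text_path(path, encoding):
    """Hypothetical helper: decode byte-string paths, pass text paths through."""
    if isinstance(path, bytes):  # on Python 2, bytes is the plain str type
        return path.decode(encoding)
    return path


# The value reported here depends on the platform and Python version.
print(sys.getfilesystemencoding())
print(repr(to_text_path(b'\xe3\x82\xaf', 'utf-8')))
```

The result can then be passed to open() directly.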

Answer 1: 6, accepted, 2011-05-12 09:27:08
Answer 2: 2, 2011-05-12 10:11:37
Answer 3: 1, 2011-05-12 08:34:58
