Python not able to open file with non-English characters in path

I have a file with the following path: D:/bar/クレイジー・ヒッツ!/foo.abc

I am parsing the path from an XML file and storing it in a variable called path, in the form file://localhost/D:/bar/クレイジー・ヒッツ!/foo.abc. The following operations are then performed on it:

import urllib

path=path.strip()
path=path[17:] #to remove the file://localhost/  part
path=urllib.url2pathname(path)
path=urllib.unquote(path)

Opening the file at the resulting path then fails. The error is:

IOError: [Errno 2] No such file or directory: 'D:\\bar\\\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81\\foo.abc'

Update 1: I am using Python 2.7 on Windows 7.

The directory name in your error is:

'\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'

I think this is the UTF-8 encoded version of your filename.

I've created a folder of the same name on Windows 7 and placed a file called 'abc.txt' in it:

>>> import os
>>> a = '\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'
>>> os.listdir('.')
['?????\xb7???!']
>>> os.listdir(u'.') # Pass unicode to have unicode returned to you
[u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01']
>>> 
>>> a.decode('utf8') # UTF8 decoding your string matches the listdir output
u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01'
>>> os.listdir(a.decode('utf8'))
[u'abc.txt']

So it seems that Duncan's suggestion of path.decode('utf8') does the trick.


Update

I can't test this for you, but I suggest checking whether the path contains non-ASCII characters before calling .decode('utf8'). This is a bit hacky...

import urllib

# 256-entry table for str.translate: printable ASCII (32..126) maps to itself,
# every other byte value maps to '_'
ASCII_TRANS = '_'*32 + ''.join([chr(x) for x in range(32,127)]) + '_'*129
path=path.strip()
path=path[17:] #to remove the file://localhost/  part
path=urllib.unquote(path)
if path.translate(ASCII_TRANS) != path: # Contains non-ascii
  path = path.decode('utf8')
path=urllib.url2pathname(path)
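
The same steps could also be wrapped in a single function. The following is only a sketch: file_url_to_path is a hypothetical helper name, it assumes the URL is UTF-8 (raw or percent-encoded), and the import fallback is there so that it runs on Python 3 as well as on the Python 2.7 from the question. It leaves the forward slashes as they are instead of calling url2pathname.

```python
# -*- coding: utf-8 -*-
try:
    from urllib import unquote  # Python 2: byte string in, byte string out
except ImportError:
    from urllib.parse import unquote_to_bytes as unquote  # Python 3

PREFIX = b'file://localhost/'


def file_url_to_path(url):
    """Hypothetical helper: return the path part of a file:// URL as Unicode."""
    if not isinstance(url, bytes):
        url = url.encode('utf-8')      # work on bytes in both Python versions
    url = url.strip()
    if url.startswith(PREFIX):
        url = url[len(PREFIX):]        # same effect as path[17:]
    raw = unquote(url)                 # resolve %XX escapes into raw bytes
    return raw.decode('utf-8')         # assumption: the bytes are UTF-8


print(repr(file_url_to_path('file://localhost/D:/bar/%E3%82%AF/foo.abc')))
```

The returned value is a Unicode string, so it can be handed to open() directly; Windows generally accepts forward slashes in such paths.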

Provide the filename as a unicode string to the open call.

How do you produce the filename?

if provided as a constant by you

Add a line near the beginning of your script:

# -*- coding: utf8 -*-

Then, in a UTF-8 capable editor, set path to the unicode filename:

path = u"D:/bar/クレイジー・ヒッツ!/foo.abc"
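
As a quick check that the source encoding declaration took effect, the literal can be compared with its escaped form. This is only a sketch; it uses the first five characters of the folder name, and the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
# Assumes this file is saved as UTF-8, matching the declaration above.
literal = u"クレイジー"
escaped = u"\u30af\u30ec\u30a4\u30b8\u30fc"

# Both spellings denote the same five code points.
assert literal == escaped

# Encoding to UTF-8 yields three bytes per character, the same bytes
# that start the directory name in the error message.
encoded = literal.encode("utf8")
print(len(literal), len(encoded))
```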

read from a list of directory contents

Retrieve the contents of the directory using a unicode dirspec:

dir_files= os.listdir(u'.')

read from a text file

Open the file that contains the filenames with codecs.open to read unicode data from it. You need to specify the encoding of the file (you should know what the "default Windows charset" for non-Unicode applications is on your computer).
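
A sketch of that step, assuming the list file holds one path per line. The helper name read_paths is hypothetical, and the caller has to supply the encoding the file was actually written in.

```python
import codecs


def read_paths(list_file, encoding):
    """Hypothetical helper: return the non-empty lines of list_file as Unicode."""
    with codecs.open(list_file, 'r', encoding=encoding) as fh:
        # codecs.open decodes while reading, so every line is already Unicode
        return [line.strip() for line in fh if line.strip()]
```

For example, read_paths('files.txt', 'cp932') would decode a list written by a non-Unicode application on a Japanese Windows installation; both the file name and the code page are assumptions here.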

in any case

Do a:

path= path.decode("utf8")

before opening the file; substitute the correct encoding if not "utf8".

Here's some interesting stuff from the documentation:

sys.getfilesystemencoding()

Return the name of the encoding used to convert Unicode filenames into system file names, or None if the system default encoding is used. The result value depends on the operating system: On Mac OS X, the encoding is 'utf-8'. On Unix, the encoding is the user's preference according to the result of nl_langinfo(CODESET), or None if the nl_langinfo(CODESET) failed. On Windows NT+, file names are Unicode natively, so no conversion is performed. getfilesystemencoding() still returns 'mbcs', as this is the encoding that applications should use when they explicitly want to convert Unicode strings to byte strings that are equivalent when used as file names. On Windows 9x, the encoding is 'mbcs'.

New in version 2.3.

If I understand this correctly, you should pass the file name as unicode:

f = open(unicode(path, encoding))
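
Note that unicode() exists only in Python 2. A sketch of the same idea that also runs on Python 3 follows; to_text is a hypothetical helper, and the UTF-8 default is an assumption that happens to match the bytes in the question.

```python
import sys


def to_text(path, encoding='utf-8'):
    """Hypothetical helper: decode a byte-string path, pass Unicode through."""
    if isinstance(path, bytes):
        return path.decode(encoding)
    return path


# Shown for reference only: this is the filesystem encoding, which is not
# necessarily the encoding of the bytes you received.
print(sys.getfilesystemencoding())
```

With this in place the call becomes f = open(to_text(path)).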

