Opening files in a directory with python, trouble with encoding

import os
listing = os.listdir(path)
for infile in listing:
    print infile
    f = open(os.path.join(path, infile), 'r')

I have made a script in Python that iterates through all files in a directory and opens them. It works OK, but a problem arises with the names of some files. One file is named Trade_Map_-_List_of_products_exported_by_Côte_d'Ivoire, but when the script tries to open it, it fails with this error:

IOError: [Errno 2] No such file or directory: "C:\\Users\\Borut\\Downloads\\GC downloads\\izvoz\\Trade_Map_-_List_of_products_exported_by_Co^te_d'Ivoire.txt"

The real name ends in Côte_d'Ivoire, while the name I get when I iterate through listdir ends in Co^te_d'Ivoire. What is wrong?

The type of the entries returned by os.listdir(path) depends on the type of the string path. If path is unicode, then the list of entries returned by os.listdir(path) will be unicode. Otherwise, the returned list will be byte strings in the system default encoding, which may not be able to represent every character in a filename. If you want to be sure your list of files comes out correctly, you could try the following (untested):

import os
import sys

# `path` is assumed to be a byte string (str) holding the directory name.
# Decode it to unicode using the filesystem encoding.
path = unicode(path, sys.getfilesystemencoding())

# All elements of listing will be unicode.
listing = os.listdir(path)
for infile in listing:
    print infile

    # When infile is unicode, open() lets the system encode
    # the filename correctly.
    f = open(os.path.join(path, infile), 'r')

sys.getfilesystemencoding() returns the encoding your system uses for filenames, which is the encoding that open and other functions expect byte-string filenames to be in (unicode filenames are also fine, as the conversion is handled automatically).
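The snippet above is Python 2. As a rough sketch of the same idea in Python 3 (an assumption, since the question uses Python 2 syntax), the text type is str and byte strings are bytes, and os.fsdecode converts a bytes path to text using the filesystem encoding. The function names list_as_text and read_all are hypothetical, and reading the file contents as UTF-8 is also an assumption made only for illustration.

```python
import os
import tempfile


def list_as_text(directory):
    """Return the entries of `directory` as text (str) names.

    `directory` may be str or bytes; os.fsdecode turns bytes into str
    using the filesystem encoding and leaves str unchanged.
    """
    return os.listdir(os.fsdecode(directory))


def read_all(directory):
    """Open every regular file in `directory`; return {name: contents}."""
    directory = os.fsdecode(directory)
    contents = {}
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path):
            # Assumption: the files contain UTF-8 text.
            with open(full_path, "r", encoding="utf-8") as handle:
                contents[name] = handle.read()
    return contents


if __name__ == "__main__":
    # Build a throwaway directory holding one file with a non-ASCII name.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "C\u00f4te_d'Ivoire.txt")
        with open(sample, "w", encoding="utf-8") as handle:
            handle.write("example")
        print(list_as_text(tmp))
        print(read_all(os.fsencode(tmp)))
```

Because the directory is converted to text before os.listdir is called, the returned names are text as well, so they can be passed straight back to open without a lossy byte-string conversion.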

Reference: http://docs.python.org/howto/unicode.html#unicode-filenames
