I'm trying to encode CSV files to UTF-8 using Python
I'm using Python to read many files and re-encode them to UTF-8. I tried it with the code below:
import os
from os import listdir

def find_csv_filenames(path_to_dir, suffix=".csv" ):
    path_to_dir = os.path.normpath(path_to_dir)
    filenames = listdir(path_to_dir)
    #Check *csv directory
    fp = lambda f: not os.path.isdir(path_to_dir+"/"+f) and f.endswith(suffix)
    return [path_to_dir+"/"+fname for fname in filenames if fp(fname)]

def convert_files(files, ascii, to="utf-8"):
    count = 0
    lineno = 0
    for name in files:
        lineno = lineno+1
        with open(name) as f:
            file_target = open(name, mode='r', encoding='latin-1')
            file_content = file_target.read()
            file_target.close
            print(lineno)
            file_source = open("./csv/data{}.csv".format(lineno), mode='w', encoding='utf-8')
            file_source.write(file_content)

csv_files = find_csv_filenames('./csv', ".csv")
convert_files(csv_files, "cp866")
The problem is that after I read the data and write it to new files opened with UTF-8 encoding, it still does not work.
Before you open a file whose encoding is not clear, you can use chardet to detect the file's encoding rather than opening it with a guessed encoding. Note that chardet.detect expects the file's raw bytes, not its path. Usage is like this:
>>> import chardet
>>> with open('PATH/TO/FILE', 'rb') as f:
...     encoding = chardet.detect(f.read())['encoding']
...
Then open the file with the detected encoding and write its contents to a file opened with 'utf-8' encoding.
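The read-then-rewrite step could look roughly like the sketch below. It uses only the standard library and assumes the source encoding has already been determined (for example, the value returned by the detection step) and is passed in as `source_encoding`. The names `convert_to_utf8` and `convert_directory` are made up for illustration and do not come from the question.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding):
    """Decode src using source_encoding and write the text to dst as UTF-8."""
    # newline="" disables newline translation, so the CSV line endings
    # are written back exactly as they were read.
    with open(src, mode="r", encoding=source_encoding, newline="") as fin:
        content = fin.read()
    with open(dst, mode="w", encoding="utf-8", newline="") as fout:
        fout.write(content)


def convert_directory(src_dir, dst_dir, source_encoding, suffix=".csv"):
    """Convert every regular file ending in suffix from src_dir into dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(src_dir.iterdir()):
        if src.is_file() and src.name.endswith(suffix):
            dst = dst_dir / src.name
            convert_to_utf8(src, dst, source_encoding)
            converted.append(dst)
    return converted
```

Writing the results to a separate output directory, instead of back into the directory being read, avoids overwriting an input file before it has been converted. The `with` blocks also make sure each file is flushed and closed before the next one is opened.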
If you're not sure whether the file was actually converted to 'utf-8', you can use enca in a Linux shell to check whether the file's encoding is 'ASCII' or 'utf-8', like this:
$ enca FILENAME
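If enca is not available, a rough check can be done in Python as well. This is only a sketch, not a replacement for enca: the hypothetical helper `is_valid_utf8` just tests whether the file's bytes decode cleanly as UTF-8 (pure ASCII also passes, since ASCII is a subset of UTF-8).

```python
def is_valid_utf8(path):
    """Return True if the bytes of the file at path decode cleanly as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Keep in mind that a file passing this check is only well-formed UTF-8. If it was read with the wrong source encoding before being rewritten, the characters can still be wrong even though the check succeeds.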