

Read many CSV files and write them re-encoded as UTF-8 using Python

I'm using Python code to read many CSV files and convert their encoding to UTF-8. I run into a problem: when I read a file I can read all of its lines, but when I write it back, only one line gets written. Please help me check my code below:

def convert_files(files, ascii, to="utf-8"):
    for name in files:
        #print ("Convert {0} from {1} to {2}").format(name, ascii, to)
        with open(name) as f:
            print(name)
            count = 0
            lineno = 0
            #this point I want to write the below text into my each new file at the first line
            #file_source.write('id;nom;prenom;nom_pere;nom_mere;prenom_pere;prenom_mere;civilite (1=homme 2=f);date_naissance;arrondissement;adresse;ville;code_postal;pays;telephone;email;civilite_demandeur (1=homme 2=f);nom_demandeur;prenom_demandeur;qualite_demandeur;type_acte;nombre_actes\n')
            for line in f.readlines():
                lineno +=1
                if lineno == 1 :
                    continue
                file_source = open(name, mode='w', encoding='utf-8', errors='ignore')
                #pass
                #print (line)
                # start write data to to new file with encode

                file_source.write(line)
                #file_source.close

#print unicode(line, "cp866").encode("utf-8")
csv_files = find_csv_filenames('./csv', ".csv")
convert_files(csv_files, "cp866")
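The question calls find_csv_filenames() without showing it. A minimal hypothetical version, assuming it should return the paths of the regular files in a directory whose names end with the given suffix, could look like this. Returning joined paths rather than bare names is an assumption that lets open(name) work regardless of the current directory, and matching the suffix case-insensitively (so `.CSV` is found too) is another.

```python
import os


def find_csv_filenames(path_to_dir, suffix=".csv"):
    """Hypothetical helper: paths of regular files in path_to_dir ending with suffix.

    The suffix is compared case-insensitively, and subdirectories are skipped
    even if their names happen to end with the suffix.
    """
    return sorted(
        os.path.join(path_to_dir, name)
        for name in os.listdir(path_to_dir)
        if name.lower().endswith(suffix.lower())
        and os.path.isfile(os.path.join(path_to_dir, name))
    )
```

Sorting is only there to make the processing order predictable; os.listdir() itself returns names in arbitrary order.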

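You're reopening the file on every iteration. Opening with mode='w' truncates the file each time, so every open() wipes out what the previous iterations wrote and only one line is left at the end. Open the output file once, outside the loop, and only after all lines have been read: opening it with 'w' before f.readlines() runs would empty the file before it is read.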

for line in f.readlines():
    lineno +=1
    if lineno == 1 :
        continue
    #move the following line outside of the for block
    file_source = open(name, mode='w', encoding='utf-8', errors='ignore')
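Putting that together, a sketch of the question's function with the write handle opened once, after reading, might look like the following. It assumes the input really is in the encoding passed as the second argument (cp866 in the question, where it was named `ascii`; renamed here to avoid shadowing the built-in), keeps the question's behaviour of dropping the original first line, and adds a hypothetical `header` parameter for the replacement first line that the question's comment asks for.

```python
def convert_files(files, from_encoding, to="utf-8", header=None):
    """Re-encode each file in place, replacing its first line with header (if given).

    All lines are read before the file is reopened for writing, because
    mode='w' truncates the file as soon as it is opened.
    """
    for name in files:
        # Decode with the source encoding; newline='' keeps the original line endings.
        with open(name, encoding=from_encoding, newline='') as f:
            lines = f.readlines()
        # Open for writing exactly once per file, with the target encoding.
        with open(name, mode='w', encoding=to, newline='') as out:
            if header is not None:
                out.write(header)  # hypothetical new first line, e.g. the column names
            out.writelines(lines[1:])  # skip the original first line, as the question does
```

Usage would be `convert_files(csv_files, "cp866", header='id;nom;prenom;...\n')`, with the full header text from the question's comment.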

If all you need is to change the character encoding of the files, then it doesn't matter that they are CSV files, unless the conversion may change which characters are interpreted as the delimiter, quotechar, etc.:

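def convert(filename, from_encoding, to_encoding):
    # Decode the whole file; newline='' keeps the original line endings intact
    with open(filename, newline='', encoding=from_encoding) as file:
        data = file.read().encode(to_encoding)
    # Write the re-encoded bytes back over the original file
    with open(filename, 'wb') as outfile:
        outfile.write(data)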

for path in csv_files:
    convert(path, "cp866", "utf-8")
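For ASCII-compatible encodings such as cp866 and UTF-8, the CSV syntax characters (`,`, `;`, `"` and the line breaks) are encoded to the same single bytes, so the re-encode cannot move field or row boundaries, even for a tool that works on raw bytes. A small hypothetical check along those lines (the function name and the default character set are this sketch's choices):

```python
def delimiters_survive(from_encoding, to_encoding, chars=',;"\r\n'):
    """True if every CSV syntax character has identical bytes in both encodings."""
    return all(ch.encode(from_encoding) == ch.encode(to_encoding) for ch in chars)


print(delimiters_survive("cp866", "utf-8"))      # True: the ASCII range is shared
print(delimiters_survive("cp866", "utf-16-le"))  # False: ',' becomes b',\x00'
print("Ж".encode("cp866"), "Ж".encode("utf-8"))  # b'\x86' b'\xd0\x96'
```

Only the non-ASCII text (here the Cyrillic letters) changes its byte representation.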

Pass an errors argument (e.g. errors='replace' or errors='ignore') to open() and/or encode() to change how encoding/decoding errors are handled.
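A sketch of the convert() function above with an errors argument threaded through both the decoding and the encoding step; the parameter name and its default are this sketch's choice, not part of the original answer.

```python
def convert(filename, from_encoding, to_encoding, errors="strict"):
    """Re-encode filename in place, applying one errors= policy to both steps.

    'strict' raises UnicodeDecodeError/UnicodeEncodeError,
    'replace' substitutes U+FFFD when decoding and '?' when encoding,
    'ignore' silently drops the offending characters.
    """
    # Decoding happens here; if it raises, the file has not been touched yet.
    with open(filename, newline='', encoding=from_encoding, errors=errors) as file:
        data = file.read().encode(to_encoding, errors)
    with open(filename, 'wb') as outfile:
        outfile.write(data)
```

With errors='replace', a stray byte such as b'\xe9' in a file read as UTF-8 becomes U+FFFD (b'\xef\xbf\xbd' in the output) instead of aborting the conversion. Keeping 'strict' as the default means bad input is reported rather than silently altered.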

If the files may be large, you could convert the data incrementally instead of reading each file into memory at once:

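import os
from shutil import copyfileobj
from tempfile import NamedTemporaryFile

def convert(filename, from_encoding, to_encoding):
    with open(filename, newline='', encoding=from_encoding) as file:
        # Temp file in the same directory, so os.replace() is a same-filesystem rename.
        # delete=False: we rename it ourselves instead of letting close() remove it.
        with NamedTemporaryFile('w', encoding=to_encoding, newline='',
                                dir=os.path.dirname(filename),
                                delete=False) as tmpfile:
            try:
                copyfileobj(file, tmpfile)  # decodes/encodes in chunks
            except BaseException:
                tmpfile.close()
                os.remove(tmpfile.name)  # don't leave a half-written temp file behind
                raise
    os.replace(tmpfile.name, filename) # rename tmpfile -> filename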

for path in csv_files:
    convert(path, "cp866", "utf-8")

You can also rewrite each file in place through a single read/write handle:

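def convert_files(files, from_encoding, to="utf-8"):
    for name in files:
        # Binary read/write mode: decode and re-encode the raw bytes explicitly
        with open(name, 'r+b') as f:
            data = f.read().decode(from_encoding).encode(to)
            f.seek(0)       # rewind so the new bytes overwrite the old ones
            f.write(data)
            f.truncate()    # cut off any old bytes past the end of the new data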
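The f.truncate() call matters whenever the re-encoded data is shorter than the original. For cp866 to UTF-8 the file only grows (each Cyrillic letter goes from 1 byte to 2), but in the opposite direction stale bytes would be left at the end. A small demonstration, using a hypothetical overwrite() helper that performs the same seek/write/truncate sequence:

```python
import os
import tempfile


def overwrite(f, new_bytes, truncate=True):
    """Overwrite an open 'r+b' file from offset 0, optionally cutting off the old tail."""
    f.seek(0)
    f.write(new_bytes)
    if truncate:
        f.truncate()  # the file now ends exactly where the new data ends


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "demo.csv")
    for truncate in (False, True):
        with open(path, "wb") as f:
            f.write("Привет;1\n".encode("utf-8"))  # 15 bytes in UTF-8
        with open(path, "r+b") as f:
            overwrite(f, "Привет;1\n".encode("cp866"), truncate)  # 9 bytes in cp866
        print(truncate, os.path.getsize(path))  # False 15, then True 9
```

Without the truncate, the last 6 bytes of the old UTF-8 content remain after the new cp866 data and corrupt the file.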

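Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.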

 