Python error when writing data to a CSV file

I wrote a Python program to write data to a .csv file, but every item in the file has "b'" in front of its content, and there are blank lines that I do not know how to remove. Some items are also unreadable, such as "b'\\xe7\\xbe\\x85\\xe5\\xb0\\x91\\xe5\\x90\\x9b'". Some of the data is in Chinese and Japanese, so I think something goes wrong when this data is written to the .csv file. Please help me solve the problem. The program is:

#write data in .csv file
def data_save_csv(type,data,id_name,header,since = None):
    #get the date when storage data
    date_storage()
    #create the data storage directory
    csv_parent_directory = os.path.join("dataset","csv",type,glovar.date)
    directory_create(csv_parent_directory)
    #write data in .csv
    if type == "group_members":
        csv_file_prefix = "gm"
    if since:
        csv_file_name = csv_file_prefix + "_" + since.strftime("%Y%m%d-%H%M%S") + "_" + time_storage() + id_name + ".csv"
    else:
        csv_file_name = csv_file_prefix + "_"  + time_storage() + "_" + id_name + ".csv"
    csv_file_directory = os.path.join(csv_parent_directory,csv_file_name)

    with open(csv_file_directory,'w') as csvfile:

        writer = csv.writer(csvfile,delimiter=',',quotechar='"',quoting=csv.QUOTE_MINIMAL)

        #csv header
        writer.writerow(header)

        row = []
        for i in range(len(data)):
            for k in data[i].keys():
                row.append(str(data[i][k]).encode("utf-8"))
            writer.writerow(row)
            row = []

[Image: the .csv file]

You have a couple of problems. The funky "b" thing happens because csv converts any non-string value to a string with str() before adding it to a column. When you did str(data[i][k]).encode("utf-8"), you got a bytes object; its string representation is b"..." and it is filled with UTF-8 encoded data. You should handle encoding when you open the file instead. In Python 3, open falls back to a platform-dependent default encoding when none is given (typically locale.getpreferredencoding(False), not sys.getdefaultencoding()), so it's a good idea to be explicit about what you want to write.
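To make the "b" effect concrete, here is a minimal sketch that writes the same value to an in-memory CSV twice, once as bytes and once as text. The helper name write_row is hypothetical, and io.StringIO stands in for the real file; the byte string is the one quoted in the question.

```python
import csv
import io


def write_row(values):
    """Write one row to an in-memory CSV and return the resulting text."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerow(values)
    return buffer.getvalue()


# The UTF-8 bytes quoted in the question.
raw = b"\xe7\xbe\x85\xe5\xb0\x91\xe5\x90\x9b"

# csv calls str() on the bytes object, so its repr ends up in the cell.
as_bytes = write_row([raw])

# Passing the decoded text keeps the characters intact.
as_text = write_row([raw.decode("utf-8")])

print(repr(as_bytes))
print(repr(as_text))
```

The first row starts with b' because the cell contains the repr of the bytes object; the second contains the actual characters, and the file's own encoding decides how they are stored on disk.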

Next, nothing guarantees that two dicts will enumerate their keys in the same order. The csv.DictWriter class is built to pull data from dictionaries, so use it instead. In my example I assumed that header holds the names of the keys you want. If header is different, you'll also need to pass in the actual dict key names you want.

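Finally, you can just strip out empty dicts while you are writing the rows. Opening the file with newline='' matters too: without it, the "\r\n" line endings that csv writes are translated a second time on Windows, which shows up as a blank line after every row.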

with open(csv_file_directory,'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=header, delimiter=',',
        quotechar='"',quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    writer.writerows(d for d in data if d)
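If header holds display labels rather than the dict keys, one possible approach is to give DictWriter the real key names and write the labels as an ordinary first row. The sketch below assumes hypothetical keys ("id", "name") and labels, and uses io.StringIO in place of the file so it can run on its own.

```python
import csv
import io


def rows_to_csv(data, fieldnames, header):
    """Write dicts as CSV, using `header` as labels for the `fieldnames` keys."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    # Map each key to its display label and write that mapping as the first row.
    writer.writerow(dict(zip(fieldnames, header)))
    # Skip empty dicts; column order comes from fieldnames, not from each dict.
    writer.writerows(d for d in data if d)
    return buffer.getvalue()


# Hypothetical sample data: keys in different orders, plus one empty dict.
data = [{"id": 1, "name": "A"}, {}, {"name": "B", "id": 2}]
output = rows_to_csv(data, ["id", "name"], ["Member ID", "Member name"])
print(output)
```

Because the column order is taken from fieldnames, the second record lands in the right columns even though its keys were inserted in a different order.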

It sounds like at least some of your issues have to do with incorrect Unicode handling.

Try implementing the snippet below in your existing code. As the comments say, the first part takes your input and decodes it from UTF-8 into Unicode.

The second part returns your output as ASCII. Note that encoding with 'ignore' drops every character that has no ASCII equivalent, which includes Chinese and Japanese text.

import codecs
import unicodedata

# Take input and decode it from UTF-8 into unicode
with codecs.open('path/to/textfile.txt', mode='r', encoding='utf-8') as f:
    for line in f:
        # Ensure output is in ASCII (characters without an ASCII form are dropped)
        line = unicodedata.normalize('NFKD', line).encode('ascii', 'ignore')
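As a rough illustration of what the normalize-then-encode step does to different inputs, the sketch below applies it to an accented Latin word and to the text from the question. The helper name to_ascii is hypothetical.

```python
import unicodedata


def to_ascii(text):
    """Decompose characters (NFKD), then drop whatever is not ASCII."""
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore")


# An accented letter decomposes into a base letter plus a combining mark,
# so the base letter survives.
latin = to_ascii("caf\u00e9")

# The characters from the question have no ASCII decomposition at all.
cjk = to_ascii(b"\xe7\xbe\x85\xe5\xb0\x91\xe5\x90\x9b".decode("utf-8"))

print(latin)
print(cjk)
```

The accented word keeps its base letters, while the Chinese text comes back empty, so this approach only fits cases where losing non-ASCII data is acceptable.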
