Formatting a csv file with Python

I have a csv file with the following structure:

"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE;1;1;2015;PP"

I need it to look like this:

"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE";"1";"1";"2015";"PP"

I received this .csv file from someone else, so I do not know how the conversion was done. I have tried the code below, without success:

input_fd = open("/home/gustavo/Downloads/Redes/Despesas/csvfile.csv", 'r')
output_fd = open('dados_2018_1.csv', 'w')
for line in input_fd.readlines():
    line.replace("\"","")
    output_fd.write(line)
    input_fd.close()
output_fd.close()

Is it possible to make this change, or will I have to redo the conversion from an xml file to csv and make this change during the conversion?
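
A note on why the loop above leaves the data unchanged: Python strings are immutable, so `line.replace(...)` returns a new string instead of modifying `line`, and here that result is discarded. A minimal sketch of the difference, using the sample line from the data above (the variable names are only illustrative):

```python
line = '"AVANTE;1;1;2015;PP"\n'

# str.replace returns a new string; the original is left untouched.
line.replace('"', '')
unchanged = line

# Assigning the result keeps the change.
stripped = line.replace('"', '')
```

Even with the result assigned, removing every quote would not produce the desired output on its own, since each value still has to be re-quoted individually.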

A couple of things. First, you do NOT have a csv file, because in a csv file the delimiter is a comma by definition. I'm assuming you want the values in your data file to (1) remain separated by semicolons [why not fix it and make them commas?] and (2) each be in quotation marks.

If so, I think this will work:

# data reader

in_file = 'data.txt'
out_file = 'fixed.txt'
with open(in_file, 'r') as source, open(out_file, 'w') as output:
    for line in source:
        # split by semicolon
        data = line.strip().split(';')
        # remove all quotes found
        data = [t.replace('"', '') for t in data]
        # re-quote every item and join with ';' (no trailing ';')
        output.write(';'.join('"' + item + '"' for item in data))
        output.write('\n')

If the consumer of the file is Python, you should consider replacing the semicolons with commas (correct csv format) and forgoing the quotes. Everything Python reads from a csv file is taken in as a string anyhow.
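
To illustrate the last point, here is a small sketch assuming the default `csv.reader` settings. The in-memory sample is a hypothetical comma-separated version of the question's data, not the real file:

```python
import csv
import io

# In-memory stand-in for a comma-separated file (sample values from the question).
sample = io.StringIO("txNomeParlamentar,ideCadastro,nuLegislatura\nAVANTE,1,2015\n")

rows = list(csv.reader(sample))
# Every field comes back as a str, quoted or not; convert explicitly where
# numbers are needed.
year = int(rows[1][2])
```

So the quotes carry no type information for Python; they only matter if some other consumer of the file expects them.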

Using the csv module.

Ex:

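import csv

filename = "csvfile.csv"    # placeholder path; the file is rewritten in place

with open(filename, newline="") as csvfile:
    reader = csv.reader(csvfile, delimiter=";")
    headers = next(reader)    # Read headers
    # Drop the newline and the outer quotes, then split on ';'
    data = [row.strip().strip('"').split(";") for row in csvfile]

with open(filename, "w", newline="") as csvfile_out:
    writer = csv.writer(csvfile_out, delimiter=";", quoting=csv.QUOTE_ALL)
    writer.writerow(headers)   # Write headers
    writer.writerows(data)     # Write data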

First: tell the reader to use delimiter=";" and quoting=csv.QUOTE_NONE. This will properly split your second line, which is a single quoted string containing your delimiter and which you want split. We'll then tweak that data to remove the quotation marks (otherwise our output will contain quoted strings like '"txNomeParlamentar"', etc).

import csv
with open('file.txt', newline='') as f:
    reader = csv.reader(f, delimiter=";", quoting=csv.QUOTE_NONE)
    # strip the stray quotation marks from every field
    data = [[s.replace('"', '') for s in row] for row in reader]

Then: we write the file back out, with delimiter=";" and quoting=csv.QUOTE_ALL to ensure each item is set in quotes.

with open('out.txt', 'w', newline='') as o:
    writer = csv.writer(o, delimiter=";", quoting=csv.QUOTE_ALL)
    writer.writerows(data)

Input:

"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE;1;1;2015;PP"

Output:

"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE";"1";"1";"2015";"PP"
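
The read and write steps above can be combined and checked in memory before touching real files. This is only a sketch: `requote` is a hypothetical helper name, `io.StringIO` stands in for the input and output files, and `lineterminator="\n"` is set explicitly because `csv.writer` otherwise ends rows with `\r\n`.

```python
import csv
import io

def requote(src, dst):
    """Copy semicolon-delimited rows from src to dst, quoting every field."""
    # QUOTE_NONE makes the reader split on every ';', even inside quotes.
    reader = csv.reader(src, delimiter=";", quoting=csv.QUOTE_NONE)
    writer = csv.writer(dst, delimiter=";", quoting=csv.QUOTE_ALL,
                        lineterminator="\n")
    for row in reader:
        # Drop the leftover quotation marks; the writer adds fresh ones.
        writer.writerow([field.replace('"', '') for field in row])

src = io.StringIO(
    '"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"\n'
    '"AVANTE;1;1;2015;PP"\n'
)
dst = io.StringIO()
requote(src, dst)
result = dst.getvalue()
```

One caveat with this approach: it assumes no value legitimately contains a `;` or a `"`, since every semicolon is treated as a separator and every quotation mark is removed.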

You could use the csv module to do it if you massage the input data a little first.

import csv


#input_csv = '/home/gustavo/Downloads/Redes/Despesas/csvfile.csv'
input_csv = 'gustavo_input.csv'
output_csv = 'dados_2018_1.csv'

with open(input_csv, 'r', newline='') as input_fd, \
     open(output_csv, 'w', newline='') as output_fd:

    reader = csv.DictReader(input_fd, delimiter=';')
    writer = csv.DictWriter(output_fd, delimiter=';',
                            fieldnames=reader.fieldnames,
                            quoting=csv.QUOTE_ALL)

    writer.writeheader()  # write the quoted header row first
    first_field = reader.fieldnames[0]
    for row in reader:
        fields = row[first_field].split(';')
        newrow = dict(zip(reader.fieldnames, fields))
        writer.writerow(newrow)

print('done')
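
To see why the massaging is needed, here is a small sketch of what `csv.DictReader` yields for the malformed data row. It uses an in-memory copy of the sample from the question, and the names `whole_line` and `missing` are only illustrative:

```python
import csv
import io

sample = io.StringIO(
    '"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"\n'
    '"AVANTE;1;1;2015;PP"\n'
)
reader = csv.DictReader(sample, delimiter=';')
row = next(reader)

# The whole quoted line lands in the first column; the remaining columns
# are filled with None (the default restval).
first_field = reader.fieldnames[0]
whole_line = row[first_field]
missing = row['sgUF']
```

Splitting that first value on `;` and zipping the pieces with the field names is what rebuilds a proper row.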
