Python: appending/merging multiple CSV files respecting headers and writing to CSV

[Using Python 3] I'm very new to programming in Python, but I'm writing a script that scans a folder for certain CSV files, reads them all, appends them, and writes the result to another CSV file.

Along the way, only the rows whose values in a certain column match a set of criteria should be kept.

All CSV files have the same columns and look something like this:

header1 header2 header3 header4 ...
string  float   string  float   ...
string  float   string  float   ...
string  float   string  float   ...
string  float   string  float   ...
...     ...     ...     ...     ...

The code I'm working with right now is below; however, it keeps overwriting the data from the previous file. That makes sense to me, but I cannot figure out how to get it working.

Code:

import csv
import datetime
import sys
import glob
import itertools
from collections import defaultdict

# Raw data files are named like '2013-06-04.csv'. To cover the whole of 2013,
# glob for the pattern '2013-*.csv'.
files = glob.glob('2013-*.csv')

# Output file looks like '20130620-filtered.csv'
outfile = '{:%Y%m%d}-filtered.csv'.format(datetime.datetime.now())

# Values of the filter column ('Campaign', shown as header4 in the sample above)
# that a row must have to be written to the output
campaign_names = ['string1', 'string2', 'string3', 'string4']

for f in files:
    with open(f, 'r') as f_in:
        dict_reader = csv.DictReader(f_in)

        with open(outfile, 'w') as f_out:
            dict_writer = csv.DictWriter(f_out, lineterminator='\n', fieldnames=dict_reader.fieldnames)
            dict_writer.writeheader()
            for row in dict_reader:
                if row['Campaign'] in campaign_names:
                    dict_writer.writerow(row)

I also tried something like readers = list(itertools.chain(*map(lambda f: csv.DictReader(open(f)), files))) and iterating over the readers, but then I cannot figure out how to work with the headers. (I get an error that itertools.chain() does not have the fieldnames attribute.)

Any help is very much appreciated!

You keep re-opening the output file in 'w' mode inside the loop, which truncates it every time.

Open outfile once, before your loop starts. For the first file you read, write the header and the rows. For the rest of the files, just write the rows.

Something like:

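# Open the output once, outside the loop, so it is not truncated per input file
with open(outfile, 'w') as f_out:
    dict_writer = None
    for f in files:
        with open(f, 'r') as f_in:
            dict_reader = csv.DictReader(f_in)
            # Create the writer (and write the header) only for the first file
            if dict_writer is None:
                dict_writer = csv.DictWriter(f_out, lineterminator='\n', fieldnames=dict_reader.fieldnames)
                dict_writer.writeheader()
            # Keep only the rows whose 'Campaign' value is in the filter list
            for row in dict_reader:
                if row['Campaign'] in campaign_names:
                    dict_writer.writerow(row)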
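
The same idea can be wrapped in a self-contained function. This is only a sketch under a few assumptions: the name merge_filtered and its parameters are made up for illustration and are not part of the original code, files are opened with newline='' as the csv module documentation recommends, and a file whose header differs from the first file's header is treated as an error rather than silently merged.

```python
import csv


def merge_filtered(paths, out_path, column, allowed):
    """Append rows from several CSV files into one output file.

    Only rows whose value in `column` is in `allowed` are written.
    The header is written once, taken from the first non-empty file.
    Returns the number of data rows written.
    (Hypothetical helper; names and behavior are illustrative.)
    """
    allowed = set(allowed)
    written = 0
    header = None
    writer = None
    # Open the output exactly once so earlier files are not overwritten.
    with open(out_path, 'w', newline='') as f_out:
        for path in paths:
            with open(path, 'r', newline='') as f_in:
                reader = csv.DictReader(f_in)
                if reader.fieldnames is None:
                    continue  # completely empty file: nothing to merge
                if writer is None:
                    # First file: remember its header and write it once.
                    header = list(reader.fieldnames)
                    writer = csv.DictWriter(
                        f_out, fieldnames=header, lineterminator='\n')
                    writer.writeheader()
                elif list(reader.fieldnames) != header:
                    # Assumption: mismatched headers should fail loudly.
                    raise ValueError(
                        'header mismatch in {!r}'.format(path))
                for row in reader:
                    if row[column] in allowed:
                        writer.writerow(row)
                        written += 1
    return written
```

With the names from the question, it could be called as merge_filtered(glob.glob('2013-*.csv'), outfile, 'Campaign', campaign_names). Keeping the header in a local variable also avoids the fieldnames problem of the itertools.chain attempt, since the header is read from each DictReader before its rows are consumed.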
