Read in csv files in a folder and output to one single csv

I have a folder with many csv files. I want to read them in and, depending on certain criteria, output the records to specific output files. In my case I have 3 different output files.

So I have many csv files. Consider one file that looks like this:

Column1;Column2
90ABCDE;AB
80BDESD;CD

And another that looks like this:

Column1;Column2
80ABCDE;AB
80BDESD;CD
80BCCDE;AB
70BDESD;CD

Each csv file has a header. The header is always the same. In the final csv files I would like to have the header once at the beginning, but not in between the data.

I want to have one file where every record is stored. In another I would like to have only those records where Column1 begins with '80B'. In the third file I would like to have those records where Column1 does not begin with '80B' and the fourth character is not equal to 'D'.

So the output should be:

file 'all.csv'

Column1;Column2
90ABCDE;AB
80BDESD;CD
80ABCDE;AB
80BDESD;CD
80BCCDE;AB
70BDESD;CD

file 'subset_1'

Column1;Column2
80BDESD;CD
80BDESD;CD
80BCCDE;AB

file 'subset_2'

Column1;Column2
80BCCDE;AB

I tried the following code:

import glob
import csv
import os


path = r'C:\myfolder\test'

all_files=glob.glob(os.path.join(path, "*.csv"))

with open(r'C:\myfolder\all.csv', "w", newline='') as dall, \
open(r'C:\myfolder\subset_1.csv', "w", newline='') as \
subset_1, open(r'C:\myfolder\subset_2.csv', "w", newline='') as subset_2:
    
    cw_all = csv.writer(dall, delimiter=";", quoting=csv.QUOTE_MINIMAL)
    cw_subset_1 = csv.writer(subset_1, delimiter=";", quoting=csv.QUOTE_MINIMAL)
    cw_subset_2 = csv.writer(subset_2, delimiter=";", quoting=csv.QUOTE_MINIMAL)
    
    cw_all.writerow(['Column1','Column2'])
    cw_subset_1.writerow(['Column1','Column2'])
    cw_subset_2.writerow(['Column1','Column2'])
    
    for filename in all_files:
        with open(filename) as infile:
            cr = csv.reader(infile, delimiter=";")
            #next(cr)
            for line in cr:
                cw_all.writerow(line)
            if (
                (line[0][:3] !="80B")
                ): cw_subset_1.writerow(line)
            if (
                (line[0][:3] =="80B") and
                (line[0][3:4] =="D")
                ): cw_subset_2.writerow(line)

For the first try I also ignored the problem with the header and commented out the next(cr). But it is not working. Somehow the records are not stored properly in the corresponding files: each record does not end up in the files it belongs to. Where is my mistake?

I would like to do it at the csv level, without pandas.

(I want to write it "on the fly" while reading the files. I do not want to first create a large file with everything, then read it once to create the first subset and a second time to create the second subset. That is quite inefficient, as I would have to read the large file several times.)

There are three problems I see:

  1. Uncomment next(cr) so the headers aren't copied into the new files.
  2. The if statements should be indented under the for line in cr: line, so that they run for every record.
  3. line[0][3:4] == "D" should be line[0][3:4] != "D" .
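
To make the second point concrete, here is a minimal sketch with made-up in-memory rows (no files involved). It only illustrates the indentation effect: a check placed after the loop body runs once, with line still bound to the last row.

```python
rows = [["90ABCDE", "AB"], ["80BDESD", "CD"], ["70BDESD", "CD"]]

# Mis-indented: the check runs once, after the loop has finished,
# so it only ever sees the last row.
outside = []
for line in rows:
    pass  # stands in for cw_all.writerow(line)
if line[0][:3] != "80B":
    outside.append(line)

# Correctly indented: the check runs for every row.
inside = []
for line in rows:
    if line[0][:3] != "80B":
        inside.append(line)

print(outside)  # [['70BDESD', 'CD']]
print(inside)   # [['90ABCDE', 'AB'], ['70BDESD', 'CD']]
```

In the question's code the same thing happens once per input file, which is why only the last record of each file could reach the subset files.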

Note that line[0][3:4] != "D" can be just line[0][3] != "D" when checking a single character in a string.

Your description of the 3rd file does not match the desired output. I went with the description in the code below. The comments are taken from the OP's requirements.
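
One caveat, shown as a small sketch with made-up values: the two forms agree whenever the string has at least four characters, but they differ on shorter strings. A slice past the end yields an empty string, while an index raises IndexError.

```python
full = "80BDESD"
short = "80B"  # hypothetical value shorter than four characters

print(full[3:4] == full[3])  # True: both are 'D'
print(repr(short[3:4]))      # '' (slicing past the end is allowed)

try:
    short[3]
except IndexError:
    print("short[3] raises IndexError")
```

If Column1 is guaranteed to hold at least four characters, either form works; otherwise the slice is the more forgiving choice.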

for filename in all_files:
    with open(filename) as infile:
        cr = csv.reader(infile, delimiter=';')
        next(cr)  # skip the header in each input file
        for line in cr:
            # one file where every record is stored.
            cw_all.writerow(line)
            # only those records where Column1 begins with '80B'.
            if line[0][:3] == '80B':
                cw_subset_1.writerow(line)
            # those records where Column1 does not begin with '80B'
            # and the fourth character is not equal to 'D'.            
            if line[0][:3] != '80B' and line[0][3] != 'D':
                cw_subset_2.writerow(line)
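
For completeness, the loop above can be combined with the question's setup into one self-contained sketch. The function name split_csv_folder and its two parameters are hypothetical, introduced only for illustration. It assumes the output folder is different from the input folder (as in the question), uses the slice form for the fourth character so short values do not raise, and skips blank lines.

```python
import csv
import glob
import os


def split_csv_folder(in_dir, out_dir):
    """Read every *.csv in in_dir once and write all.csv, subset_1.csv
    and subset_2.csv into out_dir (hypothetical helper, not from the post)."""
    header = ["Column1", "Column2"]
    # Collect the inputs first; sorted() gives a stable processing order.
    all_files = sorted(glob.glob(os.path.join(in_dir, "*.csv")))

    with open(os.path.join(out_dir, "all.csv"), "w", newline="") as dall, \
         open(os.path.join(out_dir, "subset_1.csv"), "w", newline="") as s1, \
         open(os.path.join(out_dir, "subset_2.csv"), "w", newline="") as s2:
        cw_all = csv.writer(dall, delimiter=";", quoting=csv.QUOTE_MINIMAL)
        cw_subset_1 = csv.writer(s1, delimiter=";", quoting=csv.QUOTE_MINIMAL)
        cw_subset_2 = csv.writer(s2, delimiter=";", quoting=csv.QUOTE_MINIMAL)

        # Write the header exactly once per output file.
        for cw in (cw_all, cw_subset_1, cw_subset_2):
            cw.writerow(header)

        for filename in all_files:
            with open(filename, newline="") as infile:
                cr = csv.reader(infile, delimiter=";")
                next(cr, None)  # skip the header; tolerate an empty file
                for line in cr:
                    if not line:
                        continue  # ignore blank lines
                    cw_all.writerow(line)
                    starts_80b = line[0][:3] == "80B"
                    if starts_80b:
                        cw_subset_1.writerow(line)
                    # Follows the written description, not the sample output.
                    if not starts_80b and line[0][3:4] != "D":
                        cw_subset_2.writerow(line)
```

With the question's paths this would be called as split_csv_folder(r'C:\myfolder\test', r'C:\myfolder'). Each input file is read once and every record is routed to its outputs as it is read, so no intermediate file is needed.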
