
Python: How to iterate every third row starting with the second row of a csv file

I'm trying to write a program that iterates through a csv file row by row. It creates 3 new csv files and writes data from the source csv file to each of them, and it does this for every row of the source file.

For the first if statement, I want it to copy every third row starting at the first row and save it to a new csv file (the next rows it copies would be row 4, row 7, row 10, etc.).

For the second if statement, I want it to copy every third row starting at the second row and save it to a new csv file (the next rows it copies would be row 5, row 8, row 11, etc.).

For the third if statement, I want it to copy every third row starting at the third row and save it to a new csv file (the next rows it copies would be row 6, row 9, row 12, etc.).
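Put another way, the split described above is modulo arithmetic on the row number. A minimal sketch, assuming 1-based data-row numbers with the header not counted; target_file is a hypothetical helper name:

```python
def target_file(row_number: int, num_files: int = 3) -> int:
    """Return the 1-based output file number for a 1-based data row number."""
    # rows 1, 4, 7, ... -> file 1; rows 2, 5, 8, ... -> file 2; rows 3, 6, 9, ... -> file 3
    return (row_number - 1) % num_files + 1


for n in range(1, 10):
    print(f"row {n} -> agentList{target_file(n)}.csv")
```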

The second "if" statement I wrote, the one that creates "agentList1.csv", works exactly the way I want it to, but I can't figure out how to get the first "elif" statement to start from the second row and the second "elif" statement to start from the third row. Any help would be much appreciated!

Here's my code:

import csv

# Assumed to be defined earlier (not shown): Sourcedataframe, count = 1, count2 = 0
for index, row in Sourcedataframe.iterrows():  # going through each row line by line

    # this loop counts the number of times it has gone through the csv file; after more than three times, it resets the counter back to 1
    for column in Sourcedataframe:
        if count > 3:
            count = 1

        # if the program is on its first count, it opens the source csv file and writes every third row to a new csv file named 'agentList1.csv'
        if count == 1:
            with open('blankAgentList.csv') as infile:
                with open('agentList1.csv', 'w') as outfile:
                    reader = csv.DictReader(infile)
                    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    for row in reader:
                        count2 += 1
                        if not count2 % 3:
                            writer.writerow(row)

        elif count == 2:
            with open('blankAgentList.csv') as infile:
                with open('agentList2.csv', 'w') as outfile:
                    reader = csv.DictReader(infile)
                    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    for row in reader:
                        count2 += 1
                        if not count2 % 3:
                            writer.writerow(row)

        elif count == 3:
            with open('blankAgentList.csv') as infile:
                with open('agentList3.csv', 'w') as outfile:
                    reader = csv.DictReader(infile)
                    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    for row in reader:
                        count2 += 1
                        if not count2 % 3:
                            writer.writerow(row)

        count = count + 1  # counts how many times it has run through the main for loop
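A minimal sketch of one way to make each branch start at a different row, assuming a source file with a header line as in the code above: number the data rows with enumerate and keep a row when its 0-based index modulo 3 equals that output file's offset (0, 1 or 2). Because enumerate starts from 0 on every pass, there is no shared counter like count2 carrying over between files. The helper write_every_third is a hypothetical name, and the demo writes a made-up sample file into a temporary directory so it runs on its own.

```python
import csv
import os
import tempfile


def write_every_third(src, dst, offset, step=3):
    """Copy data rows whose 0-based index i satisfies i % step == offset (header excluded)."""
    # newline='' is what the csv module documentation asks for on files it reads or writes
    with open(src, newline='') as infile, open(dst, 'w', newline='') as outfile:
        reader = csv.DictReader(infile)
        writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
        writer.writeheader()
        # enumerate restarts at 0 for each call, so no counter leaks between output files
        for i, row in enumerate(reader):
            if i % step == offset:
                writer.writerow(row)


# Demo: a made-up 7-row sample file in a temporary directory
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'blankAgentList.csv')
with open(src, 'w', newline='') as f:
    f.write('A,B,C\n' + ''.join(f'r{n}a,r{n}b,r{n}c\n' for n in range(1, 8)))

# offset 0 -> rows 1, 4, 7; offset 1 -> rows 2, 5; offset 2 -> rows 3, 6
for offset in range(3):
    write_every_third(src, os.path.join(tmp, f'agentList{offset + 1}.csv'), offset)
```

The branch that should start at the second row is simply offset 1, and the one starting at the third row is offset 2.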

Load the csv into a DataFrame with pd.read_csv; with the default header=0, the first line becomes the column names, so indexing starts from the second line of the file.

Then pass the row/record number to iloc to fetch a particular record, e.g. df.iloc[3, :].
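A minimal pandas sketch in the spirit of this answer, using made-up sample data: iloc also accepts slices, so df.iloc[k::3] picks every third row starting at position k, where position 0 is the first data row because read_csv treats the first line as the header by default. With a real file you would pass its path to pd.read_csv instead of the StringIO object; the output names agentList1.csv to agentList3.csv follow the question, and the temporary output directory is only there to keep the demo self-contained.

```python
import io
import os
import tempfile

import pandas as pd

# Made-up sample standing in for the real csv file
csv_text = "A,B,C\n" + "".join(f"r{n}a,r{n}b,r{n}c\n" for n in range(1, 8))
df = pd.read_csv(io.StringIO(csv_text))  # header=0 by default: line 1 becomes the column names

# Every third row starting at position k (position 0 = first data row)
parts = {k + 1: df.iloc[k::3] for k in range(3)}

out_dir = tempfile.mkdtemp()
for number, part in parts.items():
    # index=False stops pandas from writing its own index as an extra column
    part.to_csv(os.path.join(out_dir, f"agentList{number}.csv"), index=False)
```

This loads the whole file into memory, which is fine for moderate sizes; for very large inputs a line-by-line approach avoids that.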

You are opening your csv file from the beginning in each if clause. I believe you have already loaded your file into Sourcedataframe, so just get rid of reader = csv.DictReader(infile) and read the data like this:

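Sourcedataframe.iloc[n]  # n (hypothetical name): the integer position of the row you want; iloc takes positions, not column labels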

Using plain Python, we can create a solution that works for any number of interleaved data rows (let's call it NUM_ROWS), not just three.

Nota bene: the solution does not need to read the whole input into memory. It processes one line at a time, grouping only the few lines it currently needs, so it works fine for a very large input file.

Assuming your input file contains a number of data rows that is a multiple of NUM_ROWS, i.e. the rows can be split evenly across the output files:

NUM_ROWS = 3
outfiles = [open(f'blankAgentList{i}.csv', 'w') for i in range(1,NUM_ROWS+1)]

with open('blankAgentList.csv') as infile:
    header = infile.readline() # read/skip the header

    for f in outfiles: # repeat header in all output files if needed
        f.write(header)

    row_groups = zip(*[iter(infile)]*NUM_ROWS)  # tuples of NUM_ROWS consecutive lines (an incomplete last group is dropped)
    for rg in row_groups:
        for f, r in zip(outfiles, rg):
            f.write(r)

for f in outfiles:
    f.close()

Otherwise, for any number of data rows, we can use

import itertools as it

NUM_ROWS = 3
outfiles = [open(f'blankAgentList{i}.csv', 'w') for i in range(1,NUM_ROWS+1)]

with open('blankAgentList.csv') as infile:
    header = infile.readline() # read/skip the header

    for f in outfiles: # repeat header in all output files if needed
        f.write(header)

    row_groups = it.zip_longest(*[iter(infile)]*NUM_ROWS)  # the last tuple is padded with None
    for rg in row_groups:
        for f, r in it.zip_longest(outfiles, rg):
            if r is None:
                break
            f.write(r)

for f in outfiles:
    f.close()

which, for example, given the input file

A,B,C
r1a,r1b,r1c
r2a,r2a,r2c
r3a,r3b,r3c
r4a,r4b,r4c
r5a,r5b,r5c
r6a,r6b,r6c
r7a,r7b,r7c

produces the following (output copied straight from the terminal):

(base) SO $ cat blankAgentList.csv 
A,B,C
r1a,r1b,r1c
r2a,r2a,r2c
r3a,r3b,r3c
r4a,r4b,r4c
r5a,r5b,r5c
r6a,r6b,r6c
r7a,r7b,r7c

(base) SO $ cat blankAgentList1.csv 
A,B,C
r1a,r1b,r1c
r4a,r4b,r4c
r7a,r7b,r7c

(base) SO $ cat blankAgentList2.csv 
A,B,C
r2a,r2a,r2c
r5a,r5b,r5c

(base) SO $ cat blankAgentList3.csv 
A,B,C
r3a,r3b,r3c
r6a,r6b,r6c
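A variant of the same single-pass idea, offered as a sketch rather than as this answer's own code: deal the lines out round-robin with enumerate and i % NUM_ROWS, which handles a row count that is not a multiple of NUM_ROWS without zip_longest, and use contextlib.ExitStack so every output file is closed even if an error occurs. The helper name split_round_robin is hypothetical, and the demo uses a sample input similar to the one above, written to a temporary directory.

```python
import contextlib
import os
import tempfile


def split_round_robin(src, dst_paths):
    """Write data line i (0-based, after the header) of src to dst_paths[i % len(dst_paths)]."""
    with open(src) as infile, contextlib.ExitStack() as stack:
        # ExitStack closes every file it entered when the with block ends
        outfiles = [stack.enter_context(open(p, 'w')) for p in dst_paths]
        header = infile.readline()
        for f in outfiles:  # repeat the header in every output file
            f.write(header)
        for i, line in enumerate(infile):
            outfiles[i % len(outfiles)].write(line)


# Demo: a made-up 7-row sample, which does not split evenly into 3 files
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'blankAgentList.csv')
with open(src, 'w') as f:
    f.write('A,B,C\n' + ''.join(f'r{n}a,r{n}b,r{n}c\n' for n in range(1, 8)))

dsts = [os.path.join(tmp, f'blankAgentList{i}.csv') for i in range(1, 4)]
split_round_robin(src, dsts)
```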

Note: I understand that the line

row_groups = zip(*[iter(infile)]*NUM_ROWS)

may be intimidating at first (it was for me when I started).

All it does is group consecutive lines from the input file into tuples of NUM_ROWS lines.
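A small demonstration on a made-up list instead of a file: [iter(lines)] * NUM_ROWS is a list holding the same iterator NUM_ROWS times, so every tuple that zip builds pulls the next NUM_ROWS items from it. Plain zip stops as soon as the iterator runs dry, silently dropping an incomplete last group, which is why the version for arbitrary row counts uses zip_longest.

```python
import itertools as it

lines = ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7']
NUM_ROWS = 3

# One iterator, referenced NUM_ROWS times: each tuple takes the next NUM_ROWS items
groups = list(zip(*[iter(lines)] * NUM_ROWS))
print(groups)  # [('r1', 'r2', 'r3'), ('r4', 'r5', 'r6')]  ('r7' is lost)

# zip_longest keeps going and pads the short last group with None
padded = list(it.zip_longest(*[iter(lines)] * NUM_ROWS))
print(padded)  # [('r1', 'r2', 'r3'), ('r4', 'r5', 'r6'), ('r7', None, None)]
```

On Python 3.12 and later, itertools.batched(lines, NUM_ROWS) produces the same groups but keeps a short final tuple instead of padding it.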

If your objective includes learning Python, I recommend studying it thoroughly through a book, a course, or both, and practising a lot.

One key subject is the iteration protocol, along with the other protocols; another is namespaces.

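Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site or the original source. For any questions, contact yoyou2525@163.com.

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM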