
Python: How to iterate every third row starting with the second row of a csv file

I'm trying to write a program that iterates through a csv file row by row. It will create 3 new csv files and write data from the source csv file to each of them. The program does this for every row of the source csv file.

For the first if statement, I want it to copy every third row starting at the first row and save it to a new csv file (the next rows it copies would be rows 4, 7, 10, etc.).

For the second if statement, I want it to copy every third row starting at the second row and save it to a new csv file (the next rows it copies would be rows 5, 8, 11, etc.).

For the third if statement, I want it to copy every third row starting at the third row and save it to a new csv file (the next rows it copies would be rows 6, 9, 12, etc.).

The second "if" statement I wrote, which creates the first "agentList1.csv", works exactly the way I want it to, but I can't figure out how to get the first "elif" statement to start from the second row and the second "elif" statement to start from the third row. Any help would be much appreciated!

Here's my code:

for index, row in Sourcedataframe.iterrows(): #going through each row line by line

#this for loop counts the amount of times it has gone through the csv file. If it has gone through it more than three times, it resets the counter back to 1.
for column in Sourcedataframe: 
    if count > 3:
        count = 1

        #if program is on it's first count, it opens the 'Sourcedataframe', reads/writes every third row to a new csv file named 'agentList1.csv'.
    if count == 1:
        with open('blankAgentList.csv') as infile: 

          with open('agentList1.csv', 'w') as outfile:
            reader = csv.DictReader(infile)
            writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
            writer.writeheader()
            for row in reader:
                count2 += 1
                if not count2 % 3:
                    writer.writerow(row)

    elif count == 2:
        with open('blankAgentList.csv') as infile:

          with open('agentList2.csv', 'w') as outfile:
            reader = csv.DictReader(infile)
            writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
            writer.writeheader()
            for row in reader:
                count2 += 1
                if not count2 % 3:
                    writer.writerow(row)

    elif count == 3:
        with open('blankAgentList.csv') as infile:

          with open('agentList3.csv', 'w') as outfile:
            reader = csv.DictReader(infile)
            writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
            writer.writeheader()
            for row in reader:
                count2 += 1
                if not count2 % 3:
                    writer.writerow(row)

    count = count + 1 #counts how many times it has ran through the main for loop. 

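Read the csv into a dataframe with pd.read_csv('blankAgentList.csv', header=0). The first line then becomes the column names, and the data rows are indexed starting from the second line of the file.

Then pass a row position to iloc to fetch a particular record, for example df.iloc[3, :].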

You are reopening your csv file from the beginning in each if clause. I believe you have already loaded the file into Sourcedataframe, so just get rid of reader = csv.DictReader(infile) and read the data like this:

Sourcedataframe.iloc[index]  # index comes from iterrows(); it is a row position when the dataframe uses the default index
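
Whichever reader is used, the three branches only need to differ in their starting offset. Here is a minimal sketch using only the standard library, assuming the input has a single header line. The helper name `every_nth` and the in-memory sample data are made up for illustration and are not part of the `csv` module:

```python
import csv
import io
from itertools import islice


def every_nth(rows, start, step=3):
    """Yield the row at position `start`, then every `step`-th row after it.

    `start` is 0-based, so start=1 means "begin at the second data row".
    Hypothetical helper for illustration only.
    """
    return islice(rows, start, None, step)


# Stand-in for 'blankAgentList.csv' so the sketch runs on its own.
sample = "A,B\nr1a,r1b\nr2a,r2b\nr3a,r3b\nr4a,r4b\nr5a,r5b\n"

reader = csv.DictReader(io.StringIO(sample))
second_file_rows = list(every_nth(reader, start=1))
print([row["A"] for row in second_file_rows])  # ['r2a', 'r5a']
```

With a pandas dataframe, the equivalent positional slice would be Sourcedataframe.iloc[1::3].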

Using plain Python we can create a solution that works for any number of interleaved data rows (let's call it NUM_ROWS), not just three.

Note: the solution does not need to read the whole input into memory. It reads the file lazily, holding only NUM_ROWS lines at a time, so it works fine for a very large input file.

Assuming your input file contains a number of data rows that is a multiple of NUM_ROWS, i.e. the rows can be split evenly among the output files:

NUM_ROWS = 3
outfiles = [open(f'blankAgentList{i}.csv', 'w') for i in range(1,NUM_ROWS+1)]

with open('blankAgentList.csv') as infile:
    header = infile.readline() # read/skip the header

    for f in outfiles: # repeat header in all output files if needed
        f.write(header)

    row_groups = zip(*[iter(infile)]*NUM_ROWS)
    for rg in row_groups:
        for f, r in zip(outfiles, rg):
            f.write(r)

for f in outfiles:
    f.close()

Otherwise, for any number of data rows, we can use:

import itertools as it

NUM_ROWS = 3
outfiles = [open(f'blankAgentList{i}.csv', 'w') for i in range(1,NUM_ROWS+1)]

with open('blankAgentList.csv') as infile:
    header = infile.readline() # read/skip the header

    for f in outfiles: # repeat header in all output files if needed
        f.write(header)

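    # Group consecutive lines NUM_ROWS at a time; the last group is padded with None.
    row_groups = it.zip_longest(*[iter(infile)]*NUM_ROWS)
    for rg in row_groups:
        # rg always has exactly NUM_ROWS items, so a plain zip is enough here.
        for f, r in zip(outfiles, rg):
            if r is None:  # padding reached: no more input lines
                break
            f.write(r)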

for f in outfiles:
    f.close()

which, for example, with an input file of

A,B,C
r1a,r1b,r1c
r2a,r2a,r2c
r3a,r3b,r3c
r4a,r4b,r4c
r5a,r5b,r5c
r6a,r6b,r6c
r7a,r7b,r7c

produces (output copied straight from the terminal)

(base) SO $ cat blankAgentList.csv 
A,B,C
r1a,r1b,r1c
r2a,r2a,r2c
r3a,r3b,r3c
r4a,r4b,r4c
r5a,r5b,r5c
r6a,r6b,r6c
r7a,r7b,r7c

(base) SO $ cat blankAgentList1.csv 
A,B,C
r1a,r1b,r1c
r4a,r4b,r4c
r7a,r7b,r7c

(base) SO $ cat blankAgentList2.csv 
A,B,C
r2a,r2a,r2c
r5a,r5b,r5c

(base) SO $ cat blankAgentList3.csv 
A,B,C
r3a,r3b,r3c
r6a,r6b,r6c
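
For comparison, the same round-robin split can be sketched without building groups, by pairing each input line with the next output file in turn. This swaps in `itertools.cycle` for the grouping idiom used above. The function name `split_round_robin` and the in-memory files are assumptions made so the example runs on its own:

```python
import io
from itertools import cycle


def split_round_robin(infile, outfiles):
    """Copy the header to every output, then deal the data lines out in turn."""
    header = infile.readline()
    for f in outfiles:
        f.write(header)
    # zip stops as soon as infile is exhausted, so any number of rows works.
    for f, line in zip(cycle(outfiles), infile):
        f.write(line)


# In-memory stand-ins for the input file and the three output files.
source = io.StringIO("A,B,C\nr1a,r1b,r1c\nr2a,r2a,r2c\nr3a,r3b,r3c\nr4a,r4b,r4c\n")
outputs = [io.StringIO() for _ in range(3)]
split_round_robin(source, outputs)
print(outputs[0].getvalue())
```

Real files opened with open() could be passed in place of the StringIO objects.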

Note: I understand the line

row_groups = zip(*[iter(infile)]*NUM_ROWS)

may be intimidating at first (it was for me when I started).

All it does is group consecutive lines from the input file into tuples of NUM_ROWS lines each.
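
Here is a small demonstration of that grouping on a list of integers instead of file lines (the variable names are illustrative). The list holds NUM_ROWS references to the same iterator, so each tuple that zip builds pulls NUM_ROWS consecutive items from it:

```python
from itertools import zip_longest

NUM_ROWS = 3
items = [1, 2, 3, 4, 5, 6, 7]

# One iterator, referenced NUM_ROWS times: zip advances it three times per tuple.
groups = list(zip(*[iter(items)] * NUM_ROWS))
print(groups)  # [(1, 2, 3), (4, 5, 6)]  (the leftover 7 is dropped)

# zip_longest keeps the incomplete last group and pads it with None.
padded = list(zip_longest(*[iter(items)] * NUM_ROWS))
print(padded)  # [(1, 2, 3), (4, 5, 6), (7, None, None)]
```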

If your objective includes learning Python, I recommend studying it thoroughly via a book, a course, or both, and practising a lot.

Two key subjects are the iteration protocol (along with the other protocols) and namespaces.

