
Processing csv iteratively 3 rows at a time in Python

I have a CSV file like the following:

A, B, C, D
2,3,4,5
4,3,5,2
5,8,3,9
7,4,2,6
8,6,3,7

I want to fetch the B values from 3 rows at a time (for the first iteration the values would be 3, 3, 8), save them in variables (value1=3, value2=3, value3=8), and pass them to a function. Once those values are processed, I want to fetch the values from the next 3 rows, moving down one row each time (value1=3, value2=8, value3=4), and so on.

The CSV file is large. I am a Java developer, so please suggest the simplest possible code.

An easy solution would be the following:

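import pandas as pd

# skipinitialspace=True strips the space after each comma in the header,
# so the column is named "B" rather than " B"
data = pd.read_csv("path.csv", skipinitialspace=True)

# Slide a 3-row window over column B, moving down one row at a time
for i in range(len(data) - 2):
    value1 = data.loc[i, "B"]
    value2 = data.loc[i + 1, "B"]
    value3 = data.loc[i + 2, "B"]
    # function is a placeholder for your own processing function
    function(value1, value2, value3)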
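Because the file is large, the same sliding window can also be produced without loading the whole file first. The following is a minimal sketch using only the standard library, not part of the answer above: `sliding_windows` is a hypothetical helper name, and the sample data is fed from a string so the snippet runs on its own.

```python
import csv
import io
from collections import deque

def sliding_windows(rows, column, size=3):
    """Yield tuples of `size` consecutive values from `column`, advancing one row at a time."""
    window = deque(maxlen=size)  # the oldest value is dropped automatically
    for row in rows:
        window.append(int(row[column]))
        if len(window) == size:
            yield tuple(window)

# Sample data from the question; for a real file use open("path.csv", newline="") instead.
sample = io.StringIO("A, B, C, D\n2,3,4,5\n4,3,5,2\n5,8,3,9\n7,4,2,6\n8,6,3,7\n")

# skipinitialspace=True strips the space after each comma in the header row
reader = csv.DictReader(sample, skipinitialspace=True)

# list() is used here only to show the result; on a large file,
# iterate over sliding_windows(...) directly so rows are read lazily.
windows = list(sliding_windows(reader, "B"))
for value1, value2, value3 in windows:
    print(value1, value2, value3)
```

Only the last three B values are held in memory at any time, which is the main reason to prefer this shape for a large file.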

This is a possible solution (I have used the function proposed in this answer):

import csv
import itertools

# Function to iterate the csv file by chunks (of any size)
def grouper(n, iterable):
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk

# Open the csv file
with open('myfile.csv') as f:
    csvreader = csv.reader(f)
    # Read the headers: ['A', 'B', 'C', 'D']
    headers = next(csvreader, None)
    # Read the rest of the file by chunks of 3 rows
    for chunk in grouper(3, csvreader):
        # do something with your chunk of rows
        print(chunk)

Printed result:

(['2', '3', '4', '5'], ['4', '3', '5', '2'], ['5', '8', '3', '9'])
(['7', '4', '2', '6'], ['8', '6', '3', '7'])
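To get from these chunks to the B values the question asks about, the second field of each row can be picked out. The sketch below assumes B is always at index 1 and repeats `grouper` so that it runs on its own; the sample data is read from a string instead of a file. Note that these chunks do not overlap, and the last one may hold fewer than 3 rows.

```python
import csv
import io
import itertools

def grouper(n, iterable):
    """Yield successive tuples of up to n items from iterable."""
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk

# Sample data from the question; for a real file use open('myfile.csv', newline='') instead.
sample = io.StringIO("A, B, C, D\n2,3,4,5\n4,3,5,2\n5,8,3,9\n7,4,2,6\n8,6,3,7\n")
csvreader = csv.reader(sample)
next(csvreader, None)  # skip the header row

# For each chunk of rows, keep only the B value (index 1) as an int
b_chunks = [[int(row[1]) for row in chunk] for chunk in grouper(3, csvreader)]
print(b_chunks)
```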

You can use the csv module:

import csv
with open('data.txt') as fp:
    reader = csv.reader(fp)
    next(reader) #skips the header
    res = [int(row[1]) for row in reader]
    groups = (res[idx: idx + 3] for idx in range(0, len(res) - 2))
for a, b, c in groups:
    print(a, b, c)

Output:

3 3 8
3 8 4
8 4 6

You can use pandas to read your CSV in chunks with the chunksize argument, as described in "How can I partially read a huge CSV file?":

import pandas as pd

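# Function that you want to apply to your arguments
def fn(A, B, C, D):
    print(sum(A), sum(B), sum(C), sum(D))

# Iterate through the file in chunks of 3 rows.
# skipinitialspace=True strips the space after each comma in the header,
# so the column names match the parameter names of fn.
for chunk in pd.read_csv('test.csv', chunksize=3, skipinitialspace=True):
    # Convert the dataframe to a dict of lists, one list per column
    chunk_dict = chunk.to_dict(orient='list')
    # Pass each column to the function as a keyword argument
    fn(**chunk_dict)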
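For readers coming from Java, the `**` unpacking may be the least familiar step. The sketch below leaves pandas out and writes by hand the dict that `to_dict(orient='list')` would produce for the first 3 data rows, assuming the column names are exactly A, B, C and D. Here `fn` returns the sums instead of printing them, purely so the result can be inspected.

```python
# Hand-written stand-in for the dict built from the first chunk:
# one key per column, each mapped to that column's values.
chunk_dict = {
    "A": [2, 4, 5],
    "B": [3, 3, 8],
    "C": [4, 5, 3],
    "D": [5, 2, 9],
}

def fn(A, B, C, D):
    # Each parameter receives the list stored under the key of the same name
    return sum(A), sum(B), sum(C), sum(D)

# fn(**chunk_dict) is equivalent to
# fn(A=chunk_dict["A"], B=chunk_dict["B"], C=chunk_dict["C"], D=chunk_dict["D"])
result = fn(**chunk_dict)
print(result)
```

If a key in the dict does not match a parameter name, the call raises a TypeError, so the column names in the file must match the function signature.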


 