
Python batch reading of CSV file

I'm trying to read a CSV file in batches and process each batch with a callback.

import csv

with open('file.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    header = next(reader) # skip header

    batch_size = 3
    batch = []
    count = 0

    for row in reader:
        if count >= batch_size:
            do_something(batch)
            batch = []
            count = 0

        batch.append(row)
        count += 1

Let's assume the CSV file has 10 rows (not counting the header) and batch_size is 3. The expected result is 4 batches: 3 batches with 3 rows each, and a 4th batch containing only 1 row. The code I wrote produces only 3 batches. If the batch size is 1/2/5/10, everything is ok.

Your condition count >= batch_size never becomes True for the last few rows when the number of rows is not evenly divisible by batch_size, so the final partial batch is never processed.

Therefore, you need to flush the last batch (the remainder) manually. Just add something like this after the for loop:

if batch:
    do_something(batch)

This calls your callback one more time if any remaining rows have accumulated in batch, which your loop already does, since it iterates over every available row.
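
For reference, here is a minimal sketch of the whole loop with the flush added. It folds your count variable into len(batch), and do_something is just a placeholder callback that prints the batch size; substitute your real processing.

import csv

def do_something(batch):
    # placeholder callback for illustration -- replace with your real processing
    print(f"processing {len(batch)} rows")

batch_size = 3

with open('file.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    header = next(reader)  # skip header

    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) >= batch_size:
            do_something(batch)
            batch = []

    # flush the remainder that never reached batch_size
    if batch:
        do_something(batch)

With 10 data rows and batch_size = 3, this prints four batches: 3, 3, 3, and 1 rows.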
