
Python batch reading of a CSV file

I'm trying to read a CSV file in batches and process each batch with a callback.

import csv

with open('file.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    header = next(reader) # skip header

    batch_size = 3
    batch = []
    count = 0

    for row in reader:
        if count >= batch_size:
            do_something(batch)
            batch = []
            count = 0

        batch.append(row)
        count += 1

Assume the CSV file has 10 rows (not counting the header) and batch_size is 3. The expected result is 4 batches: 3 batches of 3 rows each, and a 4th batch containing only 1 row. The code I wrote produces only 3 batches. If the batch size is 1/2/5/10, everything is ok.

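Your condition count >= batch_size is checked at the start of each iteration, before the current row is appended, so a full batch is only passed to do_something once the next row arrives. Whatever is still in batch when the loop ends is never processed. With 10 rows and a batch_size of 3, that is the single leftover row. (As posted, the same applies when the number of rows is an exact multiple of batch_size: the final full batch also stays in batch.)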
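To make this concrete, here is a small sketch that replays the loop from the question on an in-memory list instead of a file. The function name batches_as_posted and the delivered list (which stands in for the calls to do_something) are illustrative assumptions, not part of the original code.

```python
def batches_as_posted(rows, batch_size):
    """Replay the loop from the question; return (handed-off batches, leftover rows)."""
    delivered = []  # stands in for the calls to do_something (assumption)
    batch = []
    count = 0
    for row in rows:
        # Same check as in the question: it runs before the row is appended.
        if count >= batch_size:
            delivered.append(batch)
            batch = []
            count = 0
        batch.append(row)
        count += 1
    # Whatever is still in batch here was never handed off.
    return delivered, batch


rows = list(range(1, 11))  # 10 data rows
delivered, leftover = batches_as_posted(rows, 3)
print(delivered)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(leftover)   # [10]
```

The leftover list is exactly what the loop drops on the floor when it finishes.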

Therefore, you need to process the last batch (the remainder) manually. Just add something like this after the for loop:

if batch:
    do_something(batch)

This calls your function once more if any rows are still accumulated in batch when the loop ends. The loop itself already iterates over all available rows, so nothing else needs to change.
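Putting it together, one possible way to package the fix is a generator that yields batches. This is a sketch, not the code from the question: the name iter_batches is hypothetical, the size check is moved to after the append (a slight variation on the original loop), and io.StringIO stands in for file.csv so the example runs without a file on disk.

```python
import csv
import io


def iter_batches(csvfile, batch_size):
    """Yield lists of up to batch_size rows from an open CSV file object."""
    reader = csv.reader(csvfile)
    next(reader, None)  # skip header (no error if the file is empty)
    batch = []
    for row in reader:
        batch.append(row)
        # Checking after the append hands off a batch as soon as it is full.
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Flush the remainder, as suggested in the answer.
    if batch:
        yield batch


# In-memory stand-in for file.csv: one header line plus 10 data rows.
sample = io.StringIO("id\n" + "\n".join(str(i) for i in range(1, 11)) + "\n")
sizes = [len(b) for b in iter_batches(sample, 3)]
print(sizes)  # [3, 3, 3, 1]
```

With a real file, the caller would loop over iter_batches(csvfile, 3) inside the with block and call do_something on each batch.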
