Reading a CSV file in batches. Reader always misses the same line

I have a simple Python script which reads a CSV file in batches of 5. The CSV file contains a total of 9 records (excluding the header). The script below reads the file in batches of 5 but always seems to skip the record with ID 6. What am I doing wrong?

CSV file:

"RIG_ID","STATUS_DATE"
"1","2019-04-10"
"2","2019-04-11"
"3","2019-04-12"
"4","2019-04-13"
"5","2019-04-14"
"6","2019-04-15"
"7","2019-04-16"
"8","2019-04-17"
"9","2019-04-18"

Python script:

import csv

batch_size = 5
transaction_count = 0

parameter_set = []

with open('test.csv', 'r') as file:
    reader = csv.DictReader(file, delimiter=',')

    for row in reader:

        entry = get_entry(row)

        if(len(parameter_set) == batch_size):
            execute_transaction(sql, parameter_set)

            transaction_count = transaction_count + 1
            print(f'Transaction count: {transaction_count}')

            parameter_set.clear()
        else:
            parameter_set.append(entry)
            
    # check if we have records that didn't fit into a batch (i.e. less than 5)
    if(len(parameter_set) > 0):
        execute_transaction(sql, parameter_set)
        transaction_count = transaction_count + 1
        print(f'Transaction count: {transaction_count}')

If I put a breakpoint on the line entry = get_entry(row) after the first batch completes, I get ID = 7, so the 6th record in the CSV is skipped.
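The behaviour can be reproduced without a database. The following is only a sketch under a few assumptions: the CSV is held in an in-memory string instead of test.csv, get_entry(row) is replaced by reading the RIG_ID column, and execute_transaction is replaced by collecting each batch in a list. The function name run_original_loop is hypothetical; the loop body itself is unchanged from the script above.

```python
import csv
import io

# In-memory stand-in for test.csv (same 9 records as above).
CSV_TEXT = '"RIG_ID","STATUS_DATE"\n' + "".join(
    f'"{i}","2019-04-{9 + i}"\n' for i in range(1, 10)
)


def run_original_loop(text, batch_size=5):
    """Run the loop from the question, collecting batches instead of executing SQL."""
    batches = []        # hypothetical stand-in for execute_transaction
    parameter_set = []
    reader = csv.DictReader(io.StringIO(text), delimiter=',')

    for row in reader:
        entry = row["RIG_ID"]   # hypothetical stand-in for get_entry(row)

        if len(parameter_set) == batch_size:
            batches.append(list(parameter_set))
            parameter_set.clear()
        else:
            parameter_set.append(entry)

    # records that didn't fill a whole batch
    if len(parameter_set) > 0:
        batches.append(list(parameter_set))

    return batches


print(run_original_loop(CSV_TEXT))
# → [['1', '2', '3', '4', '5'], ['7', '8', '9']]
```

ID 6 appears in neither batch, which matches what the breakpoint shows.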

The problem is that you don't append the entry to your parameter_set on the iteration where your if condition is true:

len(parameter_set) == batch_size

The row read on that iteration is never stored, so you also need to append the entry to your parameter_set after you have cleared it. So I would propose:

        if(len(parameter_set) == batch_size):
            execute_transaction(sql, parameter_set)

            transaction_count = transaction_count + 1
            print(f'Transaction count: {transaction_count}')

            parameter_set.clear()
            parameter_set.append(entry)
        else:
            parameter_set.append(entry)

Or, to avoid duplicate code, you could move the .append() out of the if-else, because it has to run in both branches:

        if(len(parameter_set) == batch_size):
            execute_transaction(sql, parameter_set)

            transaction_count = transaction_count + 1
            print(f'Transaction count: {transaction_count}')

            parameter_set.clear()

        parameter_set.append(entry)
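
Putting the second variant together, a minimal self-contained sketch might look like the following. It is not the original script: read_in_batches is a hypothetical helper name, the CSV is held in an in-memory string rather than test.csv, and get_entry / execute_transaction are left out because their definitions are not shown. Only the flush-then-append order is taken from the fix above.

```python
import csv
import io


def read_in_batches(file, batch_size=5):
    """Yield lists of at most batch_size rows from an open CSV file."""
    reader = csv.DictReader(file, delimiter=',')
    batch = []

    for row in reader:
        if len(batch) == batch_size:
            # The batch is full: hand it over and start a new one ...
            yield batch
            batch = []

        # ... and always keep the current row, in either case.
        batch.append(row)

    # Records that didn't fill a whole batch (i.e. fewer than batch_size).
    if batch:
        yield batch


# In-memory stand-in for test.csv with the 9 sample records.
CSV_TEXT = '"RIG_ID","STATUS_DATE"\n' + "".join(
    f'"{i}","2019-04-{9 + i}"\n' for i in range(1, 10)
)

batches = list(read_in_batches(io.StringIO(CSV_TEXT)))

for transaction_count, batch in enumerate(batches, start=1):
    ids = [row["RIG_ID"] for row in batch]
    print(f'Transaction count: {transaction_count}, IDs: {ids}')
```

With the nine sample records this yields one batch of five (IDs 1 to 5) and one batch of four (IDs 6 to 9), so no record is dropped. In the real script, the call to execute_transaction(sql, parameter_set) would go where the batches are consumed.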
