
Reading a CSV file in batches: reader always misses the same line

I have a simple Python script that reads a CSV file in batches of 5. The CSV file contains a total of 9 records (excluding the header). The script below reads the file in batches of 5 but always seems to skip the record with ID 6. What am I doing wrong?

csv file:

"RIG_ID","STATUS_DATE"
"1","2019-04-10"
"2","2019-04-11"
"3","2019-04-12"
"4","2019-04-13"
"5","2019-04-14"
"6","2019-04-15"
"7","2019-04-16"
"8","2019-04-17"
"9","2019-04-18"

Python script:

import csv

batch_size = 5
transaction_count = 0

parameter_set = []

with open('test.csv', 'r') as file:
    reader = csv.DictReader(file, delimiter=',')

    for row in reader:

        entry = get_entry(row)

        if(len(parameter_set) == batch_size):
            execute_transaction(sql, parameter_set)

            transaction_count = transaction_count + 1
            print(f'Transaction count: {transaction_count}')

            parameter_set.clear()
        else:
            parameter_set.append(entry)
            
    # check if we have records that didn't fit into a batch (i.e. less than 5)
    if(len(parameter_set) > 0):
        execute_transaction(sql, parameter_set)
        transaction_count = transaction_count + 1
        print(f'Transaction count: {transaction_count}')

If I put a breakpoint on the line entry = get_entry(row), then after the first batch completes I get ID = 7, so the 6th line in the CSV is skipped.

The problem is that you don't append the entry to your parameter_set when your if condition becomes true:

len(parameter_set) == batch_size

That branch runs while the 6th row is the current row, so its entry is built but never stored. You also need to append the entry to your parameter_set after you have cleared it. So I would propose:

        if len(parameter_set) == batch_size:
            execute_transaction(sql, parameter_set)

            transaction_count = transaction_count + 1
            print(f'Transaction count: {transaction_count}')

            parameter_set.clear()
            parameter_set.append(entry)
        else:
            parameter_set.append(entry)

Or, to avoid duplicate code, you could move the .append() out of the if/else, because it is executed in both branches:

        if len(parameter_set) == batch_size:
            execute_transaction(sql, parameter_set)

            transaction_count = transaction_count + 1
            print(f'Transaction count: {transaction_count}')

            parameter_set.clear()

        parameter_set.append(entry)
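To check the fix without a database, here is a minimal self-contained sketch of the second variant. It makes several assumptions that are not in the original post: the CSV is read from an in-memory string instead of test.csv, get_entry is replaced by taking the RIG_ID value, and execute_transaction is replaced by recording each batch in a list. The function name run_batches is hypothetical.

```python
import csv
import io

# Sample data from the question, held in memory (assumption: every field is
# properly closed with a double quote).
CSV_TEXT = '''"RIG_ID","STATUS_DATE"
"1","2019-04-10"
"2","2019-04-11"
"3","2019-04-12"
"4","2019-04-13"
"5","2019-04-14"
"6","2019-04-15"
"7","2019-04-16"
"8","2019-04-17"
"9","2019-04-18"
'''


def run_batches(csv_text, batch_size=5):
    """Return the batches that would be sent, each as a list of RIG_ID strings."""
    batches = []        # hypothetical stand-in for execute_transaction(sql, ...)
    parameter_set = []

    reader = csv.DictReader(io.StringIO(csv_text), delimiter=',')
    for row in reader:
        # Flush a full batch first ...
        if len(parameter_set) == batch_size:
            batches.append(list(parameter_set))
            parameter_set.clear()

        # ... then always keep the current row, so no record is dropped.
        parameter_set.append(row['RIG_ID'])  # stand-in for get_entry(row)

    # Flush the remaining rows that did not fill a whole batch.
    if parameter_set:
        batches.append(list(parameter_set))

    return batches


batches = run_batches(CSV_TEXT)
print(batches)
# [['1', '2', '3', '4', '5'], ['6', '7', '8', '9']]
```

With the original if/else, the same data would give a second batch of only 7, 8 and 9, because the row that triggers the flush is never appended.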

