Reading csv file in batches. Reader always misses the same line
I have a simple Python script which reads a csv file in batches of 5. The csv file contains a total of 9 records (excluding the header). The script below reads the file in batches of 5 but always seems to skip the record with ID 6. What am I doing wrong?
csv file:
"RIG_ID","STATUS_DATE"
"1","2019-04-10"
"2","2019-04-11"
"3","2019-04-12"
"4","2019-04-13"
"5","2019-04-14"
"6","2019-04-15"
"7","2019-04-16"
"8","2019-04-17"
"9","2019-04-18"
Python script:
batch_size = 5
transaction_count = 0
parameter_set = []
with open('test.csv', 'r') as file:
    reader = csv.DictReader(file, delimiter=',')
    for row in reader:
        entry = get_entry(row)
        if(len(parameter_set) == batch_size):
            execute_transaction(sql, parameter_set)
            transaction_count = transaction_count + 1
            print(f'Transaction count: {transaction_count}')
            parameter_set.clear()
        else:
            parameter_set.append(entry)

    # check if we have records that didn't fit into a batch (i.e. less than 5)
    if(len(parameter_set) > 0):
        execute_transaction(sql, parameter_set)
        transaction_count = transaction_count + 1
        print(f'Transaction count: {transaction_count}')
If I put a breakpoint on the line entry = get_entry(row) after the first batch completes, I get ID = 7, thus skipping the 6th line in the csv.
The problem is that you don't append the entry to your parameter_set when your if condition becomes true:
len(parameter_set) == batch_size
You would also need to append the entry to your parameter_set after you have cleared it. So I would propose:
if(len(parameter_set) == batch_size):
    execute_transaction(sql, parameter_set)
    transaction_count = transaction_count + 1
    print(f'Transaction count: {transaction_count}')
    parameter_set.clear()
    parameter_set.append(entry)
else:
    parameter_set.append(entry)
Or, to avoid duplicate code, you could move the .append() out of the if-else, because it is always executed:
if(len(parameter_set) == batch_size):
    execute_transaction(sql, parameter_set)
    transaction_count = transaction_count + 1
    print(f'Transaction count: {transaction_count}')
    parameter_set.clear()
parameter_set.append(entry)
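To check the fix without a database, here is a minimal sketch that runs the corrected loop against the nine records from the question. The question does not show get_entry, execute_transaction, or sql, so the versions below are hypothetical stand-ins: get_entry returns a tuple, and execute_transaction only records a copy of each batch. The csv data is read from an in-memory string instead of test.csv.

```python
import csv
import io

# In-memory copy of the csv from the question (IDs 1..9, dates 2019-04-10..18).
CSV_TEXT = '"RIG_ID","STATUS_DATE"\n' + "".join(
    f'"{i}","2019-04-{9 + i}"\n' for i in range(1, 10)
)

executed_batches = []  # what each "transaction" received


def get_entry(row):
    # Hypothetical stand-in: the real get_entry is not shown in the question.
    return (row["RIG_ID"], row["STATUS_DATE"])


def execute_transaction(sql, parameter_set):
    # Hypothetical stand-in: store a copy, because the caller clears the list afterwards.
    executed_batches.append(list(parameter_set))


sql = None  # placeholder; the real statement is not shown in the question
batch_size = 5
parameter_set = []

reader = csv.DictReader(io.StringIO(CSV_TEXT), delimiter=',')
for row in reader:
    entry = get_entry(row)
    if len(parameter_set) == batch_size:
        # The batch is full: flush it, then fall through to append the current entry.
        execute_transaction(sql, parameter_set)
        parameter_set.clear()
    parameter_set.append(entry)

# Flush the records that didn't fill a whole batch.
if parameter_set:
    execute_transaction(sql, parameter_set)

ids = [[rig_id for rig_id, _ in batch] for batch in executed_batches]
print(ids)  # [['1', '2', '3', '4', '5'], ['6', '7', '8', '9']]
```

With the original if/else, the second batch would contain only 7, 8 and 9, because the row that triggers the flush is never appended.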