Adding rows to CSV sorted by ID in Python without reading whole file into memory
I have a CSV file where the first column is an ID, like so:

```
5,a
4,b
2,c
1,d
```

The rows must always be sorted from the biggest to the smallest ID. I have a list of rows that I want to add without reading the whole original CSV into memory, so I can't just append the rows and sort afterwards.

Here is the code I came up with:
```python
import csv


def main():
    rows_to_add = [[7, "NEW1"], [6, "NEW2"], [3, "NEW3"], [-2, "NEW4"]]
    with open("in.csv", "r") as in_file, open("out.csv", "w") as out_file:
        reader = csv.reader(in_file)
        writer = csv.writer(out_file)
        for new_row in rows_to_add:
            for source_row in reader:
                if new_row[0] >= int(source_row[0]):
                    writer.writerow(new_row)
                    writer.writerow(source_row)
                    break
                writer.writerow(source_row)
            else:
                # If source reader already reached end of file
                writer.writerow(new_row)
        for remaining_line in in_file:
            out_file.write(remaining_line)
    with open("out.csv", "r") as out_file:
        print(out_file.read())


if __name__ == "__main__":
    main()
```
Result:

```
7,NEW1
5,a
6,NEW2
4,b
3,NEW3
2,c
1,d
-2,NEW4
```
This doesn't work correctly if there are two consecutive new IDs: `6,NEW2` should come right after `7,NEW1`, and I can't figure out the right way to do it.
You have to use a running pointer for at least one of the lists. In this case, since you can't read the entire CSV into memory, the file is read one line at a time and the running pointer is used for the other list (`rows_to_add`). This assumes `rows_to_add` is itself sorted from the biggest to the smallest ID.
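Stripped of the file handling, the running-pointer idea is just a lazy merge of two descending sequences, which makes it easy to check against in-memory data. This is a minimal sketch, not the answer's own code: `merge_desc` is a hypothetical helper name, and it assumes both inputs are already sorted by descending integer ID in column 0.

```python
import csv
import io
from typing import Iterable, Iterator, List, Sequence


def merge_desc(source_rows: Iterable[Sequence], new_rows: List[Sequence]) -> Iterator[Sequence]:
    """Lazily merge two row streams sorted by descending int ID in column 0.

    Only one source row is held at a time; new_rows is walked with an index
    (the "running pointer"). On equal IDs the existing source row comes first.
    """
    idx = 0
    for source_row in source_rows:
        source_id = int(source_row[0])
        # Emit every pending new row whose ID is strictly larger.
        while idx < len(new_rows) and int(new_rows[idx][0]) > source_id:
            yield new_rows[idx]
            idx += 1
        yield source_row
    # The source is exhausted: whatever is left in new_rows goes at the end.
    yield from new_rows[idx:]


if __name__ == "__main__":
    source = io.StringIO("5,a\n4,b\n2,c\n1,d\n")
    new = [[7, "NEW1"], [6, "NEW2"], [3, "NEW3"], [-2, "NEW4"]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(merge_desc(csv.reader(source), new))
    print(out.getvalue(), end="")
```

Because `merge_desc` only consumes `source_rows` as an iterator, passing it `csv.reader(in_file)` keeps memory use bounded by the size of `rows_to_add`.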
The following code should work:
```python
import csv


def main():
    # Rows to insert; must already be sorted by descending ID
    rows_to_add = [[7, "NEW1"], [6, "NEW2"], [6, "NEW2"], [3, "NEW3"], [-2, "NEW4"]]
    # newline="" is what the csv module docs recommend for files it reads/writes
    with open("in.csv", "r", newline="") as in_file, open("out.csv", "w", newline="") as out_file:
        reader = csv.reader(in_file)
        writer = csv.writer(out_file)
        idx = 0  # running pointer into rows_to_add
        # For each line in the file
        for source_row in reader:
            # Write all rows from rows_to_add whose ID is larger...
            # (the idx check avoids an IndexError once rows_to_add is exhausted)
            while idx < len(rows_to_add) and rows_to_add[idx][0] > int(source_row[0]):
                writer.writerow(rows_to_add[idx])
                idx += 1
            # ...before writing the current line from the file
            writer.writerow(source_row)
        # Write the remaining rows in rows_to_add
        for row in rows_to_add[idx:]:
            writer.writerow(row)
    with open("out.csv", "r") as out_file:
        print(out_file.read())


if __name__ == "__main__":
    main()
```
Sample output for your in.csv:

```
7,NEW1
6,NEW2
6,NEW2
5,a
4,b
3,NEW3
2,c
1,d
-2,NEW4
```
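As an alternative sketch (a swapped-in technique, not what the answer above uses), the standard library's `heapq.merge` already performs a lazy, stable merge of sorted iterables and accepts `key` and `reverse` arguments since Python 3.5. The helper name `merged_rows` is hypothetical, and the sketch assumes the CSV is sorted by descending integer ID in column 0; the small in-memory list is sorted first, so it may arrive in any order.

```python
import csv
import heapq
import io


def merged_rows(source_file, rows_to_add):
    """Stream source_file (CSV sorted by descending int ID) merged with rows_to_add."""
    # Sorting the in-memory list is cheap; the CSV itself is never fully loaded.
    new_rows = sorted(rows_to_add, key=lambda row: int(row[0]), reverse=True)
    # heapq.merge is lazy and stable: on equal IDs, rows from the first
    # iterable (the existing file) come before the new rows.
    return heapq.merge(csv.reader(source_file), new_rows,
                       key=lambda row: int(row[0]), reverse=True)


if __name__ == "__main__":
    source = io.StringIO("5,a\n4,b\n2,c\n1,d\n")
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(
        merged_rows(source, [[3, "NEW3"], [7, "NEW1"], [-2, "NEW4"], [6, "NEW2"]]))
    print(out.getvalue(), end="")
```

With real files, the same helper would be driven by opening in.csv for reading and out.csv for writing (both with `newline=""`) and passing the result of `merged_rows(in_file, rows_to_add)` to `csv.writer(out_file).writerows(...)`.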
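Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.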