Adding rows to CSV sorted by ID in Python without reading whole file into memory

I have a CSV file where the first column is an ID, like so:

5,a
4,b
2,c
1,d

The rows must always be sorted from the biggest to the smallest ID. I have a list of rows that I want to add without reading the whole original CSV into memory, so I can't just append the rows and sort afterwards. Here is the code I came up with:

import csv


def main():
    rows_to_add = [[7, "NEW1"], [6, "NEW2"], [3, "NEW3"], [-2, "NEW4"]]

    with open("in.csv", "r") as in_file, open("out.csv", "w") as out_file:
        reader = csv.reader(in_file)
        writer = csv.writer(out_file)

        for new_row in rows_to_add:
            for source_row in reader:
                if new_row[0] >= int(source_row[0]):
                    writer.writerow(new_row)
                    writer.writerow(source_row)
                    break

                writer.writerow(source_row)
            else:
                # If source reader already reached end of file
                writer.writerow(new_row)

        for remaining_line in in_file:
            out_file.write(remaining_line) 

    with open("out.csv", "r") as out_file:
        print(out_file.read())


if __name__ == "__main__":
    main()

Result:

7,NEW1
5,a
6,NEW2
4,b
3,NEW3
2,c
1,d
-2,NEW4

This doesn't work correctly when two consecutive new IDs are both larger than the next ID in the file: 6,NEW2 should come right after 7,NEW1, and I can't figure out the right way to do it.

You have to use a running pointer (an index) for at least one of the lists. In this case, since you can't read the entire CSV into memory, stream the CSV row by row and keep the running pointer on the other list, rows_to_add, which must itself be sorted from largest to smallest ID.

The following code should work:

import csv


def main():
    rows_to_add = [[7, "NEW1"], [6, "NEW2"], [6, "NEW2"], [3, "NEW3"], [-2, "NEW4"]]

    with open("in.csv", "r", newline="") as in_file, open("out.csv", "w", newline="") as out_file:
        reader = csv.reader(in_file)
        writer = csv.writer(out_file)
        idx = 0

        # For each line in file
        for source_row in reader:
            # Write all rows from rows_to_add with a larger ID (stop once the list is exhausted)
            while idx < len(rows_to_add) and rows_to_add[idx][0] > int(source_row[0]):
                writer.writerow(rows_to_add[idx])
                idx += 1

            # Before printing current line from file
            writer.writerow(source_row)

        # Print remaining rows in rows_to_add
        for row in rows_to_add[idx:]:
            writer.writerow(row)


    with open("out.csv", "r") as out_file:
        print(out_file.read())


if __name__ == "__main__":
    main()

Sample output for your in.csv:

7,NEW1
6,NEW2
6,NEW2
5,a
4,b
3,NEW3
2,c
1,d
-2,NEW4
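
As an alternative to the hand-written pointer loop, the same streaming merge can be sketched with heapq.merge from the standard library, which lazily merges already-sorted iterables and accepts reverse=True for largest-to-smallest input. This is only a minimal sketch, not part of the original answer: the names merge_sorted_rows and add_rows are hypothetical, and it assumes both the file and rows_to_add are already sorted by descending ID and that every row starts with an integer ID.

```python
import csv
import heapq


def merge_sorted_rows(source_rows, new_rows):
    """Lazily merge two row iterables, each sorted by descending ID.

    Hypothetical helper: csv yields the ID as a string, so the key
    converts it to int before comparing.
    """
    return heapq.merge(
        source_rows, new_rows, key=lambda row: int(row[0]), reverse=True
    )


def add_rows(in_path, out_path, rows_to_add):
    """Stream in_path into out_path, inserting rows_to_add in sorted position."""
    # newline="" is what the csv module documentation recommends for file objects.
    with open(in_path, newline="") as in_file, \
            open(out_path, "w", newline="") as out_file:
        reader = csv.reader(in_file)
        writer = csv.writer(out_file)
        # Only one row per input is held in memory at any time.
        writer.writerows(merge_sorted_rows(reader, rows_to_add))
```

With the data from the question, calling add_rows("in.csv", "out.csv", rows_to_add) would write the merged rows to out.csv without loading in.csv as a whole.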
