How to compare two csv files and print all the differences

I have two CSV files (old.csv and new.csv) with a lot of data in them. Both files should contain the same data, but the order of the rows can differ. old.csv acts as the source file. I am trying to compare the two files to see whether any row is missing or any new row has been added.

  • Compare old.csv with new.csv and report any row that is missing from new.csv and any row that is new in new.csv. Each row should match exactly in both files.

The code I have below only finds rows in new.csv that are not present in old.csv. How can I check for the other cases as well?

with open('old.csv', 'r') as t1, open('new.csv', 'r') as t2:
    fileone = t1.readlines()
    filetwo = t2.readlines()

with open('update.csv', 'w') as outFile:
    for line in filetwo:
        if line not in fileone:
            outFile.write(line)

Basically, old.csv and new.csv should match exactly: the content of each row, the number of entries, and everything else. There should not be any difference between the two files.

You could consider using difflib for this, but it has the same limitation as command-line diff: it can report a line as "new" when it has merely moved.
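To illustrate that limitation, here is a minimal sketch using difflib.unified_diff on two small in-memory lists. The sample rows are made up for the example; with real files you would pass the lists returned by readlines() instead.

```python
import difflib

# Hypothetical sample data: the same three rows, but "a,1" has moved to the end.
old_lines = ["a,1\n", "b,2\n", "c,3\n"]
new_lines = ["b,2\n", "c,3\n", "a,1\n"]

diff = list(difflib.unified_diff(old_lines, new_lines,
                                 fromfile="old.csv", tofile="new.csv"))

# Keep only the changed lines, skipping the "---" / "+++" file headers.
# (This simple filter would misread a row that itself starts with "--" or "++".)
removed = [line for line in diff
           if line.startswith("-") and not line.startswith("---")]
added = [line for line in diff
         if line.startswith("+") and not line.startswith("+++")]

print("removed:", removed)
print("added:", added)
```

Even though both lists hold the same rows, the moved row shows up once as removed and once as added, which is why a line-oriented diff is a poor fit when row order does not matter.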

Assuming order isn't important, the set-based approach is probably what you need.
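One way the set-based approach might look is sketched below. The helper names read_rows and compare_csv are hypothetical, and the sketch assumes both files fit in memory and that duplicate rows do not matter, because a set keeps each distinct row only once.

```python
import csv


def read_rows(path):
    """Return the non-empty rows of a CSV file as a set of tuples.

    Row order and duplicate rows are discarded by the set.
    """
    with open(path, newline="") as f:
        return {tuple(row) for row in csv.reader(f) if row}


def compare_csv(old_path, new_path):
    """Return (missing, added): rows only in the old file, rows only in the new file."""
    old_rows = read_rows(old_path)
    new_rows = read_rows(new_path)
    missing = old_rows - new_rows  # present in old.csv, absent from new.csv
    added = new_rows - old_rows    # present in new.csv, absent from old.csv
    return missing, added
```

Calling compare_csv('old.csv', 'new.csv') returns two sets, and the files match (ignoring order and duplicates) when both are empty. If the number of times a row appears also has to match, collections.Counter could be used in place of the sets, since subtracting two Counters keeps the surplus count of each row.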
