How to compare two csv files and print all the differences
I have two CSV files (old.csv and new.csv) with a lot of data in them. Both CSV files contain the same data, but the order of the rows can differ. old.csv acts as the source file. I am trying to compare the two CSV files to see whether any row is missing or any new row is present.
That is, compare old.csv with new.csv and check whether any row is missing from new.csv or any new row is present in new.csv.
Each row should match exactly in both CSVs. The code below only checks for rows in new.csv that are not present in old.csv. How can I also check for everything else?
with open('old.csv', 'r') as t1, open('new.csv', 'r') as t2:
    fileone = t1.readlines()
    filetwo = t2.readlines()

with open('update.csv', 'w') as outFile:
    for line in filetwo:
        if line not in fileone:
            outFile.write(line)
Basically, old.csv and new.csv should match exactly in everything: content (each row), number of entries, and so on. There should not be any difference between the two files.
You could consider using difflib for this, but it has the same limitations as command-line diff: it can report a line as "new" when it has merely moved.
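A small illustration of that limitation with the standard-library `difflib` module, using made-up in-memory rows rather than the real files. Only the row `1,alice` moved, yet the unified diff reports it as one removal and one addition.

```python
import difflib

# Hypothetical rows: same data, but "1,alice" moved to the end in new.csv
old_rows = ["id,name", "1,alice", "2,bob", "3,carol"]
new_rows = ["id,name", "2,bob", "3,carol", "1,alice"]

# unified_diff compares the sequences in order; lineterm="" keeps the
# header lines free of newlines, since the rows themselves have none
diff = list(
    difflib.unified_diff(
        old_rows, new_rows, fromfile="old.csv", tofile="new.csv", lineterm=""
    )
)

for line in diff:
    print(line)
```

The output contains both `-1,alice` and `+1,alice`, which is why an order-sensitive diff is a poor fit when row order does not matter.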
Assuming order isn't important, the set-based approach is probably what you need.
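A minimal sketch of what such a set-based comparison might look like, assuming each line is one row and rows are compared as raw text. `set_diff`, `write_update` and the `missing,`/`new,` prefixes in the output file are hypothetical choices. Set difference in each direction gives the missing and new rows, and the symmetric difference (`^`) gives every differing row at once.

```python
def set_diff(old_lines, new_lines):
    """Order-insensitive comparison of two lists of rows using sets."""
    old_rows = set(old_lines)
    new_rows = set(new_lines)
    missing = old_rows - new_rows     # in old.csv but not in new.csv
    added = new_rows - old_rows       # in new.csv but not in old.csv
    every_diff = old_rows ^ new_rows  # symmetric difference: both of the above
    # Sort so the output is deterministic (sets have no order)
    return sorted(missing), sorted(added), sorted(every_diff)


def write_update(old_path, new_path, out_path):
    # Read rows without line endings, compare them, and write each differing
    # row to out_path prefixed with the side it was found on
    with open(old_path, "r") as t1, open(new_path, "r") as t2:
        missing, added, _ = set_diff(t1.read().splitlines(), t2.read().splitlines())
    with open(out_path, "w") as out_file:
        for row in missing:
            out_file.write(f"missing,{row}\n")
        for row in added:
            out_file.write(f"new,{row}\n")
```

Usage: `write_update("old.csv", "new.csv", "update.csv")`. Set lookups take constant time on average, so this scales much better than `line not in fileone` on a list. The trade-off is that duplicate rows collapse into a single entry, so a row repeated a different number of times in each file is not reported; comparing `collections.Counter` objects instead covers that case.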