Python: comparing lines in large files

Question

I need to compare two .csv files (each is over 65,000 lines long) and find the lines that are not in the second file. I am using difflib.ndiff:

import difflib
# text1, text2: the lines of the two files, e.g. from f.readlines()
for line in difflib.ndiff(text1, text2):
    print(line, end='')

But I get unexpected results: ndiff finds two identical strings and marks them as different:

+ Gr4,DQ_3Gb_1m_DR_926_23489,100,,,70,,
- Gr4,DQ_3Gb_1m_DR_926_23489,100,,,70,,

What could be the problem?
What would be a suitable way to find the differences?
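A plausible cause, judging from the line-by-line output of the second attempt below, is that the two files contain the same lines in a different order. `difflib.ndiff` aligns the two sequences in order, so a line that has only moved can be reported once as added (`+`) and once as removed (`-`). A small sketch with two of the question's lines swapped between the "files" (the helper name `ndiff_lines` is hypothetical):

```python
import difflib


def ndiff_lines(old, new):
    """Return the ndiff output for two lists of lines (without trailing newlines)."""
    return list(difflib.ndiff(old, new))


# Hypothetical example: the same two lines, stored in opposite order.
old = ["Gr4,DQ_1Gb_1m_DR_926_23486,100,,,70,,", "Gr4,DQ_3Gb_1m_DR_926_23489,100,,,70,,"]
new = ["Gr4,DQ_3Gb_1m_DR_926_23489,100,,,70,,", "Gr4,DQ_1Gb_1m_DR_926_23486,100,,,70,,"]

# ndiff keeps one line as the common subsequence; the other, identical line
# is then reported as inserted on one side and deleted on the other.
for line in ndiff_lines(old, new):
    print(line)
```

The identical `..._23489` line appears with both a `+` and a `-` prefix, even though neither file changed its content.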

2.

from itertools import zip_longest  # Python 3 name of izip_longest

# Read both files, stripping surrounding whitespace from each line
with open('test1.txt') as f1, open('test2.txt') as f2:
    l1 = [line.strip() for line in f1]
    l2 = [line.strip() for line in f2]

# Compare line N of the first file with line N of the second
for a, b in zip_longest(l1, l2):
    print('%s %s %s' % (a or '', '==' if a == b else '!=', b or ''))

I also tried comparing the files with the code above, but got the same unexpected result. Why is this?

Gr4,DQ_1Gb_1m_DR_926_23486,100,,,70,, != Gr4,DQ_3Gb_1m_DR_926_23489,100,,,70,,
Gr4,DQ_3Gb_1m_DR_926_23489,100,,,70,, != Gr4,DQ_1Gb_1m_DR_926_23486,100,,,70,,
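Both attempts compare the files position by position, so any reordering shows up as a difference. If line order does not matter, one order-independent approach is to load the second file into a set and stream the first file against it; 65,000 lines fit comfortably in memory. A minimal sketch, where the function name `lines_missing` is hypothetical and lines are compared exactly after removing only the trailing newline:

```python
from typing import Iterable, Iterator


def lines_missing(lines1: Iterable[str], lines2: Iterable[str]) -> Iterator[str]:
    """Yield each line of lines1 that does not occur anywhere in lines2.

    Trailing newline characters are removed before comparing, so this works
    both on open file objects and on plain lists of strings.
    """
    # Load the second file once into a set: O(1) average lookups, order-independent.
    seen = {line.rstrip('\r\n') for line in lines2}
    # Stream the first file so it never has to be held in memory as a whole.
    for line in lines1:
        line = line.rstrip('\r\n')
        if line not in seen:
            yield line
```

To run it on the question's files, open both (for example `with open('test1.txt') as f1, open('test2.txt') as f2:`) and iterate over `lines_missing(f1, f2)`. A set ignores how many times a line occurs, so a line repeated in the first file but present once in the second is not reported; differences in surrounding whitespace still make two lines distinct.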

Answer 1

This is easy with pandas. Since you haven't provided the dataset, I'll use my own.

Assume I have two CSV files.

The data looks like this:

Now print the lines that are not present in the second file (the benz model is not present in the second file):
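The answer's own code and sample data did not survive. A minimal sketch of one way to do this with pandas, assuming both files have no header row and the same number of columns; the function names `read_as_text` and `rows_not_in` and the sample rows are hypothetical. It uses a left merge with `indicator=True` and keeps the rows marked `left_only`:

```python
import io

import pandas as pd


def read_as_text(src):
    """Read a header-less CSV, keeping every field as a string ('' for empty fields)."""
    return pd.read_csv(src, header=None, dtype=str, keep_default_na=False)


def rows_not_in(src1, src2):
    """Return the rows of the first CSV that do not appear anywhere in the second."""
    df1, df2 = read_as_text(src1), read_as_text(src2)
    # Left merge on all shared columns; indicator=True adds a '_merge' column
    # whose value is 'left_only' for rows of df1 with no match in df2.
    # drop_duplicates() stops repeated rows in df2 from duplicating df1 rows.
    merged = df1.merge(df2.drop_duplicates(), how='left', indicator=True)
    return merged.loc[merged['_merge'] == 'left_only', df1.columns]


# Hypothetical sample data: two rows shared (in a different order) plus one extra row.
file1 = io.StringIO(
    "Gr4,DQ_1Gb_1m_DR_926_23486,100,,,70,,\n"
    "Gr4,DQ_3Gb_1m_DR_926_23489,100,,,70,,\n"
    "Gr4,EXTRA_ROW,1,,,2,,\n"
)
file2 = io.StringIO(
    "Gr4,DQ_3Gb_1m_DR_926_23489,100,,,70,,\n"
    "Gr4,DQ_1Gb_1m_DR_926_23486,100,,,70,,\n"
)

missing = rows_not_in(file1, file2)
print(missing.to_csv(header=False, index=False), end='')
```

Reading every field as a string means empty fields compare as `''` instead of `NaN` and values such as `100` are not converted to numbers. With real files, pass the paths instead, e.g. `rows_not_in('test1.txt', 'test2.txt')`.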

Solution 1 (score 0, 2020-08-11 05:57:22)
