Python - Finding Row Discrepancies Between Two Dataframes

I have two dataframes, d1 and d2, with the same number of columns.

NOTE: d1 and d2 may have a different number of rows.
NOTE: matching rows may not be at the same index in d1 and d2.

What is the best way to check whether the two dataframes contain the same data?

My current solution is to append the two dataframes together and drop every row that has a match.

# DataFrame.append was removed in pandas 2.0, so pd.concat is used here
d_combined = pd.concat([d1, d2])
d_discrepancy = d_combined.drop_duplicates(keep=False)
print(d_discrepancy)

I am new to Python and the pandas library. Because I will be working with dataframes that have millions of rows and 8-10 columns, is there a faster and more efficient way to check for discrepancies? Can the output also show which of the two original dataframes each discrepancy row came from?
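For the size concern, one possible alternative is to hash each row once with `pd.util.hash_pandas_object(..., index=False)` and compare the hashes instead of the full rows. This is a sketch, not a benchmarked answer: it assumes both frames have the same columns in the same order with the same dtypes (an int64 column and a float64 column holding the same numbers hash differently), and a 64-bit hash collision, while very unlikely, would make two different rows look equal. The helper name `row_discrepancies` is hypothetical. Whether it beats `merge` on your data is worth measuring.

```python
import pandas as pd


def row_discrepancies(d1: pd.DataFrame, d2: pd.DataFrame):
    """Hypothetical helper: compare two frames by hashing each row once.

    Assumes d1 and d2 have the same columns in the same order with the
    same dtypes. The index is ignored, so matching rows may sit at
    different positions in each frame.
    """
    # One uint64 hash per row, built from the column values only (index=False).
    h1 = pd.util.hash_pandas_object(d1, index=False)
    h2 = pd.util.hash_pandas_object(d2, index=False)

    # Rows whose hash never appears in the other frame, kept separate per origin.
    only_d1 = d1[~h1.isin(h2)]
    only_d2 = d2[~h2.isin(h1)]

    # Same set of distinct rows <=> no unmatched rows on either side
    # (barring a 64-bit hash collision).
    same = only_d1.empty and only_d2.empty
    return same, only_d1, only_d2


d1 = pd.DataFrame(dict(A=[1, 2, 3, 4]))
d2 = pd.DataFrame(dict(A=[2, 3, 4, 5]))
same, only_d1, only_d2 = row_discrepancies(d1, d2)
print(same)     # False
print(only_d1)  # the row with A == 1
print(only_d2)  # the row with A == 5
```

Because the index is excluded from the hash, reordering the rows of either frame does not change the result.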

Setup

import pandas as pd

d1 = pd.DataFrame(dict(A=[1, 2, 3, 4]))
d2 = pd.DataFrame(dict(A=[2, 3, 4, 5]))

Option 1
Use pd.merge. I'll include the parameter indicator=True to show where each row came from.

d1.merge(d2, how='outer', indicator=True)

   A      _merge
0  1   left_only
1  2        both
2  3        both
3  4        both
4  5  right_only

If they have the same data, I'd expect the _merge column to be both for every row. So we can check with

d1.merge(d2, how='outer', indicator=True)._merge.eq('both').all()

False

In this case it returned False, so the two dataframes do not have the same data.
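To see which dataframe each discrepancy came from, one option is to keep only the merged rows whose `_merge` value is not `both`: `left_only` rows exist only in d1 and `right_only` rows only in d2. A minimal sketch using the same setup; the `source` column and its `'d1'`/`'d2'` labels are illustrative choices, not something pandas adds.

```python
import pandas as pd

d1 = pd.DataFrame(dict(A=[1, 2, 3, 4]))
d2 = pd.DataFrame(dict(A=[2, 3, 4, 5]))

merged = d1.merge(d2, how='outer', indicator=True)

# Keep only the rows that are missing from one side.
discrepancy = merged[merged['_merge'] != 'both'].copy()

# Translate the indicator into the name of the originating dataframe.
discrepancy['source'] = discrepancy['_merge'].astype(str).map(
    {'left_only': 'd1', 'right_only': 'd2'}
)
discrepancy = discrepancy.drop(columns='_merge')
print(discrepancy)
```

The `_merge` column is categorical, so it is converted to plain strings before mapping.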


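Option 2
Use drop_duplicates
You need to make sure you drop the duplicates from each initial dataframe first; otherwise keep=False also discards a row that appears twice in one dataframe even when the other dataframe does not contain it at all. The expression returns True when both dataframes contain the same set of distinct rows.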

# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
pd.concat([d1.drop_duplicates(), d2.drop_duplicates()]) \
    .drop_duplicates(keep=False).empty
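The same approach can also report where each discrepancy came from: deduplicate each frame, tag it with a label, then drop rows that are duplicated across the data columns only (the `subset` argument of `drop_duplicates`), so that the label does not stop matching rows from cancelling out. A sketch; the `source` column and the `'d1'`/`'d2'` labels are hypothetical. The example frames are chosen to show why per-frame deduplication matters: the value 1 appears twice in d1 and never in d2.

```python
import pandas as pd

d1 = pd.DataFrame(dict(A=[1, 1, 2]))
d2 = pd.DataFrame(dict(A=[2]))

cols = list(d1.columns)  # data columns to compare on

# Without per-frame dedup, the two copies of 1 cancel each other out.
naive = pd.concat([d1, d2]).drop_duplicates(keep=False)
print(naive.empty)  # True, although d2 has no row with A == 1

# Deduplicate each frame, tag its origin, then drop rows present in both.
tagged = pd.concat([
    d1.drop_duplicates().assign(source='d1'),
    d2.drop_duplicates().assign(source='d2'),
])
discrepancy = tagged.drop_duplicates(subset=cols, keep=False)
print(discrepancy)  # one row: A == 1, source == 'd1'
```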
