
How to find the set difference between two Pandas DataFrames

I'd like to check the difference between the columns of two DataFrames. I tried using the command:

np.setdiff1d(train.columns, train_1.columns)

which results in an empty array:

array([], dtype=object)

However, the two DataFrames have different numbers of columns:

len(train.columns), len(train_1.columns)  # (51, 56)

which means that the two DataFrames are obviously different.

What is wrong here?

The result is correct; however, setdiff1d is order dependent. It only returns the elements of the first input array that do not occur in the second array. An empty result here therefore means that every column of train also occurs in train_1, so the extra columns are all in train_1.
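
A minimal sketch of that order dependence, using two made-up column lists (`cols_small` and `cols_large` are hypothetical names, not taken from the question). The difference is empty in one direction and non-empty in the other:

```python
import numpy as np

# Hypothetical column names, chosen only for illustration.
cols_small = ["a", "b"]
cols_large = ["a", "b", "c"]

# setdiff1d keeps the elements of the FIRST argument that are
# missing from the second, so swapping the arguments changes the result.
forward = np.setdiff1d(cols_small, cols_large)
backward = np.setdiff1d(cols_large, cols_small)

print(forward)   # empty array
print(backward)  # array containing 'c'
```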

If you do not care which of the DataFrames has the unique columns, you can use setxor1d. It returns "the unique values that are in only one (not both) of the input arrays"; see the NumPy documentation.

import numpy

colsA = ['a', 'b', 'c', 'd']
colsB = ['b','c']

c = numpy.setxor1d(colsA, colsB)

This returns an array containing 'a' and 'd'.


If you want to use setdiff1d, you need to check for differences both ways:

import numpy as np

# columns in train.columns that are not in train_1.columns
c1 = np.setdiff1d(train.columns, train_1.columns)

# columns in train_1.columns that are not in train.columns
c2 = np.setdiff1d(train_1.columns, train.columns)
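
As an alternative to the NumPy calls, pandas exposes the same set operations on the column Index itself. The sketch below is not what the answer used; it assumes toy stand-ins for `train` and `train_1` with made-up column names, and relies on `Index.difference` and `Index.symmetric_difference`:

```python
import pandas as pd

# Toy stand-ins for the question's DataFrames; the column names are made up.
train = pd.DataFrame(columns=["id", "x", "y"])
train_1 = pd.DataFrame(columns=["id", "x", "y", "z", "w"])

# One-directional differences, analogous to the two setdiff1d calls.
only_in_train = train.columns.difference(train_1.columns)
only_in_train_1 = train_1.columns.difference(train.columns)

# Columns present in exactly one of the two frames, analogous to setxor1d.
either_side = train.columns.symmetric_difference(train_1.columns)

print(list(only_in_train))    # []
print(list(only_in_train_1))  # ['w', 'z']
print(list(either_side))      # ['w', 'z']
```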

Use something like this:

data_3 = data1[~data1.isin(data2)]

where data1 and data2 are columns and data_3 is the set difference data1 - data2, that is, the entries of data1 that do not appear in data2.
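
A small sketch of the `isin` approach, assuming `data1` and `data2` are column Index objects (the values here are made up; the same pattern should also work when they are Series):

```python
import pandas as pd

# Hypothetical column labels, for illustration only.
data1 = pd.Index(["a", "b", "c", "d"])
data2 = pd.Index(["b", "c"])

# Boolean mask marking the entries of data1 that also occur in data2.
mask = data1.isin(data2)

# Invert the mask to keep only the entries unique to data1.
data_3 = data1[~mask]

print(list(data_3))  # ['a', 'd']
```

Unlike setdiff1d, which returns sorted unique values, this keeps the entries in their original order and is also one-directional: swap data1 and data2 to get the other side.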
