How to find the set difference between two Pandas DataFrames

Question

I'd like to check the difference between the columns of two DataFrames. I tried using the command:

np.setdiff1d(train.columns, train_1.columns)

which results in an empty array:

array([], dtype=object)

However, the two DataFrames have different numbers of columns:

len(train.columns), len(train_1.columns)  # (51, 56)

which means that the two DataFrames are obviously different.

What is wrong here?

Answer 1

The results are correct; however, setdiff1d is order dependent. It only returns the elements of the first input array that do not occur in the second array.
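As a minimal sketch of that asymmetry, the example below uses two small, hypothetical DataFrames in place of the question's train and train_1 (the column names are assumptions chosen for illustration). When every column of the first frame also exists in the second, the first call comes back empty even though the frames differ:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's DataFrames:
# every column of `train` also exists in `train_1`.
train = pd.DataFrame(columns=["a", "b", "c"])
train_1 = pd.DataFrame(columns=["a", "b", "c", "d", "e"])

# Columns in train that are missing from train_1: none, so the result is empty.
only_in_train = np.setdiff1d(train.columns, train_1.columns)

# Columns in train_1 that are missing from train: the extra ones.
only_in_train_1 = np.setdiff1d(train_1.columns, train.columns)

print(only_in_train)
print(only_in_train_1)
```

An empty result from the first call therefore does not mean the column sets are equal, only that the first one is contained in the second.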

If you do not care which of the DataFrames has the unique columns, you can use setxor1d. It returns "the unique values that are in only one (not both) of the input arrays"; see the documentation.

import numpy

colsA = ['a', 'b', 'c', 'd']
colsB = ['b','c']

c = numpy.setxor1d(colsA, colsB)

This returns an array containing 'a' and 'd'.

If you want to use setdiff1d, you need to check for differences in both directions:

import numpy as np

# columns in train.columns that are not in train_1.columns
c1 = np.setdiff1d(train.columns, train_1.columns)

# columns in train_1.columns that are not in train.columns
c2 = np.setdiff1d(train_1.columns, train.columns)

Answer 2

Use something like this:

data_3 = data1[~data1.isin(data2)]

where data1 and data2 are columns and data_3 is the set difference data1 - data2.
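A small sketch of this approach, assuming data1 and data2 are pandas Index objects such as the .columns of two DataFrames (the values below are made up for illustration):

```python
import pandas as pd

# Hypothetical column labels standing in for two DataFrames' .columns
data1 = pd.Index(["a", "b", "c", "d"])
data2 = pd.Index(["b", "c"])

# isin() marks the labels of data1 that also appear in data2;
# negating the mask keeps only the labels unique to data1.
data_3 = data1[~data1.isin(data2)]

print(list(data_3))
```

Like setdiff1d, this is one-directional: swap data1 and data2 to get the labels that appear only in data2. Unlike setdiff1d, it keeps the labels in their original order rather than sorting them.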


2 answers

Answer 1
1 Accepted 2017-10-06 05:40:18

Answer 2
1 2018-12-04 07:09:55
