Compare each row in one dataframe with each row in another dataframe in Python

Question

I have two different dataframes with the same features.

df1


   AGE   Country     Income
   -----------------------
   33    UK          3500
   24    Australia   1500

df2

   AGE   Country     Income
   -----------------------
   33    Brazil      1300
   54    Australia   2230

I would like to compare each row in df1 with each row in df2 and compute the number of differences in the feature values.

In my example there are 2 dataframes with 2 instances (rows) each, so there will be 4 comparisons.

For each comparison, I need to return the number of feature differences. For example, comparing the first row in df1 with the first row in df2 gives 2 differences in the feature values.

Any idea how to implement that?

Answer 1

If I understand correctly, one approach is to use np.where() to compute, for each feature individually, whether the values differ in each row, and then sum the resulting arrays:

import numpy as np

arr = (np.where(df_1['AGE'] != df_2['AGE'], 1, 0)
       + np.where(df_1['Country'] != df_2['Country'], 1, 0)
       + np.where(df_1['Income'] != df_2['Income'], 1, 0))

This returns an array with the number of feature differences for each row position (row i of df_1 compared with row i of df_2). In this case, the output would be:

[2,2]
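As a possible shorthand for the same row-by-row count, pandas can compare the two frames element-wise in one step. The sketch below rebuilds the question's data under the assumption that the frames are named df_1 and df_2 and use the column labels from the question's tables; `row_diffs` is just an illustrative name.

```python
import pandas as pd

# The question's two frames (variable names follow the answer's code)
df_1 = pd.DataFrame({'AGE': [33, 24],
                     'Country': ['UK', 'Australia'],
                     'Income': [3500, 1500]})
df_2 = pd.DataFrame({'AGE': [33, 54],
                     'Country': ['Brazil', 'Australia'],
                     'Income': [1300, 2230]})

# ne() gives a boolean frame that is True where the values differ;
# summing across columns counts the differing features per row position
row_diffs = df_1.ne(df_2).sum(axis=1)
print(row_diffs.tolist())  # [2, 2]
```

This relies on both frames having the same index and column labels, since pandas aligns on labels before comparing.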

If there are many columns, as in the example below, you can use a for loop:

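import numpy as np
import pandas as pd

df_1 = pd.DataFrame({'Age': [1, 2, 3, 4], 'Country': ['Brazil', 'UK', 'Australia', 'China'], 'Var_x': [7, 5, 7, 7], 'Var_y': [3, 6, 3, 2], 'Var_z': [20, 32, 31, 34]})
df_2 = pd.DataFrame({'Age': [1, 2, 4, 5], 'Country': ['Egypt', 'UK', 'India', 'China'], 'Var_x': [7, 4, 3, 7], 'Var_y': [3, 6, 2, 2], 'Var_z': [20, 32, 4, 32]})

# Running total of differing features for each row position
differences = np.zeros(len(df_1))
for col in df_1:  # iterating over a DataFrame yields its column names
    differences += np.where(df_1[col] != df_2[col], 1, 0)
print(differences)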

Output:

[1. 1. 5. 2.]
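The code above compares rows position by position. If every row of the first frame has to be compared with every row of the second (the 4 comparisons mentioned in the question), one possible sketch uses NumPy broadcasting. It assumes the question's data, frames named df_1 and df_2, and identical column names in both; `pairwise` and `result` are illustrative names, not part of the original answer.

```python
import pandas as pd

df_1 = pd.DataFrame({'AGE': [33, 24],
                     'Country': ['UK', 'Australia'],
                     'Income': [3500, 1500]})
df_2 = pd.DataFrame({'AGE': [33, 54],
                     'Country': ['Brazil', 'Australia'],
                     'Income': [1300, 2230]})

a = df_1.to_numpy()                 # shape (n1, k)
b = df_2[df_1.columns].to_numpy()   # shape (n2, k), same column order as df_1

# a[:, None, :] has shape (n1, 1, k) and b[None, :, :] has shape (1, n2, k),
# so the comparison broadcasts to (n1, n2, k): every row against every row
pairwise = (a[:, None, :] != b[None, :, :]).sum(axis=2)

# Rows are labelled by df_1's index, columns by df_2's index
result = pd.DataFrame(pairwise, index=df_1.index, columns=df_2.index)
print(result)
```

Entry (i, j) of `result` is the number of features in which row i of df_1 differs from row j of df_2. The intermediate boolean array holds n1 × n2 × k values, so for large frames it may be necessary to process the rows in chunks.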

Answer 1
0 Accepted 2020-03-10 15:48:34
