Compare each row in one dataframe to each row in another dataframe in Python
I have two different dataframes with the same features.
df1
AGE Country Income
-----------------------
33 UK 3500
24 Australia 1500
df2
AGE Country Income
-----------------------
33 Brazil 1300
54 Australia 2230
I would like to compare each row in df1 to each row in df2, and count the number of feature values that differ.
In my example there are 2 dataframes with 2 rows each, so there will be 4 comparisons.
For each comparison, I need to return the number of feature differences. For example, if we compare the first row in df1 to the first row in df2, we get 2 differences in the feature values.
Any idea how to implement that?
If I understand correctly, one approach is to use np.where() to compute, for each feature individually, the number of differences per row, and then sum these arrays:
arr = (
    np.where(df_1['Age'] != df_2['Age'], 1, 0)
    + np.where(df_1['Country'] != df_2['Country'], 1, 0)
    + np.where(df_1['Income'] != df_2['Income'], 1, 0)
)
This returns an array with the number of feature differences for each row, where row i of df_1 is compared with row i of df_2. In this case, the output would be:
[2,2]
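As a quick check on the question's data, the same row-by-row count can be sketched with pandas' DataFrame.ne, which avoids listing each column by hand. The frame construction and the name row_diffs are assumptions for illustration, and like the np.where version it only compares rows at the same position (row i with row i).

```python
import pandas as pd

# Data copied from the question (column names as shown there)
df1 = pd.DataFrame({"AGE": [33, 24],
                    "Country": ["UK", "Australia"],
                    "Income": [3500, 1500]})
df2 = pd.DataFrame({"AGE": [33, 54],
                    "Country": ["Brazil", "Australia"],
                    "Income": [1300, 2230]})

# ne() marks each cell that differs; summing across columns counts them per row
row_diffs = df1.ne(df2).sum(axis=1).to_numpy()
print(row_diffs)  # [2 2]
```

This relies on both frames having the same index and column labels, since pandas aligns on labels before comparing.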
If there are many columns, as in the example below, you can use a for loop:
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({'Age':[1,2,3,4],'Country':['Brazil','UK','Australia','China'],'Var_x':[7,5,7,7],'Var_y':[3,6,3,2],'Var_z':[20,32,31,34]})
df_2 = pd.DataFrame({'Age':[1,2,4,5],'Country':['Egypt','UK','India','China'],'Var_x':[7,4,3,7],'Var_y':[3,6,2,2],'Var_z':[20,32,4,32]})

# For each column, add 1 to every row where the two frames differ
differences = np.zeros(len(df_1))
for col in df_1:
    differences += np.where(df_1[col] != df_2[col], 1, 0)
print(differences)
Output:
[1. 1. 5. 2.]
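The code above compares rows at matching positions only. If every row of the first frame should be compared with every row of the second, as the question describes, one possible sketch uses NumPy broadcasting column by column. The name pair_diffs and the frame construction below are assumptions for illustration, and both frames are assumed to share the same column names.

```python
import numpy as np
import pandas as pd

# Data copied from the question
df1 = pd.DataFrame({"AGE": [33, 24],
                    "Country": ["UK", "Australia"],
                    "Income": [3500, 1500]})
df2 = pd.DataFrame({"AGE": [33, 54],
                    "Country": ["Brazil", "Australia"],
                    "Income": [1300, 2230]})

# pair_diffs[i, j] will hold the number of differing features
# between row i of df1 and row j of df2
pair_diffs = np.zeros((len(df1), len(df2)), dtype=int)
for col in df1.columns:
    left = df1[col].to_numpy()[:, None]   # shape (len(df1), 1)
    right = df2[col].to_numpy()[None, :]  # shape (1, len(df2))
    pair_diffs += (left != right)         # broadcasts to (len(df1), len(df2))

print(pair_diffs)
```

For the question's data, pair_diffs[0, 0] is 2, which matches the first-row-to-first-row example in the question. Memory grows with len(df1) * len(df2), so this is only a reasonable sketch for frames of moderate size.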