
Compare each row in one dataframe to each row in another dataframe in Python

I have two different dataframes with the same features.

df1


   AGE   Country     Income
   -----------------------
   33    UK          3500
   24    Australia   1500


df2

   AGE   Country     Income
   -----------------------
   33    Brazil      1300
   54    Australia   2230

I would like to compare each row in df1 to each row in df2, and compute the number of differences found in the feature values.

In my example, we have 2 dataframes, and each dataframe has 2 instances. So there will be 4 comparisons.

For each comparison, I need to return the number of feature differences. For example, if we compare the first row in df1 to the first row in df2, we will have 2 differences in the feature values.

Any idea how to implement that?

If I understand correctly, one approach is to use np.where() to mark, for each feature individually, the rows where the two dataframes differ, and then sum these arrays:

arr = np.where(df1['AGE'] != df2['AGE'], 1, 0) + np.where(df1['Country'] != df2['Country'], 1, 0) + np.where(df1['Income'] != df2['Income'], 1, 0)

This returns an array with the number of feature differences for each row. In this case, the output would be:

[2 2]
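Note that the np.where() expression above compares rows at the same position only (first with first, second with second). If every row of df1 has to be compared with every row of df2, as in the 4 comparisons described in the question, one possible sketch uses NumPy broadcasting. It assumes both dataframes have the same columns; the names `a`, `b`, `pairwise` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

# Data from the question
df1 = pd.DataFrame({"AGE": [33, 24],
                    "Country": ["UK", "Australia"],
                    "Income": [3500, 1500]})
df2 = pd.DataFrame({"AGE": [33, 54],
                    "Country": ["Brazil", "Australia"],
                    "Income": [1300, 2230]})

# Convert to 2-D arrays; df2 is reindexed so its columns follow df1's order
a = df1.to_numpy()
b = df2[df1.columns].to_numpy()

# a[:, None, :] has shape (n1, 1, n_features) and b[None, :, :] has shape
# (1, n2, n_features), so the comparison broadcasts to (n1, n2, n_features).
# Summing over the last axis counts the differing features for each pair.
pairwise = (a[:, None, :] != b[None, :, :]).sum(axis=2)

# Rows are labelled by df1's index, columns by df2's index
result = pd.DataFrame(pairwise, index=df1.index, columns=df2.index)
print(result)
```

Entry (i, j) of `result` is the number of features in which row i of df1 differs from row j of df2. The intermediate array grows with n1 × n2 × n_features, so for large dataframes this may need to be processed in chunks.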

If there are many columns, as in the example below, you can use a for loop:

import numpy as np
import pandas as pd
df_1 = pd.DataFrame({'Age':[1,2,3,4],'Country':['Brazil','UK','Australia','China'],'Var_x':[7,5,7,7],'Var_y':[3,6,3,2],'Var_z':[20,32,31,34]})
df_2 = pd.DataFrame({'Age':[1,2,4,5],'Country':['Egypt','UK','India','China'],'Var_x':[7,4,3,7],'Var_y':[3,6,2,2],'Var_z':[20,32,4,32]})
# Running count of differing features for each row position
differences = np.zeros(len(df_1))
# Iterating over a DataFrame yields its column names
for col in df_1:
    differences += np.where(df_1[col] != df_2[col], 1, 0)
print(differences)

Output:

[1. 1. 5. 2.]
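The same row-by-row count can probably be obtained without an explicit loop. The following is a minimal sketch, assuming both dataframes share the same index and column labels (which is the case for the example data); it uses `DataFrame.ne` for the element-wise comparison and sums the resulting booleans across columns.

```python
import pandas as pd

df_1 = pd.DataFrame({'Age': [1, 2, 3, 4],
                     'Country': ['Brazil', 'UK', 'Australia', 'China'],
                     'Var_x': [7, 5, 7, 7],
                     'Var_y': [3, 6, 3, 2],
                     'Var_z': [20, 32, 31, 34]})
df_2 = pd.DataFrame({'Age': [1, 2, 4, 5],
                     'Country': ['Egypt', 'UK', 'India', 'China'],
                     'Var_x': [7, 4, 3, 7],
                     'Var_y': [3, 6, 2, 2],
                     'Var_z': [20, 32, 4, 32]})

# ne() gives a boolean DataFrame that is True where the values differ;
# summing along axis=1 counts the True entries in each row
differences = df_1.ne(df_2).sum(axis=1)
print(differences.tolist())
```

Here `differences` is an integer Series indexed like df_1, rather than the float array produced by the loop.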
