How do I compare two columns in different dataframes at the same time?

Question

I have two dataframes and I am trying to compare two columns (Cat1 and Cat2) between them. Where both Cat1 and Cat2 are the same, I want to sum the values in the Prc column.

So, in the example below, the only two rows that meet the criteria are rows 0 and 4 of df[0], which match rows 1 and 4 of df[1]. In this case the sum would be 200 for df[0] and 185 for df[1].

df[0]
  Cat1 Cat2 Cat3 Prc
0  11   0    5   100
1  22   2    9   150
2  33   1    8    50
3  44   2    6   200
4  55   1    8   100

df[1]
  Cat1 Cat2 Cat3 Prc
0  66   1    6   120
1  11   0    5    90
2  44   1    6   185
3  77   2    7   145
4  55   1    5    95

I am able to compare Cat1 in df[0] vs df[1] using .isin, but if that is all I did then I would pick up row 3 in df[0], even though Cat2 is different in df[0] and df[1].

How do I compare two columns in different dataframes at the same time?

These are large dataframes of 500,000 rows x 32 columns each, so I want to avoid creating new dataframes or new columns.
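
To make the over-matching described above concrete, here is a minimal sketch that rebuilds the two sample frames and applies .isin to Cat1 alone. Storing the frames in a list named df is an assumption based on the df[0] / df[1] notation; the variable names mask, matched_rows and single_col_sum are illustrative only.

```python
import pandas as pd

# Sample data copied from the tables in the question.
df = [
    pd.DataFrame({"Cat1": [11, 22, 33, 44, 55],
                  "Cat2": [0, 2, 1, 2, 1],
                  "Cat3": [5, 9, 8, 6, 8],
                  "Prc": [100, 150, 50, 200, 100]}),
    pd.DataFrame({"Cat1": [66, 11, 44, 77, 55],
                  "Cat2": [1, 0, 1, 2, 1],
                  "Cat3": [6, 5, 6, 7, 5],
                  "Prc": [120, 90, 185, 145, 95]}),
]

# Comparing Cat1 only: row 3 (Cat1 == 44) is matched even though
# its Cat2 value differs between the two frames.
mask = df[0]["Cat1"].isin(df[1]["Cat1"])
matched_rows = df[0].index[mask].tolist()
single_col_sum = int(df[0].loc[mask, "Prc"].sum())

print(matched_rows)    # [0, 3, 4]
print(single_col_sum)  # 400, not the wanted 200
```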

Answer 1

One idea is to use DataFrame.merge to intersect on multiple columns, then filter the Prc columns and sum:

df1 = df[0].merge(df[1], on=['Cat1','Cat2'], suffixes=('_0','_1'))
print (df1)
   Cat1  Cat2  Cat3_0  Prc_0  Cat3_1  Prc_1
0    11     0       5    100       5     90
1    55     1       8    100       5     95

print (df1.filter(like='Prc').sum())
Prc_0    200
Prc_1    185
dtype: int64
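
As a self-contained variant of the merge above, the sketch below merges only the key columns and Prc, which should keep the intermediate frame small for wide inputs. It also passes validate="one_to_one", on the assumption that each (Cat1, Cat2) pair appears at most once per frame; if pairs repeat, an inner merge pairs every matching row with every other and the sums would be inflated. The names keys, merged and totals are illustrative.

```python
import pandas as pd

# Sample data copied from the tables in the question.
df = [
    pd.DataFrame({"Cat1": [11, 22, 33, 44, 55],
                  "Cat2": [0, 2, 1, 2, 1],
                  "Cat3": [5, 9, 8, 6, 8],
                  "Prc": [100, 150, 50, 200, 100]}),
    pd.DataFrame({"Cat1": [66, 11, 44, 77, 55],
                  "Cat2": [1, 0, 1, 2, 1],
                  "Cat3": [6, 5, 6, 7, 5],
                  "Prc": [120, 90, 185, 145, 95]}),
]

keys = ["Cat1", "Cat2"]

# Inner merge on both key columns, restricted to the columns we need.
# validate raises pandas.errors.MergeError if a key pair is duplicated
# on either side (assumption: key pairs are meant to be unique).
merged = df[0][keys + ["Prc"]].merge(
    df[1][keys + ["Prc"]],
    on=keys,
    suffixes=("_0", "_1"),
    validate="one_to_one",
)

totals = merged[["Prc_0", "Prc_1"]].sum()
print(totals)
```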

Another idea is to build a MultiIndex from the columns with DataFrame.set_index, intersect with Index.isin, and filter by boolean indexing:

s1 = df[0].set_index(['Cat1','Cat2'])['Prc']
s2 = df[1].set_index(['Cat1','Cat2'])['Prc']

print (s1[s1.index.isin(s2.index)].sum())
200
print (s2[s2.index.isin(s1.index)].sum())
185
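
Since the question asks to avoid new dataframes and columns, the same MultiIndex idea can be sketched without calling set_index on the full frame: build the index from the two key columns only with pd.MultiIndex.from_frame and use the resulting boolean mask directly. The helper name matched_prc_sum and its parameters are hypothetical, and whether this is actually lighter on memory for 500,000 x 32 frames is an assumption worth measuring.

```python
import pandas as pd

# Sample data copied from the tables in the question.
df = [
    pd.DataFrame({"Cat1": [11, 22, 33, 44, 55],
                  "Cat2": [0, 2, 1, 2, 1],
                  "Cat3": [5, 9, 8, 6, 8],
                  "Prc": [100, 150, 50, 200, 100]}),
    pd.DataFrame({"Cat1": [66, 11, 44, 77, 55],
                  "Cat2": [1, 0, 1, 2, 1],
                  "Cat3": [6, 5, 6, 7, 5],
                  "Prc": [120, 90, 185, 145, 95]}),
]


def matched_prc_sum(left, right, keys=("Cat1", "Cat2"), value="Prc"):
    """Sum `value` over rows of `left` whose key tuple also occurs in `right`."""
    key_cols = list(keys)
    # Build index objects from the key columns only; the frames are not modified.
    left_idx = pd.MultiIndex.from_frame(left[key_cols])
    right_idx = pd.MultiIndex.from_frame(right[key_cols])
    # Boolean array: True where the (Cat1, Cat2) tuple exists in `right`.
    mask = left_idx.isin(right_idx)
    return left.loc[mask, value].sum()


sum_0 = matched_prc_sum(df[0], df[1])
sum_1 = matched_prc_sum(df[1], df[0])
print(sum_0)  # 200
print(sum_1)  # 185
```

Unlike the merge, this mask-based version counts each row at most once, so repeated key pairs do not multiply the totals.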
