
How do I compare two columns in different DataFrames at the same time?

I have two DataFrames and I am trying to compare two columns (Cat1 and Cat2) between them. Where both Cat1 and Cat2 are the same, I want to sum the values in the Prc column.

In the example below, the only rows that meet the criteria are rows 0 and 4 of df[0], which match rows 1 and 4 of df[1]. In this case the sum would therefore be 200 for df[0] and 185 for df[1].

df[0]
  Cat1 Cat2 Cat3 Prc
0  11   0    5   100
1  22   2    9   150
2  33   1    8    50
3  44   2    6   200
4  55   1    8   100

df[1]
  Cat1 Cat2 Cat3 Prc
0  66   1    6   120
1  11   0    5    90
2  44   1    6   185
3  77   2    7   145
4  55   1    5    95   

I am able to compare Cat1 in df[0] against df[1] using .isin, but if that were all I did, I would also pick up row 3 of df[0], even though Cat2 for that row differs between df[0] and df[1].
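To make the problem concrete, here is a minimal sketch that rebuilds the two example tables and applies .isin to Cat1 alone. The names df0 and df1 and the way the list df is constructed are assumptions for illustration; the question only shows the printed frames.

```python
import pandas as pd

# Example data copied from the tables above (construction is assumed).
df0 = pd.DataFrame({
    "Cat1": [11, 22, 33, 44, 55],
    "Cat2": [0, 2, 1, 2, 1],
    "Cat3": [5, 9, 8, 6, 8],
    "Prc": [100, 150, 50, 200, 100],
})
df1 = pd.DataFrame({
    "Cat1": [66, 11, 44, 77, 55],
    "Cat2": [1, 0, 1, 2, 1],
    "Cat3": [6, 5, 6, 7, 5],
    "Prc": [120, 90, 185, 145, 95],
})
df = [df0, df1]

# Matching on Cat1 only: row 3 (Cat1 == 44) slips through
# even though its Cat2 value differs between the two frames.
cat1_only = df[0]["Cat1"].isin(df[1]["Cat1"])
matched_rows = df[0].index[cat1_only].tolist()
cat1_only_sum = int(df[0].loc[cat1_only, "Prc"].sum())

print(matched_rows)    # [0, 3, 4]
print(cat1_only_sum)   # 400, not the wanted 200
```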

How do I compare two columns in different DataFrames at the same time?

These are large DataFrames of 500,000 rows x 32 columns each, so I want to avoid creating new DataFrames or new columns.

One idea is to use DataFrame.merge to take the intersection on multiple columns, then select the Prc columns with DataFrame.filter and sum them:

df1 = df[0].merge(df[1], on=['Cat1','Cat2'], suffixes=('_0','_1'))
print (df1)
   Cat1  Cat2  Cat3_0  Prc_0  Cat3_1  Prc_1
0    11     0       5    100       5     90
1    55     1       8    100       5     95

print (df1.filter(like='Prc').sum())
Prc_0    200
Prc_1    185
dtype: int64
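One caveat with the merge approach: if a (Cat1, Cat2) pair occurs more than once in either frame, an inner merge repeats the matching rows and the sums can be inflated. It also carries all 32 columns of both frames into the result. The following is a hedged sketch of a variant that merges only the key columns plus Prc against the de-duplicated keys of the other frame. The names df0, df1, keys, sum0 and sum1 are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Example data copied from the question (construction is assumed).
df0 = pd.DataFrame({
    "Cat1": [11, 22, 33, 44, 55],
    "Cat2": [0, 2, 1, 2, 1],
    "Cat3": [5, 9, 8, 6, 8],
    "Prc": [100, 150, 50, 200, 100],
})
df1 = pd.DataFrame({
    "Cat1": [66, 11, 44, 77, 55],
    "Cat2": [1, 0, 1, 2, 1],
    "Cat3": [6, 5, 6, 7, 5],
    "Prc": [120, 90, 185, 145, 95],
})

keys = ["Cat1", "Cat2"]

# Keep only the columns needed, and de-duplicate the other frame's
# keys so each row of the left frame is matched at most once.
sum0 = df0[keys + ["Prc"]].merge(df1[keys].drop_duplicates(), on=keys)["Prc"].sum()
sum1 = df1[keys + ["Prc"]].merge(df0[keys].drop_duplicates(), on=keys)["Prc"].sum()

print(sum0)  # 200
print(sum1)  # 185
```

This still builds temporary frames, but they hold three columns rather than all 32.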

Another idea is to build a MultiIndex from the two columns with DataFrame.set_index, test membership with Index.isin, and filter by boolean indexing:

s1 = df[0].set_index(['Cat1','Cat2'])['Prc']
s2 = df[1].set_index(['Cat1','Cat2'])['Prc']

print (s1[s1.index.isin(s2.index)].sum())
200
print (s2[s2.index.isin(s1.index)].sum())
185
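Since the question asks to avoid new DataFrames and columns, a closely related sketch builds only the two MultiIndex objects with pd.MultiIndex.from_frame and uses the resulting boolean mask directly on the original frames. This is an alternative under the same idea, not the answer's exact code; the names df0, df1, idx0, idx1, sum0 and sum1 are assumptions for illustration.

```python
import pandas as pd

# Example data copied from the question (construction is assumed).
df0 = pd.DataFrame({
    "Cat1": [11, 22, 33, 44, 55],
    "Cat2": [0, 2, 1, 2, 1],
    "Cat3": [5, 9, 8, 6, 8],
    "Prc": [100, 150, 50, 200, 100],
})
df1 = pd.DataFrame({
    "Cat1": [66, 11, 44, 77, 55],
    "Cat2": [1, 0, 1, 2, 1],
    "Cat3": [6, 5, 6, 7, 5],
    "Prc": [120, 90, 185, 145, 95],
})

keys = ["Cat1", "Cat2"]

# Build a MultiIndex of (Cat1, Cat2) pairs for each frame
# without modifying the frames themselves.
idx0 = pd.MultiIndex.from_frame(df0[keys])
idx1 = pd.MultiIndex.from_frame(df1[keys])

# isin returns a boolean array aligned with each frame's rows.
sum0 = df0.loc[idx0.isin(idx1), "Prc"].sum()
sum1 = df1.loc[idx1.isin(idx0), "Prc"].sum()

print(sum0)  # 200
print(sum1)  # 185
```

Because the mask marks each row at most once, duplicate (Cat1, Cat2) pairs do not inflate the sums here.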
