How do I compare two columns in different dataframes at the same time?
I have two dataframes and I am trying to compare two columns (Cat1 and Cat2) between them. Where both Cat1 and Cat2 are the same, I want to sum the values in the Prc column.
So, in the example below, the only rows that meet the criteria are rows 0 and 4 of df[0], which match rows 1 and 4 of df[1]. In this case the sum would be 200 for df[0] and 185 for df[1].
df[0]
   Cat1  Cat2  Cat3  Prc
0    11     0     5  100
1    22     2     9  150
2    33     1     8   50
3    44     2     6  200
4    55     1     8  100
df[1]
   Cat1  Cat2  Cat3  Prc
0    66     1     6  120
1    11     0     5   90
2    44     1     6  185
3    77     2     7  145
4    55     1     5   95
I am able to compare Cat1 in df[0] against df[1] using .isin, but if that is all I did I would also pick up row 3 in df[0], even though Cat2 for that row differs between df[0] and df[1].
How do I compare two columns in different dataframes at the same time?
These are large dataframes of 500,000 rows x 32 columns each, so I want to avoid creating new dataframes or new columns.
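For reference, the sample data above can be rebuilt as in the sketch below. It assumes df is a plain Python list holding the two frames (the question does not say how df is constructed), and cat1_only is just an illustrative name. It also shows the over-selection described above when only Cat1 is checked.

```python
import pandas as pd

# Sample frames copied from the tables above.
# Assumption: `df` is a list of two DataFrames.
df = [
    pd.DataFrame({
        "Cat1": [11, 22, 33, 44, 55],
        "Cat2": [0, 2, 1, 2, 1],
        "Cat3": [5, 9, 8, 6, 8],
        "Prc": [100, 150, 50, 200, 100],
    }),
    pd.DataFrame({
        "Cat1": [66, 11, 44, 77, 55],
        "Cat2": [1, 0, 1, 2, 1],
        "Cat3": [6, 5, 6, 7, 5],
        "Prc": [120, 90, 185, 145, 95],
    }),
]

# Matching on Cat1 alone also selects row 3 (Cat1 == 44),
# whose Cat2 value differs between the two frames.
cat1_only = df[0]["Cat1"].isin(df[1]["Cat1"])
print(df[0].loc[cat1_only, "Prc"].sum())  # 400 rather than the wanted 200
```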
One idea is to use DataFrame.merge on both columns to get their intersection, then select the Prc columns with DataFrame.filter and sum them:
df1 = df[0].merge(df[1], on=['Cat1','Cat2'], suffixes=('_0','_1'))
print(df1)
   Cat1  Cat2  Cat3_0  Prc_0  Cat3_1  Prc_1
0    11     0       5    100       5     90
1    55     1       8    100       5     95
print(df1.filter(like='Prc').sum())
Prc_0    200
Prc_1    185
dtype: int64
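One caveat with the merge approach: an inner merge repeats a row once for every matching key pair on the other side, so the sums are inflated if a (Cat1, Cat2) pair occurs more than once in either frame. The sample data has no repeated pairs, so the numbers above are unaffected. The sketch below uses small made-up frames (left, right and the other names are hypothetical) to show the effect and one possible workaround, merging against the de-duplicated key columns only.

```python
import pandas as pd

# Hypothetical frames: the pair (11, 0) appears twice in `right`.
left = pd.DataFrame({"Cat1": [11, 55], "Cat2": [0, 1], "Prc": [100, 100]})
right = pd.DataFrame({"Cat1": [11, 11], "Cat2": [0, 0], "Prc": [90, 10]})

# Plain inner merge: the left row (11, 0) is emitted twice.
merged = left.merge(right, on=["Cat1", "Cat2"], suffixes=("_0", "_1"))
inflated = merged["Prc_0"].sum()  # 200, the single left row counted twice

# Merge against unique key pairs only, so each left row appears at most once.
keys = right[["Cat1", "Cat2"]].drop_duplicates()
exact = left.merge(keys, on=["Cat1", "Cat2"])["Prc"].sum()  # 100
print(inflated, exact)
```

Note that this still materialises a merged frame, which may matter at 500,000 rows.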
Another idea is to build a MultiIndex from the two columns with DataFrame.set_index, test for the intersection with Index.isin, and filter with boolean indexing:
s1 = df[0].set_index(['Cat1','Cat2'])['Prc']
s2 = df[1].set_index(['Cat1','Cat2'])['Prc']
print(s1[s1.index.isin(s2.index)].sum())
200
print(s2[s2.index.isin(s1.index)].sum())
185
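Since the question asks to avoid new dataframes and columns, a variation on the MultiIndex idea is to build only the key index and a boolean mask, then sum Prc straight from the original frame. This is a minimal sketch, not a benchmarked solution: matched_prc_sum is a hypothetical helper name, and it assumes pandas 0.24 or later for MultiIndex.from_frame.

```python
import pandas as pd

def matched_prc_sum(a, b, keys=("Cat1", "Cat2"), value="Prc"):
    """Sum `value` over rows of `a` whose key tuple also occurs in `b`."""
    keys = list(keys)
    idx_a = pd.MultiIndex.from_frame(a[keys])
    idx_b = pd.MultiIndex.from_frame(b[keys])
    mask = idx_a.isin(idx_b)  # boolean array, one entry per row of `a`
    return a.loc[mask, value].sum()

# Sample data from the question, limited to the columns that are used.
df0 = pd.DataFrame({"Cat1": [11, 22, 33, 44, 55],
                    "Cat2": [0, 2, 1, 2, 1],
                    "Prc": [100, 150, 50, 200, 100]})
df1 = pd.DataFrame({"Cat1": [66, 11, 44, 77, 55],
                    "Cat2": [1, 0, 1, 2, 1],
                    "Prc": [120, 90, 185, 145, 95]})

print(matched_prc_sum(df0, df1))  # 200
print(matched_prc_sum(df1, df0))  # 185
```

Because each row is tested once against the other frame's keys, repeated key pairs do not inflate the sum the way an inner merge can.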