Merge 2 dataframes in Pandas: join on some columns, sum up others

I want to merge two dataframes on specific columns (key1, key2) and sum up the values of another column (value).

>>> import pandas as pd
>>> df1 = pd.DataFrame({'key1': range(4), 'key2': range(4), 'value': range(4)})
>>> df1
   key1  key2  value
0     0     0      0
1     1     1      1
2     2     2      2
3     3     3      3

>>> df2 = pd.DataFrame({'key1': range(2, 6), 'key2': range(2, 6), 'noise': range(2, 6), 'value': range(10, 14)})
>>> df2
   key1  key2  noise  value
0     2     2      2     10
1     3     3      3     11
2     4     4      4     12
3     5     5      5     13

I want this result:

   key1  key2  value
0     0     0      0
1     1     1      1
2     2     2     12
3     3     3     14
4     4     4     12
5     5     5     13

In SQL terms, I want something like:

SELECT key1, key2, COALESCE(df1.value, 0) + COALESCE(df2.value, 0) AS value
FROM df1 FULL OUTER JOIN df2 USING (key1, key2)

I tried two approaches:

Approach 1

concatenated = pd.concat([df1, df2])
grouped = concatenated.groupby(['key1', 'key2'], as_index=False)
summed = grouped.agg('sum')  # the string alias avoids needing numpy for np.sum
result = summed[['key1', 'key2', 'value']]

Approach 2

joined = pd.merge(df1, df2, how='outer', on=['key1', 'key2'], suffixes=('_1', '_2'))
joined = joined.fillna(0.0)  # keys found in only one frame contribute 0; 'value' ends up float
joined['value'] = joined['value_1'] + joined['value_2']
result = joined[['key1', 'key2', 'value']]

Both approaches give the result I want, but I wonder if there is a simpler way.
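
As a quick sanity check on that claim, the sketch below runs both approaches on the sample frames and compares the outputs. The names `keys`, `result1` and `result2` are introduced here for illustration only. The one difference it surfaces is that approach 2 returns `value` as float because of `fillna(0.0)`, so the comparison is told to ignore dtypes.

```python
import pandas as pd

df1 = pd.DataFrame({'key1': range(4), 'key2': range(4), 'value': range(4)})
df2 = pd.DataFrame({'key1': range(2, 6), 'key2': range(2, 6),
                    'noise': range(2, 6), 'value': range(10, 14)})
keys = ['key1', 'key2']

# Approach 1: stack the rows, then sum within each key pair.
result1 = (pd.concat([df1, df2])
             .groupby(keys, as_index=False)
             .agg('sum')[keys + ['value']])

# Approach 2: outer merge, treat missing values as 0, add the two columns.
joined = pd.merge(df1, df2, how='outer', on=keys, suffixes=('_1', '_2')).fillna(0.0)
joined['value'] = joined['value_1'] + joined['value_2']
result2 = joined[keys + ['value']]

# Sort and reset the index so only the contents are compared, not row order.
result1 = result1.sort_values(keys).reset_index(drop=True)
result2 = result2.sort_values(keys).reset_index(drop=True)

# check_dtype=False: approach 2 yields a float64 'value', approach 1 an int64 one.
pd.testing.assert_frame_equal(result1, result2, check_dtype=False)
print(result1.dtypes['value'], result2.dtypes['value'])
```

If integer output matters for approach 2, casting `value` back with `astype(int)` after the addition is one option.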

I don't know about simpler, but you can get a little more concise:

>>> pd.concat([df1, df2]).groupby(["key1", "key2"], as_index=False)["value"].sum()
   key1  key2  value
0     0     0      0
1     1     1      1
2     2     2     12
3     3     3     14
4     4     4     12
5     5     5     13

Depending on your tolerance for chaining operations, you might still want to break this across multiple lines. Four chained operations tends to be close to my upper limit, and that is what this is: concat, groupby, select, sum.
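
One way to split that chain, as a sketch: wrapping the whole expression in parentheses lets each of the four operations sit on its own line. The name `result` is just illustrative, and the frames are rebuilt here so the snippet runs on its own.

```python
import pandas as pd

df1 = pd.DataFrame({'key1': range(4), 'key2': range(4), 'value': range(4)})
df2 = pd.DataFrame({'key1': range(2, 6), 'key2': range(2, 6),
                    'noise': range(2, 6), 'value': range(10, 14)})

result = (
    pd.concat([df1, df2])                        # stack the rows of both frames
    .groupby(["key1", "key2"], as_index=False)   # one group per (key1, key2) pair
    ["value"]                                    # keep only the column to be summed
    .sum()                                       # add up 'value' within each group
)
print(result)
```

Selecting `"value"` before summing also means the `noise` column never enters the aggregation, so no trailing column selection is needed.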
