Python: sum based on group and display it as an additional column
Say we have a dataframe like the one below:
channel store units
Offline Bournemouth 62
Offline Kettering 90
Offline Manchester 145
Online Bournemouth 220
Online Kettering 212
Online Manchester 272
My goal is to add two more columns: the total number of units sold in each channel, and the share each store represents within its channel. In short, the desired output should look as follows:
channel store units units_per_channel store_share
Offline Bournemouth 62 297 0.21
Offline Kettering 90 297 0.30
Offline Manchester 145 297 0.49
Online Bournemouth 220 704 0.31
Online Kettering 212 704 0.30
Online Manchester 272 704 0.39
Is there a simple and elegant way to get this?
Do a .groupby() on channel and take the sum of units with .transform('sum'), which returns the group total for every original row. Then simply divide units by units_per_channel.
import pandas as pd

df = pd.DataFrame([['Offline', 'Bournemouth', 62],
                   ['Offline', 'Kettering', 90],
                   ['Offline', 'Manchester', 145],
                   ['Online', 'Bournemouth', 220],
                   ['Online', 'Kettering', 212],
                   ['Online', 'Manchester', 272]],
                  columns=['channel', 'store', 'units'])

# Total units per channel, repeated on every row of that channel
df['units_per_channel'] = df.groupby('channel')['units'].transform('sum')
# Each store's fraction of its channel total
df['store_share'] = df['units'] / df['units_per_channel']
Output:
print(df)
channel store units units_per_channel store_share
0 Offline Bournemouth 62 297 0.208754
1 Offline Kettering 90 297 0.303030
2 Offline Manchester 145 297 0.488215
3 Online Bournemouth 220 704 0.312500
4 Online Kettering 212 704 0.301136
5 Online Manchester 272 704 0.386364
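The shares above are unrounded, while the table in the question shows two decimals. A minimal sketch of one way to match that format, assuming rounding with .round(2) is acceptable for display; the share_totals name is introduced here only for illustration, as a check that the unrounded shares in each channel add up to 1:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Offline", "Bournemouth", 62],
        ["Offline", "Kettering", 90],
        ["Offline", "Manchester", 145],
        ["Online", "Bournemouth", 220],
        ["Online", "Kettering", 212],
        ["Online", "Manchester", 272],
    ],
    columns=["channel", "store", "units"],
)

# transform('sum') repeats each channel's total on every row of that channel
df["units_per_channel"] = df.groupby("channel")["units"].transform("sum")

# Keep the exact ratio for checks, store a two-decimal version for display
exact_share = df["units"] / df["units_per_channel"]
df["store_share"] = exact_share.round(2)

# Hypothetical sanity check: exact shares within a channel should sum to 1
share_totals = exact_share.groupby(df["channel"]).sum()

print(df)
print(share_totals)
```

Rounding is applied only to the stored column, so the sanity check still runs on the exact ratios. Rounded shares themselves are not guaranteed to add up to exactly 1 in every dataset.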