Python: sum based on group and display it as an additional column
Suppose we have a data frame like the one below:
channel store units
Offline Bournemouth 62
Offline Kettering 90
Offline Manchester 145
Online Bournemouth 220
Online Kettering 212
Online Manchester 272
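My goal is to add two more columns: the total number of units sold per channel, and the share each store represents within its channel. In short, the output I want should look like this: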
channel store units units_per_channel store_share
Offline Bournemouth 62 297 0.21
Offline Kettering 90 297 0.30
Offline Manchester 145 297 0.49
Online Bournemouth 220 704 0.31
Online Kettering 212 704 0.30
Online Manchester 272 704 0.39
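Is there a simple and elegant way to get this?
Do a .groupby() on channel and take the sum of units with transform('sum'), which puts each channel's total back on every row of that channel. Then simply divide units by units_per_channel: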
import pandas as pd
df = pd.DataFrame([['Offline', 'Bournemouth', 62],
                   ['Offline', 'Kettering', 90],
                   ['Offline', 'Manchester', 145],
                   ['Online', 'Bournemouth', 220],
                   ['Online', 'Kettering', 212],
                   ['Online', 'Manchester', 272]],
                  columns=['channel', 'store', 'units'])
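# Total units per channel, repeated on every row of that channel
df['units_per_channel'] = df.groupby('channel')['units'].transform('sum')
# Each store's share of its channel's total
df['store_share'] = df['units'] / df['units_per_channel']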
Output:
print(df)
channel store units units_per_channel store_share
0 Offline Bournemouth 62 297 0.208754
1 Offline Kettering 90 297 0.303030
2 Offline Manchester 145 297 0.488215
3 Online Bournemouth 220 704 0.312500
4 Online Kettering 212 704 0.301136
5 Online Manchester 272 704 0.386364
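The shares above are unrounded, while the output asked for in the question shows two decimals. A minimal sketch of one way to get there, assuming the same data: it builds both columns with assign and rounds store_share with round(2). The names channel_totals, result and share_sums are only illustrative and do not appear in the original answer; the final check just confirms that the unrounded shares add up to 1 within each channel.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Offline", "Bournemouth", 62],
        ["Offline", "Kettering", 90],
        ["Offline", "Manchester", 145],
        ["Online", "Bournemouth", 220],
        ["Online", "Kettering", 212],
        ["Online", "Manchester", 272],
    ],
    columns=["channel", "store", "units"],
)

# Per-channel totals, aligned row by row with df (illustrative name)
channel_totals = df.groupby("channel")["units"].transform("sum")

# Add both columns in one step; round the share to two decimals for display
result = df.assign(
    units_per_channel=channel_totals,
    store_share=(df["units"] / channel_totals).round(2),
)

# Sanity check on the unrounded shares: each channel should sum to 1
share_sums = (df["units"] / channel_totals).groupby(df["channel"]).sum()

print(result)
print(share_sums)
```

Rounding is kept for presentation only here; if the shares feed into further calculations, it is probably safer to keep the unrounded column and round when printing.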