Python pandas groupby percentage to total by category
I have the following table:
+-----+----------+---+
| Grp | Category | X |
+-----+----------+---+
| 1 | A | 1 |
| 1 | B | 3 |
| 1 | B | 2 |
| 1 | C | 2 |
| 2 | A | 2 |
| 2 | A | 4 |
| 2 | B | 4 |
| 3 | A | 3 |
| 3 | C | 7 |
+-----+----------+---+
and I am trying to get the following:
+-----+----------+---------+
| Grp | Category | X_ratio |
+-----+----------+---------+
| 1 | A | 1/8 |
| 1 | B | 5/8 |
| 1 | C | 2/8 |
| 2 | A | 6/10 |
| 2 | B | 4/10 |
| 3 | A | 3/10 |
| 3 | C | 7/10 |
+-----+----------+---------+
I'm a bit stuck. Can anyone suggest an efficient solution?
My current code, which works but seems inefficient:
grp_Cat = df.groupby(['Grp', 'Category']).agg({'X': 'sum'})
grp_total = df.groupby(['Grp']).agg({'X': 'sum'})
grp_Cat.div(grp_total, level='Grp') * 100
Since performance matters, first aggregate with sum into a Series with a MultiIndex, then use Series.div to divide by the totals for each value of the first level, Grp:
s = df.groupby(['Grp','Category'])['X'].sum()
# s.sum(level=0) was removed in pandas 2.0; group by the level instead
df = s.div(s.groupby(level=0).sum(), level=0).reset_index(name='X_ratio')
print(df)
   Grp Category  X_ratio
0    1        A    0.125
1    1        B    0.625
2    1        C    0.250
3    2        A    0.600
4    2        B    0.400
5    3        A    0.300
6    3        C    0.700
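For anyone who wants to run this end to end, here is a minimal self-contained sketch of the aggregate-then-divide approach. The DataFrame is rebuilt from the table in the question, and the names totals and result are only illustrative. It assumes a recent pandas, where the per-group total is computed with groupby(level=...) rather than the older sum(level=...).

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Grp": [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "Category": ["A", "B", "B", "C", "A", "A", "B", "A", "C"],
    "X": [1, 3, 2, 2, 2, 4, 4, 3, 7],
})

# Sum X per (Grp, Category); the result is a Series with a two-level MultiIndex.
s = df.groupby(["Grp", "Category"])["X"].sum()

# Total of X per Grp, indexed by Grp only.
totals = s.groupby(level="Grp").sum()

# Divide each (Grp, Category) sum by its Grp total, aligning on the Grp level.
result = s.div(totals, level="Grp").reset_index(name="X_ratio")
print(result)
```

Multiply X_ratio by 100 if you want percentages, as in the question's own code.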
A slower alternative:
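# group_keys=False keeps the original (Grp, Category) index; on pandas 2.x
# apply would otherwise prepend the group key and reset_index would fail
df = (df.groupby(['Grp','Category'])['X'].sum()
        .groupby(level=0, group_keys=False)
        .apply(lambda x: x / x.sum())
        .reset_index(name='X_ratio'))
print(df)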
   Grp Category  X_ratio
0    1        A    0.125
1    1        B    0.625
2    1        C    0.250
3    2        A    0.600
4    2        B    0.400
5    3        A    0.300
6    3        C    0.700
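If you prefer the grouped form but want to avoid a Python-level lambda, one possible variant (not part of the answer above, just a sketch) is to replace apply with transform('sum'), which returns the group total broadcast back to every row of the grouped Series. The sample data and the names ratio and result are assumptions for illustration, and no timing claims are made here.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Grp": [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "Category": ["A", "B", "B", "C", "A", "A", "B", "A", "C"],
    "X": [1, 3, 2, 2, 2, 4, 4, 3, 7],
})

# Sum X per (Grp, Category), as in the answer above.
s = df.groupby(["Grp", "Category"])["X"].sum()

# transform("sum") yields a Series with the same index as s,
# where every entry holds the total of its Grp.
ratio = s / s.groupby(level="Grp").transform("sum")

result = ratio.reset_index(name="X_ratio")
print(result)
```

Because transform keeps the original index, the division aligns row by row and no level argument is needed.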