Create a pandas column based on grouping
I have a data frame like this:
Group Col A Col B
1 A1 B1
1 A2 B2
2 A3 B3
2 A4 B4
I want to create new columns Per A and Per B that respect the groups, so the outcome should be:
Group Col A Col B Per A Per B
1 A1 B1 100*A1/(A1+A2) 100*B1/(B1+B2)
1 A2 B2 100*A2/(A1+A2) 100*B2/(B1+B2)
2 A3 B3 100*A3/(A3+A4) 100*B3/(B3+B4)
2 A4 B4 100*A4/(A3+A4) 100*B4/(B3+B4)
I need a general solution that handles several groups, each with a different group size.
I tried using a for loop, and while I am able to compute the columns, I cannot assign them to the data frame. I don't understand what exactly prevents it.
For example, this is a result I'd be looking for (note that I changed the Group column to reflect "different group sizes"):
Group Col A Col B Per A Per B
1 1 2 100.0 100.0
2 1 2 16.67 25.00
2 2 2 33.33 25.00
2 3 4 50.00 50.00
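import pandas as pd

df = pd.DataFrame({
    'Group': [1, 1, 2, 2, 3, 3, 3],
    'ColA': [1, 2, 3, 4, 5, 6, 7],
    'ColB': [10, 22, 30, 40, 50, 60, 70],
})
# Merge the per-group sums back onto each row; the overlapping
# columns get the default _x (original) and _y (group sum) suffixes.
# Note the double brackets: selecting several columns needs a list.
df = df.merge(df.groupby('Group')[['ColA', 'ColB']].sum().reset_index(),
              on='Group')
df['PerA'] = df['ColA_x'] * 100 / df['ColA_y']
df['PerB'] = df['ColB_x'] * 100 / df['ColB_y']
# Restore the original column names and drop the helper sums.
df = df.rename(
    columns={'ColA_x': 'ColA', 'ColB_x': 'ColB'}).drop(
    columns=['ColA_y', 'ColB_y'])
print(df)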
Output:
Group ColA ColB PerA PerB
0 1 1 10 33.333333 31.250000
1 1 2 22 66.666667 68.750000
2 2 3 30 42.857143 42.857143
3 2 4 40 57.142857 57.142857
4 3 5 50 27.777778 27.777778
5 3 6 60 33.333333 33.333333
6 3 7 70 38.888889 38.888889
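As a possible variation on the merge above, the rename step can be avoided by passing explicit suffixes to merge. This is only a sketch on the same sample data; the `_sum` suffix and the names `sums`, `merged` and `result` are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': [1, 1, 2, 2, 3, 3, 3],
    'ColA': [1, 2, 3, 4, 5, 6, 7],
    'ColB': [10, 22, 30, 40, 50, 60, 70],
})

# Per-group sums as a regular frame with a Group column.
sums = df.groupby('Group')[['ColA', 'ColB']].sum().reset_index()

# An empty left suffix keeps the original names; the sums get "_sum".
merged = df.merge(sums, on='Group', suffixes=('', '_sum'))

merged['PerA'] = merged['ColA'] * 100 / merged['ColA_sum']
merged['PerB'] = merged['ColB'] * 100 / merged['ColB_sum']

# Drop the helper columns once the percentages are computed.
result = merged.drop(columns=['ColA_sum', 'ColB_sum'])
print(result)
```

Keeping the original names on the left side means no rename is needed afterwards.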
Group by Group and then sum. That gives you the column sums per group.
Set Group as the index and then divide by the outcome above. Because both frames are indexed by group, the division only pairs rows whose index labels match. Code below:
df.set_index('Group').div(df.groupby('Group').sum())*100
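To make the index-aligned division concrete, here is a minimal sketch run against the sample frame used elsewhere on this page. The names `group_sums` and `pct` are illustrative, and the sketch assumes the frame is already sorted by Group, as it is here.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': [1, 1, 2, 2, 3, 3, 3],
    'ColA': [1, 2, 3, 4, 5, 6, 7],
    'ColB': [10, 22, 30, 40, 50, 60, 70],
})

# One row of column sums per group, indexed by Group.
group_sums = df.groupby('Group').sum()

# With Group as the index on both sides, div() matches each row
# to the sums of its own group before dividing.
pct = df.set_index('Group').div(group_sums) * 100
print(pct)
```

The result is indexed by Group and holds the percentages in ColA and ColB, so it would still need to be renamed or joined back if the original values should be kept alongside.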
Try groupby with transform, then update (this overwrites ColA and ColB in place):
df.update(df.div(df.groupby('Group').transform('sum'))*100)
df
Out[478]:
Group ColA ColB
0 1 33.333333 31.250000
1 1 66.666667 68.750000
2 2 42.857143 42.857143
3 2 57.142857 57.142857
4 3 27.777778 27.777778
5 3 33.333333 33.333333
6 3 38.888889 38.888889
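If the original values should be kept and the percentages added as new columns, as the question asks, the same transform idea can be assigned to new column names instead. The following is a sketch using the small example from the question; the column names ColA, ColB, PerA and PerB are written without spaces here as an assumption for convenience.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': [1, 2, 2, 2],
    'ColA': [1, 1, 2, 3],
    'ColB': [2, 2, 2, 4],
})

# transform('sum') returns the group total repeated on every row of
# that group, so it lines up with the original rows for division.
for col, new_col in [('ColA', 'PerA'), ('ColB', 'PerB')]:
    group_total = df.groupby('Group')[col].transform('sum')
    df[new_col] = df[col] * 100 / group_total

print(df.round(2))
```

Because transform preserves the original row index, this works for any number of groups and any group sizes without merging or renaming.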