Create a pandas column based on grouping

I have a data frame like this:

Group Col A Col B
  1    A1    B1
  1    A2    B2
  2    A3    B3
  2    A4    B4

I want to create new columns Per A and Per B that are computed within each group, so the outcome should be:

Group Col A Col B     Per A           Per B
  1    A1    B1   100*A1/(A1+A2)  100*B1/(B1+B2)
  1    A2    B2   100*A2/(A1+A2)  100*B2/(B1+B2)
  2    A3    B3   100*A3/(A3+A4)  100*B3/(B3+B4)
  2    A4    B4   100*A4/(A3+A4)  100*B4/(B3+B4)
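Written generally, each new value is the row's share of its group's column total, times 100. Here A_i and B_i denote row i's Col A and Col B values (notation introduced only for this formula):

```latex
\text{Per A}_i = 100 \cdot \frac{A_i}{\sum_{j \,:\, \text{Group}_j = \text{Group}_i} A_j},
\qquad
\text{Per B}_i = 100 \cdot \frac{B_i}{\sum_{j \,:\, \text{Group}_j = \text{Group}_i} B_j}
```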

I need a general solution: there are several groups, and each group can have a different size.

I tried using a for loop, and while I am able to find the columns, I cannot assign them to the data frame. I don't understand what exactly prevents it.
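The loop itself isn't shown, so this is an assumption: a common reason such a loop computes the right numbers but never changes the data frame is that it assigns to the sub-frames produced by iterating over df.groupby('Group'). Each of those is a separate DataFrame, so adding a column to one of them does not add it to df. A minimal sketch that writes back through df.loc using each group's row labels instead, with data taken from the expected-result example in this question:

```python
import pandas as pd

df = pd.DataFrame({
    'Group': [1, 2, 2, 2],
    'ColA': [1, 1, 2, 3],
    'ColB': [2, 2, 2, 4],
})

# Pre-create the output columns as floats so every row has a slot
df['PerA'] = float('nan')
df['PerB'] = float('nan')

for _, sub in df.groupby('Group'):
    # `sub` is a separate DataFrame; adding a column to it would not
    # change `df`, so write the results back via its row labels instead.
    df.loc[sub.index, 'PerA'] = 100 * sub['ColA'] / sub['ColA'].sum()
    df.loc[sub.index, 'PerB'] = 100 * sub['ColB'] / sub['ColB'].sum()

print(df.round(2))
```

A loop works, but the vectorised groupby answers below avoid the per-group Python overhead.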

For example, this is a result I'd be looking for (note how I changed the Group column to reflect "different group sizes"):

Group Col A Col B Per A Per B
  1     1     2   100.0 100.0
  2     1     2   16.67 25.00
  2     2     2   33.33 25.00
  2     3     4   50.00 50.00
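
import pandas as pd

df = pd.DataFrame({
    'Group': [1, 1, 2, 2, 3, 3, 3],
    'ColA': [1, 2, 3, 4, 5, 6, 7],
    'ColB': [10, 22, 30, 40, 50, 60, 70],
})

# Per-group column sums, merged back onto every row of the same group.
# Select the columns with a list: a tuple of names fails in pandas >= 2.0.
df = df.merge(df.groupby('Group')[['ColA', 'ColB']].sum().reset_index(),
              on='Group')
# Overlapping names get suffixes: *_x = original value, *_y = group sum
df['PerA'] = df['ColA_x'] * 100 / df['ColA_y']
df['PerB'] = df['ColB_x'] * 100 / df['ColB_y']

df = df.rename(
    columns={'ColA_x': 'ColA', 'ColB_x': 'ColB'}).drop(
        columns=['ColA_y', 'ColB_y'])

print(df)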

Output:

   Group  ColA  ColB       PerA       PerB
0      1     1    10  33.333333  31.250000
1      1     2    22  66.666667  68.750000
2      2     3    30  42.857143  42.857143
3      2     4    40  57.142857  57.142857
4      3     5    50  27.777778  27.777778
5      3     6    60  33.333333  33.333333
6      3     7    70  38.888889  38.888889
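A variation on the merge approach above, sketched on the same example data: passing suffixes=('', '_sum') to merge keeps the original column names on the left side, so the rename step disappears, and looping over a list of value columns handles any number of them. The "ColA -> PerA" naming rule is an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': [1, 1, 2, 2, 3, 3, 3],
    'ColA': [1, 2, 3, 4, 5, 6, 7],
    'ColB': [10, 22, 30, 40, 50, 60, 70],
})

value_cols = ['ColA', 'ColB']  # columns to convert to per-group percentages

# suffixes=('', '_sum') leaves the left-hand names unchanged and only
# renames the right-hand (group-total) columns.
sums = df.groupby('Group')[value_cols].sum().reset_index()
out = df.merge(sums, on='Group', suffixes=('', '_sum'))

for col in value_cols:
    # Illustrative naming rule: 'ColA' -> 'PerA'
    out['Per' + col[len('Col'):]] = out[col] * 100 / out[col + '_sum']

out = out.drop(columns=[c + '_sum' for c in value_cols])
print(out)
```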

Group by Group and then sum. That gives you the column sums per group.

Set Group as the index and then divide by the result above. Index alignment means each row is divided only by the totals row with the same Group label. Code below:

df.set_index('Group').div(df.groupby('Group').sum())*100
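A runnable sketch of this approach on the same example data. Note that the result holds only the percentages (still in columns named ColA and ColB) and is indexed by Group, so reset_index is used to turn Group back into an ordinary column.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': [1, 1, 2, 2, 3, 3, 3],
    'ColA': [1, 2, 3, 4, 5, 6, 7],
    'ColB': [10, 22, 30, 40, 50, 60, 70],
})

# One row per group holding that group's column totals
totals = df.groupby('Group').sum()

# With Group as the index on both sides, div() aligns each row
# with the totals row that carries the same Group label.
pct = df.set_index('Group').div(totals) * 100

# Turn Group back into an ordinary column
pct = pct.reset_index()
print(pct)
```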

Try groupby().transform together with update:

df.update(df.div(df.groupby('Group').transform('sum'))*100)
df
Out[478]: 
   Group       ColA       ColB
0      1  33.333333  31.250000
1      1  66.666667  68.750000
2      2  42.857143  42.857143
3      2  57.142857  57.142857
4      3  27.777778  27.777778
5      3  33.333333  33.333333
6      3  38.888889  38.888889
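Note that update overwrites ColA and ColB in place instead of adding Per A and Per B. Group is left untouched only because it is absent from the transform result, so the division turns it into NaN, and update ignores NaN values. If the original columns should be kept, here is a minimal sketch that reuses the same transform('sum') step and joins the percentages under new names (the "ColA -> PerA" naming rule is an illustrative assumption):

```python
import pandas as pd

df = pd.DataFrame({
    'Group': [1, 1, 2, 2, 3, 3, 3],
    'ColA': [1, 2, 3, 4, 5, 6, 7],
    'ColB': [10, 22, 30, 40, 50, 60, 70],
})

value_cols = ['ColA', 'ColB']

# transform('sum') returns a frame aligned row-for-row with df,
# where each cell holds its group's total for that column.
group_totals = df.groupby('Group')[value_cols].transform('sum')

# Divide elementwise, then join under new names instead of overwriting
pct = df[value_cols].div(group_totals) * 100
new_names = {c: 'Per' + c[len('Col'):] for c in value_cols}  # illustrative naming rule
df = df.join(pct.rename(columns=new_names))

print(df)
```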

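Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.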

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM