Python pandas equivalent to R groupby mutate

Question

In R, when I have a data frame with, say, 4 columns (call it df) and I want to compute the ratio of c to the group-wise sum product of c and d, I can do it like this:

library(dplyr)

# generate data
df = data.frame(a=c(1,1,0,1,0),b=c(1,0,0,1,0),c=c(10,5,1,5,10),d=c(3,1,2,1,2));
| a   b   c    d |
| 1   1   10   3 |
| 1   0   5    1 |
| 0   0   1    2 |
| 1   1   5    1 |
| 0   0   10   2 |
# compute sum product ratio
df = df%>% group_by(a,b) %>%
      mutate(
          ratio=c/sum(c*d)
      );
| a   b   c    d  ratio |
| 1   1   10   3  0.286 |
| 1   1   5    1  0.143 |
| 1   0   5    1  1     |
| 0   0   1    2  0.045 |
| 0   0   10   2  0.455 |

But in Python I have to resort to loops. There should be a more elegant way than raw loops in Python. Does anyone have any ideas?

Answer 1

It can be done with similar syntax using groupby() and apply():

df['ratio'] = df.groupby(['a','b'], group_keys=False).apply(lambda g: g.c/(g.c * g.d).sum())

Answer 2

According to this thread on the pandas GitHub, we can use the transform() method to replicate the combination of dplyr::group_by() and dplyr::mutate(). For this example, it would look as follows:

import pandas as pd

df = pd.DataFrame( dict( a=(1,1,0,1,0)
                        , b=(1,0,0,1,0)
                        , c=(10,5,1,5,10)
                        , d=(3,1,2,1,2) ) ) \
    .assign( prod_c_d = lambda x: x['c'] * x['d']
            , ratio = lambda x: x['c'] / x.groupby(['a','b']) \
                      .transform('sum')['prod_c_d']  )

This example uses pandas method chaining. For more information on how to use method chaining to replicate dplyr workflows, see this blogpost.
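
The helper column is optional. Below is a sketch of the same transform() idea applied directly to the row-wise product, assuming the question's data. The name group_total is mine and only there for readability.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    "a": [1, 1, 0, 1, 0],
    "b": [1, 0, 0, 1, 0],
    "c": [10, 5, 1, 5, 10],
    "d": [3, 1, 2, 1, 2],
})

# Row-wise product, summed within each (a, b) group and
# broadcast back to every row of that group by transform().
group_total = (df["c"] * df["d"]).groupby([df["a"], df["b"]]).transform("sum")

df["ratio"] = df["c"] / group_total

print(df.round(3))
```

transform() always returns something with the same index as its input, which is what makes it behave like mutate() after group_by().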

The method using apply() and groupby() does not work for me because it does not seem to be adaptable. For example, it does not work if we delete g.c/ from the lambda expression, because the lambda then returns one value per group rather than one value per row:

df['ratio'] = df.groupby(['a','b'], group_keys=False)\
    .apply(lambda g: (g.c * g.d).sum() )
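
To illustrate what happens in that case, here is a hedged sketch on the question's data: the scalar-returning apply() yields a result indexed by the group keys (a, b), not by the original row labels, so it does not line up row by row. One way to broadcast it back is a join on the keys. The names per_group, with_total and sum_prod are hypothetical, and the [['c', 'd']] selection is my addition.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    "a": [1, 1, 0, 1, 0],
    "b": [1, 0, 0, 1, 0],
    "c": [10, 5, 1, 5, 10],
    "d": [3, 1, 2, 1, 2],
})

# One scalar per group: the result is indexed by (a, b),
# with one entry per group instead of one per row.
per_group = (
    df.groupby(["a", "b"])[["c", "d"]]
      .apply(lambda g: (g["c"] * g["d"]).sum())
)
print(per_group)

# Broadcast the group totals back onto the rows by joining on the keys.
with_total = df.join(per_group.rename("sum_prod"), on=["a", "b"])
with_total["ratio"] = with_total["c"] / with_total["sum_prod"]

print(with_total.round(3))
```

If all you need is the broadcast value, transform('sum') does the same thing in one step, without the join.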

Answer 1: score 20, accepted, 2016-12-02 01:19:16
Answer 2: score 6, 2019-01-04 08:06:38
