
In Python, how can I do groupby + arrange + mutate (ifelse) like in R?

I have the following dataset:

ID      Date     Flag      Price     Flag_Amt     Factor
1      1/1/10     NA        20          NA          NA
1      1/2/10     3         20.2        1.05        .5
1      1/3/10     NA        19.2        NA          NA
2      1/1/10     5         12          6.50        1.3
2      1/2/10     NA        12.6        NA          NA
2      1/2/10     NA        13          NA          NA 
3      1/1/10     NA        100         NA          NA
3      1/2/10     5         88          16.7        .88
3      1/3/10     NA        90          NA          NA

and I have the following R dplyr code:

df = df %>% group_by(ID) %>% arrange(Date) %>% mutate(New_Factor = ifelse(Flag == 5, (Flag_Amt/Price), Factor))

which would yield the following results:

ID      Date     Flag      Price     Flag_Amt     Factor    New_Factor
1      1/1/10     NA        20          NA          NA         NA 
1      1/2/10     3         20.2        10.1        .5         .5
1      1/3/10     NA        19.2        NA          NA         NA        
2      1/1/10     5         12          6.50        1.3        1.85
2      1/2/10     NA        12.6        NA          NA         NA
2      1/2/10     NA        13          NA          NA         NA
3      1/1/10     NA        100         NA          NA         NA        
3      1/2/10     5         88          16.7        .88        5.27
3      1/3/10     NA        90          NA          NA         NA

However, I am having difficulty replicating this in Python pandas.

Below is some of the code I have tried and the error I have received:

df['New_Factor'] = df.groupby(['ID']).apply(lambda x: (x.Price/x.Flag_Amt) if x.Flag == 5 else (x.Factor))

Error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
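The error comes from the Python if/else inside the lambda: x.Flag == 5 is a boolean Series with one value per row, and if needs a single True or False, so pandas refuses to guess. R's ifelse() is element-wise; the closest pandas counterpart is np.where (or Series.where / Series.mask). A minimal reproduction, using made-up three-row columns rather than the real data:

```python
import numpy as np
import pandas as pd

# Made-up three-row columns, only to illustrate the error.
flag = pd.Series([np.nan, 3.0, 5.0])
factor = pd.Series([np.nan, 0.5, 1.3])
ratio = pd.Series([np.nan, 0.05, 0.54])

caught = None
try:
    # flag == 5 is a boolean Series; `if` needs a single True/False,
    # so pandas raises ValueError instead of guessing.
    result = ratio if flag == 5 else factor
except ValueError as err:
    caught = err
    print("ValueError:", err)

# np.where chooses row by row, like R's ifelse().
new_factor = np.where(flag == 5, ratio, factor)
print(new_factor)
```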

Is there another way to do this, perhaps using .transform() together with np.where()?

Any help is appreciated.

Thanks

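Your result from the R code should actually look like this. Flag_Amt/Price for ID 2 is 6.5/12 ≈ 0.542; the 1.85 in the question is Price/Flag_Amt (12/6.5), which is also what the pandas attempt computes: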

r$> library(tibble)
r$> library(dplyr)
r$> df = tribble( 
            ~ID,  ~Date,   ~Flag,  ~Price,  ~Flag_Amt, ~Factor, 
        1,     '1/1/10', NA,      20,       NA,         NA, 
        1,     '1/2/10', 3,       20.2,     1.05,       .5, 
        1,     '1/3/10', NA,      19.2,     NA,         NA, 
        2,     '1/1/10', 5,       12,       6.50,       1.3, 
        2,     '1/2/10', NA,      12.6,     NA,         NA, 
        2,     '1/2/10', NA,      13,       NA,         NA , 
        3,     '1/1/10', NA,      100,      NA,         NA, 
        3,     '1/2/10', 5,       88,       16.7,       .88, 
        3,     '1/3/10', NA,      90,       NA,         NA
    )                                                  
r$> df
# A tibble: 9 x 6
     ID Date    Flag Price Flag_Amt Factor
  <dbl> <chr>  <dbl> <dbl>    <dbl>  <dbl>
1     1 1/1/10    NA  20      NA     NA   
2     1 1/2/10     3  20.2     1.05   0.5 
3     1 1/3/10    NA  19.2    NA     NA   
4     2 1/1/10     5  12       6.5    1.3 
5     2 1/2/10    NA  12.6    NA     NA   
6     2 1/2/10    NA  13      NA     NA   
7     3 1/1/10    NA 100      NA     NA   
8     3 1/2/10     5  88      16.7    0.88
9     3 1/3/10    NA  90      NA     NA   

r$> df %>% group_by(ID) %>% 
      arrange(Date) %>% 
      mutate(New_Factor = ifelse(Flag == 5, (Flag_Amt/Price), Factor))
# A tibble: 9 x 7
# Groups:   ID [3]
     ID Date    Flag Price Flag_Amt Factor New_Factor
  <dbl> <chr>  <dbl> <dbl>    <dbl>  <dbl>      <dbl>
1     1 1/1/10    NA  20      NA     NA        NA    
2     2 1/1/10     5  12       6.5    1.3       0.542
3     3 1/1/10    NA 100      NA     NA        NA    
4     1 1/2/10     3  20.2     1.05   0.5       0.5  
5     2 1/2/10    NA  12.6    NA     NA        NA    
6     2 1/2/10    NA  13      NA     NA        NA    
7     3 1/2/10     5  88      16.7    0.88      0.190
8     1 1/3/10    NA  19.2    NA     NA        NA    
9     3 1/3/10    NA  90      NA     NA        NA  
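To reproduce this R output in plain pandas, a sketch under the same hand-built data assumption: dplyr's arrange() ignores group_by() unless .by_group = TRUE is passed, so it sorts the whole table by Date (compared as text here), and, as the output above shows, ties keep their original order. A stable sort_values does the same. Series.mask is used here as an alternative to np.where: it keeps Factor and replaces it with Flag_Amt / Price wherever Flag == 5.

```python
import numpy as np
import pandas as pd

# Sample data from the question (NA -> NaN); Date kept as the original strings.
nan = np.nan
df = pd.DataFrame({
    "ID":       [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Date":     ["1/1/10", "1/2/10", "1/3/10", "1/1/10", "1/2/10", "1/2/10",
                 "1/1/10", "1/2/10", "1/3/10"],
    "Flag":     [nan, 3, nan, 5, nan, nan, nan, 5, nan],
    "Price":    [20, 20.2, 19.2, 12, 12.6, 13, 100, 88, 90],
    "Flag_Amt": [nan, 1.05, nan, 6.50, nan, nan, nan, 16.7, nan],
    "Factor":   [nan, .5, nan, 1.3, nan, nan, nan, .88, nan],
})

# Sort the whole frame by Date (like arrange() without .by_group = TRUE);
# kind="stable" keeps rows with equal Date in their input order.
out = df.sort_values("Date", kind="stable").reset_index(drop=True)

# Keep Factor, but replace it with Flag_Amt / Price where Flag == 5.
out["New_Factor"] = out["Factor"].mask(out["Flag"] == 5, out["Flag_Amt"] / out["Price"])
print(out)
```

One small difference: in R, Flag == 5 is NA when Flag is NA, so ifelse() returns NA, while in pandas NaN == 5 is simply False and the Factor branch is taken. With this data Factor is NaN on exactly those rows, so both give the same result.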

Here is how it looks using the Python package datar, without digging into the pandas API:

>>> from datar.all import (
...     f, NA, tribble, c, rep,
...     group_by, arrange, mutate, if_else
... )
>>> 
>>> df = tribble(
...     f.ID,  f.Date,   f.Flag,  f.Price,  f.Flag_Amt, f.Factor,
...     1,     '1/1/10', NA,      20,       NA,         NA,
...     1,     '1/2/10', 3,       20.2,     1.05,       .5,
...     1,     '1/3/10', NA,      19.2,     NA,         NA,
...     2,     '1/1/10', 5,       12,       6.50,       1.3,
...     2,     '1/2/10', NA,      12.6,     NA,         NA,
...     2,     '1/2/10', NA,      13,       NA,         NA ,
...     3,     '1/1/10', NA,      100,      NA,         NA,
...     3,     '1/2/10', 5,       88,       16.7,       .88,
...     3,     '1/3/10', NA,      90,       NA,         NA,
... )
>>> df
   ID    Date  Flag  Price  Flag_Amt  Factor
0   1  1/1/10   NaN   20.0       NaN     NaN
1   1  1/2/10   3.0   20.2      1.05    0.50
2   1  1/3/10   NaN   19.2       NaN     NaN
3   2  1/1/10   5.0   12.0      6.50    1.30
4   2  1/2/10   NaN   12.6       NaN     NaN
5   2  1/2/10   NaN   13.0       NaN     NaN
6   3  1/1/10   NaN  100.0       NaN     NaN
7   3  1/2/10   5.0   88.0     16.70    0.88
8   3  1/3/10   NaN   90.0       NaN     NaN
>>> df = (
...     df >> 
...         group_by(f.ID) >> 
...         arrange(f.Date) >> 
...         mutate(New_Factor = if_else(f.Flag == 5, (f.Flag_Amt/f.Price), f.Factor))
... )
>>> df
   ID    Date  Flag  Price  Flag_Amt  Factor New_Factor
0   1  1/1/10   NaN   20.0       NaN     NaN        NaN
1   2  1/1/10   5.0   12.0      6.50    1.30   0.541667
2   3  1/1/10   NaN  100.0       NaN     NaN        NaN
3   1  1/2/10   3.0   20.2      1.05    0.50        0.5
4   2  1/2/10   NaN   12.6       NaN     NaN        NaN
5   2  1/2/10   NaN   13.0       NaN     NaN        NaN
6   3  1/2/10   5.0   88.0     16.70    0.88   0.189773
7   1  1/3/10   NaN   19.2       NaN     NaN        NaN
8   3  1/3/10   NaN   90.0       NaN     NaN        NaN
[Groups: ['ID'] (n=3)]

I am the author of the package. Feel free to submit issues if you have any questions.


 