简体   繁体   中英

Python Group By and Case When Equivalent Relative to R

I am trying to do a case when/if-else statement in Python on a grouped dataframe to create a new variable. I would want to do the following if I was coding in R and I am trying find an operation in Python that's similar and also vectorized. R code:

dt %>% group_by(user,merchant,date) %>%
mutate(
new_variable = case_when(-amount == lag(amount) ~ 2,
                         True ~ 1)
) %>% ungroup()

In Python I have tried using np.select:

    conditions = [
    (-us_trans['real_amount'] == us_trans['real_amount'].shift(-1)),
    (-us_trans['real_amount'] != us_trans['real_amount'].shift(-1))

]

    values = [
        2, 
        1
        
    ]

But I don't know how to use np.select on a grouped dataframe to create a new variable.

I know that I can use groupby(['user','merchant','date'].apply and pass an if-else statement but I believe this would be done in a loop and I am trying to do it in a vectorized manner to optimize my code.

Thanks!

Using the slow pandas option:

df["new_variable"] = np.where(df.groupby(['user', 'merchant','date'])['amount'].apply(lambda g: g.shift(-1)==-g),2,1)

However, using datatable , along with shift() , ifelse() and by() will be MUCH faster

from datatable import dt, f, by

df = dt.Frame(df)

df[:,
   dt.update(new_variable=dt.ifelse(-1*dt.shift(f.amount)==f.amount,2,1)),
   by(f.user,f.merchant,f.date)
]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM