
Pandas: Sort and shift some columns by group

I'm trying to do some calculations that require shifting the values in a DataFrame within each group.

Let me try to explain with an example:

from datetime import date
import pandas as pd

df = pd.DataFrame({
    'customer_id': [1,1,1,2,2,3,3,3,3],
    'product': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'date': [date(2020,1,1), date(2020,3,1), date(2020,3,20),
             date(2020,4,2), date(2020,6,10), 
             date(2020,4,5), date(2020,3,1), date(2020,8,10), date(2021,1,1)],
    'amount': [100, 200, 250, 300, 200, 100, 300, 200, 400]
})
df
##    customer_id product        date  amount
## 0            1       A  2020-01-01     100
## 1            1       A  2020-03-01     200
## 2            1       A  2020-03-20     250
## 3            2       B  2020-04-02     300
## 4            2       B  2020-06-10     200
## 5            3       C  2020-04-05     100
## 6            3       C  2020-03-01     300
## 7            3       D  2020-08-10     200
## 8            3       D  2021-01-01     400

What I need to do is:

  1. Group by customer_id and product
  2. Sort the rows of each group by date
  3. Shift the date and amount columns by 1 row within each group

Something like this:

##                           date  date_prev amount amount_prev
## customer_id product
##           1       A 2020-01-01       None    100        None
##                     2020-03-01 2020-01-01    200         100
##                     2020-03-20 2020-03-01    250         200
##           2       B 2020-04-02       None    300        None
##                     2020-06-10 2020-04-02    200         300
##           3       C 2020-03-01       None    300        None
##                     2020-04-05 2020-03-01    100         300
##                   D 2020-08-10       None    200        None
##                     2021-01-01 2020-08-10    400         200

Is there a way to get this?

Well, I found a solution:

df.sort_values(['customer_id', 'product', 'date'], inplace=True)
df['date_prev'] = df.groupby(['customer_id', 'product'])['date'].shift(1)
df['amount_prev'] = df.groupby(['customer_id', 'product'])['amount'].shift(1)
df
##    customer_id product        date  amount   date_prev  amount_prev
## 0            1       A  2020-01-01     100         NaN          NaN
## 1            1       A  2020-03-01     200  2020-01-01        100.0
## 2            1       A  2020-03-20     250  2020-03-01        200.0
## 3            2       B  2020-04-02     300         NaN          NaN
## 4            2       B  2020-06-10     200  2020-04-02        300.0
## 6            3       C  2020-03-01     300         NaN          NaN
## 5            3       C  2020-04-05     100  2020-03-01        300.0
## 7            3       D  2020-08-10     200         NaN          NaN
## 8            3       D  2021-01-01     400  2020-08-10        200.0

This is close enough for me... but if there's a better solution, I'd really like to know!
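
One possible tightening, offered as a sketch rather than a definitive answer: both columns can be shifted in a single groupby call, and `add_suffix` can build the new column names. The names `df_sorted`, `prev` and `result` are only illustrative, and this version avoids `inplace=True` so the original `df` is left untouched.

```python
from datetime import date
import pandas as pd

df = pd.DataFrame({
    'customer_id': [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'product': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'date': [date(2020, 1, 1), date(2020, 3, 1), date(2020, 3, 20),
             date(2020, 4, 2), date(2020, 6, 10),
             date(2020, 4, 5), date(2020, 3, 1), date(2020, 8, 10), date(2021, 1, 1)],
    'amount': [100, 200, 250, 300, 200, 100, 300, 200, 400]
})

keys = ['customer_id', 'product']

# Sort so that "previous row" means "previous date" within each group.
df_sorted = df.sort_values(keys + ['date'])

# Shift both columns at once; the result keeps the original row index.
prev = df_sorted.groupby(keys)[['date', 'amount']].shift(1).add_suffix('_prev')

# Attach the shifted columns by index alignment.
result = df_sorted.join(prev)
print(result)
```

The first row of each group has no predecessor, so its `date_prev` and `amount_prev` are missing, which is also why `amount_prev` comes back as a float column.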
