简体   繁体   中英

pandas add days before holiday

I need to add a column "next_holiday" computing days til next holiday. Below is the expected output once the column is added:

SalesDate holiday   next_holiday
2016-12-20  0             5
2016-12-21  0             4
2016-12-22  0             3
2016-12-23  0             2
2016-12-24  0             1
2016-12-25  1             0

Here's a two-line suggestion:

temp = df.holiday.iloc[::-1].eq(1).cumsum()   
df['next_holiday'] = temp.groupby(temp).cumcount()

It is possible to avoid the groupby() , by using the index of the DataFrame.

1. Using RangeIndex

Assuming the following dataframe:

import pandas as pd

df = pd.DataFrame({
    'SalesDate': ['2016-12-20', '2016-12-21', '2016-12-22', '2016-12-23', '2016-12-24', '2016-12-25', '2016-12-26', '2016-12-27'],
    'holiday': [0, 0, 0, 0, 0, 1, 0, 0]
})

df

    SalesDate  holiday
0  2016-12-20        0
1  2016-12-21        0
2  2016-12-22        0
3  2016-12-23        0
4  2016-12-24        0
5  2016-12-25        1
6  2016-12-26        0
7  2016-12-27        0

One-line solution:

If the DataFrame uses a standard RangeIndex, you could do arithmetic using df.index.where().bfill() and df.index :

df['next_holiday'] = pd.Series(df.index.where(df.holiday == 1), dtype='Int32').fillna(method='bfill') - df.index  # Note: dtype'Int32' nullable int is new in pandas 0.24

Result:

df 
    SalesDate  holiday  next_holiday
0  2016-12-20        0             5
1  2016-12-21        0             4
2  2016-12-22        0             3
3  2016-12-23        0             2
4  2016-12-24        0             1
5  2016-12-25        1             0
6  2016-12-26        0           NaN
7  2016-12-27        0           NaN

2. Using DateTimeIndex

If SalesDate is the index column ( datetime64 type), the solution is similar:

df = df.set_index(pd.to_datetime(df.SalesDate)).drop(columns=['SalesDate'])
df
            holiday
SalesDate          
2016-12-20        0
2016-12-21        0
2016-12-22        0
2016-12-23        0
2016-12-24        0
2016-12-25        1
2016-12-26        0
2016-12-27        0

Solution with date arithmetic:

df['next_holiday'] = ((pd.Series(df.index.where(df.holiday == 1)).fillna(method='bfill') - df.index) / np.timedelta64(1, 'D'))
df['next_holiday'] = df['next_holiday'].astype('Int32')  # pandas >= 0.24 for the nullable integer cast

Result:

df
            holiday  next_holiday
SalesDate                        
2016-12-20        0             5
2016-12-21        0             4
2016-12-22        0             3
2016-12-23        0             2
2016-12-24        0             1
2016-12-25        1             0
2016-12-26        0           NaN
2016-12-27        0           NaN

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM