Rolling mean over the last n-days with if statement

I have the following dataframe:

entry_time_flat           route_id      time_slot      duration    n_of_trips 

2019-09-02 00:00:00           1_2            0-6          10           29
2019-09-04 00:00:00           3_4            6-12         15           10
2019-09-06 00:00:00           1_2            0-6          20           30    
2019-09-06 00:00:00           1_2           18-20         43           30
...

I would like to compute the mean value of "duration" - creating a new feature - over the last n-days (n_days = 30), with the following condition:

if "n_of_trips" >= 30:
    mean of "duration", over the last 30 days and all the past transactions, grouping by  "route_id" & "time_slot" 
else:
    mean of "duration", over the last 30 days and all the past transactions, grouping by "route_id" only

Unfortunately, splitting the dataframe into two chunks (n_of_trips >= 30 and n_of_trips < 30) would not yield an acceptable result, since all transactions must be included when computing the mean.

How can I implement an if-statement while computing rolling mean over the last n-days?

I am not completely sure I understood your goal here, but I'll try:

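import pandas as pd

# Sample data taken from the question
data = {'entry_time_flat': ['2019-09-02 00:00:00', '2019-09-04 00:00:00', '2019-09-06 00:00:00', '2019-09-06 00:00:00'], 'route_id': ['1_2', '3_4', '1_2', '1_2'], 'time_slot': ['0-6', '6-12', '0-6', '18-20'], 'duration': [10, 15, 20, 43], 'n_of_trips': [29, 10, 30, 30]}
df = pd.DataFrame(data=data)

# A time-based rolling window needs a sorted datetime index
df['entry_time_flat'] = pd.to_datetime(df['entry_time_flat'])
df = df.set_index('entry_time_flat')

# 30-day rolling mean of duration over all rows (no grouping)
df['duration_rolling'] = df['duration'].rolling('30d', min_periods=1).mean()
print(df)

# Group means per condition; numeric_only=True skips the string columns,
# which recent pandas versions would otherwise refuse to average
print(df[df.n_of_trips >= 30].groupby(['route_id']).mean(numeric_only=True))
print(df[df.n_of_trips >= 30].groupby(['time_slot']).mean(numeric_only=True))
print(df[df.n_of_trips < 30].groupby(['route_id']).mean(numeric_only=True))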

Output:
                route_id time_slot  duration  n_of_trips  duration_rolling
entry_time_flat                                                           
2019-09-02           1_2       0-6        10          29              10.0
2019-09-04           3_4      6-12        15          10              12.5
2019-09-06           1_2       0-6        20          30              15.0
2019-09-06           1_2     18-20        43          30              22.0
          duration  n_of_trips  duration_rolling
route_id                                        
1_2           31.5        30.0              18.5
           duration  n_of_trips  duration_rolling
time_slot                                        
0-6              20          30              15.0
18-20            43          30              22.0
          duration  n_of_trips  duration_rolling
route_id                                        
1_2             10          29              10.0
3_4             15          10              12.5

In the outputs you can of course ignore the duration column.
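Note that the code above filters rows by n_of_trips before grouping, which the question says is not acceptable. One possible way around that, as a rough sketch only: compute the 30-day rolling mean twice on all rows (once per route_id and time_slot, once per route_id alone) and then pick one of the two values per row depending on n_of_trips. The helper name rolling_mean_by and the column name duration_mean_30d are made up for illustration, and the sketch assumes the window should include the current row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'entry_time_flat': pd.to_datetime(['2019-09-02', '2019-09-04',
                                       '2019-09-06', '2019-09-06']),
    'route_id': ['1_2', '3_4', '1_2', '1_2'],
    'time_slot': ['0-6', '6-12', '0-6', '18-20'],
    'duration': [10, 15, 20, 43],
    'n_of_trips': [29, 10, 30, 30],
})
# Sort by time and keep a unique integer index so results can be written back
df = df.sort_values('entry_time_flat', kind='stable').reset_index(drop=True)


def rolling_mean_by(frame, keys, window='30D'):
    """Rolling mean of 'duration' within each group of `keys` (hypothetical helper)."""
    out = pd.Series(index=frame.index, dtype=float)
    for _, g in frame.groupby(keys):
        # Rows inside a group keep the time order of the sorted frame
        s = g.set_index('entry_time_flat')['duration'].rolling(window).mean()
        out.loc[g.index] = s.to_numpy()
    return out


# Both means use every row, regardless of n_of_trips
by_route_slot = rolling_mean_by(df, ['route_id', 'time_slot'])
by_route = rolling_mean_by(df, ['route_id'])

# The "if statement": choose per row which of the two means to keep
df['duration_mean_30d'] = np.where(df['n_of_trips'] >= 30, by_route_slot, by_route)
print(df)
```

If the current transaction should be excluded from its own mean, passing closed='left' to rolling() is one option, but whether that matches the intended definition is an assumption you would need to check.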

Was this something you were looking for?
