
Sample every nth minute in a minute-based datetime column in Python

How can I select the row at every 5th minute in a DataFrame? If the 5th minute is missing, the 4th or 3rd would do.

I do NOT want the mean or any other aggregate.

I have tried:

df.groupby(pd.TimeGrouper('5Min'))['AUDUSD'].mean()

df.resample('5min', how=np.var).head()

Neither produces the desired result.

My Input:

                        DATETIME            AUDUSD
DATETIME        
2019-06-07 00:01:00     2019.06.07 00:01    0.69740
2019-06-07 00:02:00     2019.06.07 00:02    0.69742
2019-06-07 00:03:00     2019.06.07 00:03    0.69742
2019-06-07 00:04:00     2019.06.07 00:04    0.69742
2019-06-07 00:05:00     2019.06.07 00:05    0.69739
2019-06-07 00:06:00     2019.06.07 00:06    0.69740
2019-06-07 00:07:00     2019.06.07 00:07    0.69739
2019-06-07 00:08:00     2019.06.07 00:08    0.69740
2019-06-07 00:11:00     2019.06.07 00:11    0.69741
2019-06-07 00:12:00     2019.06.07 00:12    0.69741
2019-06-07 00:13:00     2019.06.07 00:13    0.69740
2019-06-07 00:14:00     2019.06.07 00:14    0.69740
2019-06-07 00:15:00     2019.06.07 00:15    0.69754
2019-06-07 00:16:00     2019.06.07 00:16    0.69749
2019-06-07 00:17:00     2019.06.07 00:17    0.69752
2019-06-07 00:18:00     2019.06.07 00:18    0.69753
2019-06-07 00:19:00     2019.06.07 00:19    0.69758
2019-06-07 00:20:00     2019.06.07 00:20    0.69763
2019-06-07 00:21:00     2019.06.07 00:21    0.69764
2019-06-07 00:23:00     2019.06.07 00:23    0.69765
2019-06-07 00:28:00     2019.06.07 00:28    0.69763

Desired Output:

                        DATETIME            AUDUSD
DATETIME        
2019-06-07 00:05:00     2019.06.07 00:05    0.69739
2019-06-07 00:10:00     2019.06.07 00:08    0.69740
2019-06-07 00:15:00     2019.06.07 00:15    0.69754
2019-06-07 00:20:00     2019.06.07 00:20    0.69763
2019-06-07 00:25:00     2019.06.07 00:23    0.69765
2019-06-07 00:30:00     2019.06.07 00:28    0.69763

This works for me, except that I used first because I don't know which method you're using:

df.set_index(pd.DatetimeIndex(df['DATETIME']))  

df.set_index(pd.DatetimeIndex(df['DATETIME'])).resample("5T").agg('first')                                                                                                          

Out[2649]: 
                             DATETIME   AUDUSD
DATETIME                                      
2019-06-07 00:00:00  2019.06.07 00:01  0.69740
2019-06-07 00:05:00  2019.06.07 00:05  0.69739
2019-06-07 00:10:00  2019.06.07 00:11  0.69741
2019-06-07 00:15:00  2019.06.07 00:15  0.69754
2019-06-07 00:20:00  2019.06.07 00:20  0.69763
2019-06-07 00:25:00  2019.06.07 00:28  0.69763
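
Note that with first, the 00:10 bin picks the 00:11 row rather than the 00:08 row shown in the desired output. One possible way to get the "latest row at or before each 5-minute mark" behaviour is to close and label each bin on its right edge and take last instead. The sketch below rebuilds the sample data from the question; the variable names (minutes, prices, stamps, sampled) are only illustrative, and it assumes the index is a sorted DatetimeIndex.

```python
import pandas as pd

# Rebuild the sample data from the question (minutes past 2019-06-07 00:00).
minutes = [1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 15,
           16, 17, 18, 19, 20, 21, 23, 28]
prices = [0.69740, 0.69742, 0.69742, 0.69742, 0.69739, 0.69740, 0.69739,
          0.69740, 0.69741, 0.69741, 0.69740, 0.69740, 0.69754, 0.69749,
          0.69752, 0.69753, 0.69758, 0.69763, 0.69764, 0.69765, 0.69763]
stamps = pd.Timestamp("2019-06-07") + pd.to_timedelta(minutes, unit="min")
df = pd.DataFrame(
    {"DATETIME": stamps.strftime("%Y.%m.%d %H:%M"), "AUDUSD": prices},
    index=stamps,
)

# Bins are (00:00, 00:05], (00:05, 00:10], ... and are labelled by their
# right edge, so last() returns the latest row at or before each mark.
sampled = df.resample("5min", closed="right", label="right").last()
print(sampled)
```

A bin that contains no rows at all (a gap longer than five minutes) comes back as NaN, so this only looks back within the current 5-minute window.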

First, find out how far your final minute is from the nearest 30-minute mark; then reindex the DataFrame onto a 1-minute grid, extended by that many minutes, forward-filling the gaps:

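import pandas as pd

def custom_round(x, base=30):
    # Round x to the nearest multiple of base.
    return int(base * round(float(x) / base))


# Distance from the final minute to the nearest 30-minute mark
# (negative if the final minute rounds down).
last_minute = int(df.index.minute[-1])  # assuming your index is a DatetimeIndex
# OR, if DATETIME is a datetime64 column:
last_minute = int(df['DATETIME'].dt.minute.iloc[-1])
mins_to_add = custom_round(last_minute) - last_minute


# Reindex onto a 1-minute grid that runs up to that mark, forward-filling gaps.
df2 = df.set_index('DATETIME', drop=False).reindex(
      pd.date_range(
          df['DATETIME'].min(),
          df['DATETIME'].max() + pd.Timedelta(minutes=mins_to_add),
          freq='1min',
      ),
      method='ffill'
)

print(df2.resample('5min').agg('first'))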



                             DATETIME    AUDUSD
DATETIME                                        
2019-06-07 00:00:00 2019-06-07 00:01:00  0.69740
2019-06-07 00:05:00 2019-06-07 00:05:00  0.69739
2019-06-07 00:10:00 2019-06-07 00:08:00  0.69740
2019-06-07 00:15:00 2019-06-07 00:15:00  0.69754
2019-06-07 00:20:00 2019-06-07 00:20:00  0.69763
2019-06-07 00:25:00 2019-06-07 00:23:00  0.69765
2019-06-07 00:30:00 2019-06-07 00:28:00  0.69763
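
The same forward-fill idea can probably be done without the intermediate 1-minute grid, by reindexing straight onto a 5-minute grid. The following is a minimal sketch under a few assumptions: the index is a sorted, unique DatetimeIndex, and the 2-minute tolerance is my reading of "the 4th or 3rd would do" in the question. The names grid and sampled are illustrative only.

```python
import pandas as pd

# Rebuild the sample data from the question (minutes past 2019-06-07 00:00).
minutes = [1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 15,
           16, 17, 18, 19, 20, 21, 23, 28]
prices = [0.69740, 0.69742, 0.69742, 0.69742, 0.69739, 0.69740, 0.69739,
          0.69740, 0.69741, 0.69741, 0.69740, 0.69740, 0.69754, 0.69749,
          0.69752, 0.69753, 0.69758, 0.69763, 0.69764, 0.69765, 0.69763]
index = pd.Timestamp("2019-06-07") + pd.to_timedelta(minutes, unit="min")
df = pd.DataFrame({"DATETIME": index, "AUDUSD": prices}, index=index)

# 5-minute marks from the first mark at or after the data start
# to the first mark at or after the data end.
grid = pd.date_range(index.min().ceil("5min"), index.max().ceil("5min"), freq="5min")

# For each mark, take the most recent row at or before it,
# but only if that row is at most 2 minutes old.
sampled = df.reindex(grid, method="ffill", tolerance=pd.Timedelta("2min"))
print(sampled)
```

Marks with no row inside the tolerance window come back as NaN/NaT rows, which can be dropped with sampled.dropna(how="all") if they are not wanted. Widening the tolerance (for example to "4min") lets older rows stand in for a missing mark.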


 