
Pandas datetime index selection

I have the following dataframe:

import pandas as pd
date  = ['2015-02-03 23:00:00','2015-02-03 23:30:00','2015-02-04 00:00:00','2015-02-04 00:30:00','2015-02-04 01:00:00','2015-02-04 01:30:00','2015-02-04 02:00:00','2015-02-04 02:30:00','2015-02-04 03:00:00','2015-02-04 03:30:00','2015-02-04 04:00:00','2015-02-04 04:30:00','2015-02-04 05:00:00','2015-02-04 05:30:00','2015-02-04 06:00:00','2015-02-04 06:30:00','2015-02-04 07:00:00','2015-02-04 07:30:00','2015-02-04 08:00:00','2015-02-04 08:30:00','2015-02-04 09:00:00','2015-02-04 09:30:00','2015-02-04 10:00:00','2015-02-04 10:30:00','2015-02-04 11:00:00','2015-02-04 11:30:00','2015-02-04 12:00:00','2015-02-04 12:30:00','2015-02-04 13:00:00','2015-02-04 13:30:00','2015-02-04 14:00:00','2015-02-04 14:30:00','2015-02-04 15:00:00','2015-02-04 15:30:00','2015-02-04 16:00:00','2015-02-04 16:30:00','2015-02-04 17:00:00','2015-02-04 17:30:00','2015-02-04 18:00:00','2015-02-04 18:30:00','2015-02-04 19:00:00','2015-02-04 19:30:00','2015-02-04 20:00:00','2015-02-04 20:30:00','2015-02-04 21:00:00','2015-02-04 21:30:00','2015-02-04 22:00:00','2015-02-04 22:30:00','2015-02-04 23:00:00','2015-02-04 23:30:00']
value = [33.24  , 31.71  , 34.39  , 34.49  , 34.67  , 34.46  , 34.59  , 34.83  , 35.78  , 33.03  , 35.49  , 33.79  , 36.12  , 37.09  , 39.54  , 41.19  , 45.99  , 50.23  , 46.72  , 47.47  , 48.46  , 48.38  , 48.40  , 48.13  , 38.35  , 38.19  , 38.12  , 38.05  , 38.06  , 37.83  , 37.49  , 37.41 , 41.84  , 42.26 , 44.09  , 48.85  , 50.07 , 50.94  , 51.09  , 50.60  , 47.39  , 45.57  , 45.03  , 44.98  , 41.32  , 40.37  , 41.12  , 39.33  , 35.38  , 33.44  ]
df = pd.DataFrame({'value':value,'index':date})
df.index = pd.to_datetime(df['index'],format='%Y-%m-%d %H:%M:%S')
df.drop(['index'],axis=1,inplace=True)

df['interval'] = ((df.index.hour >= 16) & (df.index.hour <18 ))*1
print(df.head(50))

I managed to create a column 'interval' to indicate whether or not the hour of the index is between 16h and 18h.

My problem is the following:

  • I have a half-hourly dataframe
  • I would like to create an 'interval' column with half-hourly granularity, i.e. the interval column would be equal to 1 if the index is between 16h30 and 18h30, for instance.

How can I do that efficiently?

Expected result:

                     value     interval
2015-02-04 16:00:00  44.09         0
2015-02-04 16:30:00  48.85         1
2015-02-04 17:00:00  50.07         1
2015-02-04 17:30:00  50.94         1
2015-02-04 18:00:00  51.09         1
2015-02-04 18:30:00  50.60         0
2015-02-04 19:00:00  47.39         0
2015-02-04 19:30:00  45.57         0
2015-02-04 20:00:00  45.03         0

Many thanks,

You can also use the pandas function indexer_between_time:

df["interval"] = 0  # reset first so rows outside the window are 0
df.loc[df.index[df.index.indexer_between_time("16:30", "18:30")], "interval"] = 1
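
For reference, here is a self-contained sketch of how indexer_between_time behaves on a small made-up frame (the name demo and the short date range are assumptions for illustration, not the question's data). It returns integer positions rather than labels, so this version writes through iloc, and both endpoints are included by default:

```python
import pandas as pd

# Small half-hourly index, a stand-in for the question's data (hypothetical).
idx = pd.date_range("2015-02-04 15:30", "2015-02-04 19:30", freq="30min")
demo = pd.DataFrame({"value": range(len(idx))}, index=idx)

# indexer_between_time returns integer positions of the matching rows.
positions = demo.index.indexer_between_time("16:30", "18:30")

# Start from 0 everywhere, then flag the matching positions.
demo["interval"] = 0
demo.iloc[positions, demo.columns.get_loc("interval")] = 1

print(demo)
```

With this index, the rows from 16:30 through 18:30 (inclusive) are flagged.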

There may be a cleaner way to do this (edit: for instance, @vealkind's solution), but this does what you want. Note that Series.between includes both endpoints by default, so 18:30 is flagged as well:

df['interval'] = (pd.Series(df.index.time)
              .between(pd.to_datetime('16:30:00').time(),
                       pd.to_datetime('18:30:00').time())
              .astype(int)
              .tolist())


>>> df.iloc[30:42]
                     value  interval
index                               
2015-02-04 14:00:00  37.49         0
2015-02-04 14:30:00  37.41         0
2015-02-04 15:00:00  41.84         0
2015-02-04 15:30:00  42.26         0
2015-02-04 16:00:00  44.09         0
2015-02-04 16:30:00  48.85         1
2015-02-04 17:00:00  50.07         1
2015-02-04 17:30:00  50.94         1
2015-02-04 18:00:00  51.09         1
2015-02-04 18:30:00  50.60         1
2015-02-04 19:00:00  47.39         0
2015-02-04 19:30:00  45.57         0
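
If 18:30 should be excluded, as in the expected result in the question, one option is a half-open comparison on minutes since midnight. This is a minimal sketch on a made-up frame (demo, start and end are illustrative names), not part of either answer above:

```python
import pandas as pd

# Hypothetical half-hourly frame covering the window of interest.
idx = pd.date_range("2015-02-04 16:00", "2015-02-04 19:00", freq="30min")
demo = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Minutes since midnight for every timestamp in the index.
minutes = demo.index.hour * 60 + demo.index.minute

start = 16 * 60 + 30  # 16:30, included
end = 18 * 60 + 30    # 18:30, excluded

# Half-open window [start, end): 1 inside, 0 outside.
demo["interval"] = ((minutes >= start) & (minutes < end)).astype(int)

print(demo)
```

This stays vectorized, like the hour-based comparison in the question, and only the comparison operators decide which endpoints count.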
