
How to extract hourly data from a df in python?

I have the following df

     dates         Final
2020-01-01 00:15:00 94.7
2020-01-01 00:30:00 94.1
2020-01-01 00:45:00 94.1
2020-01-01 01:00:00 95.0
2020-01-01 01:15:00 96.6
2020-01-01 01:30:00 98.4
2020-01-01 01:45:00 99.8
2020-01-01 02:00:00 99.8
2020-01-01 02:15:00 98.0
2020-01-01 02:30:00 95.1
2020-01-01 02:45:00 91.9
2020-01-01 03:00:00 89.5

The full dataset continues at 15-minute intervals up to 2021-01-01 00:00:00 (value 95.6).

Since the frequency is 15 minutes, I would like to change it to 1 hour, possibly by dropping the values in between.

Expected output

      dates        Final
2020-01-01 01:00:00 95.0
2020-01-01 02:00:00 99.8
2020-01-01 03:00:00 89.5

With the last row being 2021-01-01 00:00:00 95.6

How can this be done?

Thanks

Use Series.dt.minute to perform boolean indexing:

df_filtered = df.loc[df['dates'].dt.minute.eq(0)]
# If 'dates' is stored as strings rather than datetime64, convert it first:
#df_filtered = df.loc[pd.to_datetime(df['dates']).dt.minute.eq(0)]
print(df_filtered)
                 dates  Final
3  2020-01-01 01:00:00   95.0
7  2020-01-01 02:00:00   99.8
11 2020-01-01 03:00:00   89.5
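
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the twelve sample rows from the question, deliberately stores 'dates' as strings (as they would typically arrive from a CSV, which is an assumption about the asker's data), and then applies the same minute filter. The name on_the_hour is only illustrative.

```python
import pandas as pd

# Sample rows copied from the question; timestamps are kept as strings on purpose.
stamps = pd.date_range("2020-01-01 00:15:00", periods=12, freq="15min")
df = pd.DataFrame({
    "dates": stamps.strftime("%Y-%m-%d %H:%M:%S"),
    "Final": [94.7, 94.1, 94.1, 95.0, 96.6, 98.4,
              99.8, 99.8, 98.0, 95.1, 91.9, 89.5],
})

# .dt only works on datetime64 columns, so convert the strings first.
df["dates"] = pd.to_datetime(df["dates"])

# Keep only the rows whose timestamp falls exactly on the hour.
on_the_hour = df.loc[df["dates"].dt.minute.eq(0)]
print(on_the_hour)
```

Note that this keeps the original integer index (3, 7, 11); chain .reset_index(drop=True) if a fresh index is preferred.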

If you're doing data analysis or data science, I don't think dropping the in-between values is a good approach. I would aggregate them per hour instead, for example by summing them (I don't know your use case, but this is the usual practice with time series data).
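
As a rough sketch of that aggregation idea, DataFrame.resample can group the 15-minute readings into hourly bins. The choice of closed="right" and label="right" is an assumption on my part: it makes each bin end on the hour and carry that hour's timestamp, so the labels line up with the expected output in the question. Whether the sum, the mean, or the last reading is the right statistic depends on what 'Final' measures, so all three are shown.

```python
import pandas as pd

# Sample rows copied from the question (15-minute readings).
df = pd.DataFrame({
    "dates": pd.date_range("2020-01-01 00:15:00", periods=12, freq="15min"),
    "Final": [94.7, 94.1, 94.1, 95.0, 96.6, 98.4,
              99.8, 99.8, 98.0, 95.1, 91.9, 89.5],
})

# Each bin covers (hour - 1h, hour] and is labelled with its right edge,
# e.g. 00:15, 00:30, 00:45 and 01:00 all land in the bin labelled 01:00.
hourly = (
    df.set_index("dates")["Final"]
      .resample("60min", closed="right", label="right")
      .agg(["sum", "mean", "last"])
)
print(hourly)
```

The "last" column matches the result of simply filtering on minute == 0, while "sum" and "mean" use all four readings in each hour.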
