简体   繁体   中英

Getting the mean per hour of a dataframe (for daily data) using python given a number of observations taken within an hour

I have the following sample data:

DATE             TIMEREAD  LEVEL    
09/07/2013       12:30:12   5

09/07/2013       12:40:07   2
09/07/2013       12:50:09   5
10/07/2013       01:05:58   3
10/07/2013       01:25:15   2
11/07/2013       15:05:10   1

I want to get the mean value per hour each day. This is my desired output:

    DATE             TIMEREAD  LEVEL    
    09/07/2013       12:00:00   4
    10/07/2013       01:00:00   2.5
    11/07/2013       15:00:00   1

Does anyone could help me?

I used this:

dates = pd.date_range('2013-07-9', '2018-12-31', freq = 'H')

df['date'] = dates.date
df['hour'] = dates.hour
df['day'] = dates.dayofweek

df= .groupby(df['date', 'hour','day'].apply(lambda x: x.mean()))

but I don't get the desired output and it seems it doesnt match the desired number of hours (24hrs per day) to the actual data in the dataframe (because some hours have no data for "LEVEL").

Round down each timestamp to the nearest hour, then use .groupby() and .mean() to generate the desired output:

df['TIMEREAD'] = pd.to_datetime(df['TIMEREAD']).dt.floor("H")
df.groupby(["DATE", "TIMEREAD"]).mean()

This outputs:

                                LEVEL
DATE       TIMEREAD
09/07/2013 2022-06-26 12:00:00    4.0
10/07/2013 2022-06-26 01:00:00    2.5
11/07/2013 2022-06-26 15:00:00    1.0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM