Compute daily averages of numeric and non-numeric columns in pandas

I have a DataFrame with an hourly time index:

                     wind_direction     relative_humidity  
dates                                                 
2017-07-18 19:00:00              W                88  
2017-07-18 20:00:00              N                88  
2017-07-18 21:00:00              W                90  
2017-07-18 22:00:00              S                91  
2017-07-18 23:00:00              W                93  

How can I compute daily aggregates so that numeric columns get the daily mean and non-numeric columns get the value that occurs most often?

-- EDIT:

I did this:

df = df.resample('D').mean()

However, this returns an error.

Option 1

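import numpy as np
import pandas as pd
from cytoolz.dicttoolz import merge

# Split the columns into numeric and non-numeric
ncols = df.select_dtypes([np.number]).columns
ocols = df.columns.difference(ncols)

df.index = pd.to_datetime(df.index)

# Mean for numeric columns, most frequent value for the rest
d = merge(
    {c: 'mean' for c in ncols},
    {c: lambda x: x.value_counts().index[0] for c in ocols}
)

df.resample('D').agg(d)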

            relative_humidity wind_direction
dates                                       
2017-07-18                 90              W
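
For readers without cytoolz installed, the same idea can be sketched with a plain dictionary. The block below rebuilds the sample data from the question so it runs on its own. The helper name most_frequent and its NaN fallback for empty bins are assumptions added for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "wind_direction": ["W", "N", "W", "S", "W"],
        "relative_humidity": [88, 88, 90, 91, 93],
    },
    index=pd.DatetimeIndex(
        [
            "2017-07-18 19:00:00",
            "2017-07-18 20:00:00",
            "2017-07-18 21:00:00",
            "2017-07-18 22:00:00",
            "2017-07-18 23:00:00",
        ],
        name="dates",
    ),
)


def most_frequent(s):
    # value_counts sorts by count in descending order, so the first
    # label is the most frequent value. Hypothetical helper: returns
    # NaN when a bin has no values at all.
    counts = s.value_counts()
    return counts.index[0] if len(counts) else np.nan


numeric_cols = df.select_dtypes(include="number").columns
other_cols = df.columns.difference(numeric_cols)

# One aggregation rule per column: mean for numbers, mode for the rest.
agg_spec = {c: "mean" for c in numeric_cols}
agg_spec.update({c: most_frequent for c in other_cols})

daily = df.resample("D").agg(agg_spec)
print(daily)
```

Ties are broken by whichever value value_counts happens to list first, so the result for a day with two equally common directions should not be relied on.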

Option 2

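df.index = pd.to_datetime(df.index)

g = df.resample('D')
# numeric_only=True skips the text column; recent pandas versions raise instead of dropping it silently
g.mean(numeric_only=True).combine_first(g.agg(lambda x: x.value_counts().index[0]))[df.columns]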

           wind_direction  relative_humidity
dates                                       
2017-07-18              W                 90

If you want to calculate daily statistics for more than one column, a divide-and-conquer approach might be a good choice.

The first step is to derive a date column to group by.

# Assumes 'dates' is a regular column; if it is the index, call df = df.reset_index() first
df['dates'] = pd.to_datetime(df['dates'])
df['Date'] = df['dates'].dt.date

The second step is to compute the most frequent direction for each day.

group1 = df.groupby(by=['Date'], as_index=False)['wind_direction'].agg(lambda s: s.value_counts().index[0])

The third step is to compute the daily mean, which is similar to the second step.
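
The answer does not show code for this step, so here is a minimal sketch of what it might look like. The sample frame is rebuilt from the question with dates as a regular column, and the name group2 is an assumption chosen to match group1 above.

```python
import pandas as pd

# Sample data from the question, with 'dates' as an ordinary column.
df = pd.DataFrame(
    {
        "dates": [
            "2017-07-18 19:00:00",
            "2017-07-18 20:00:00",
            "2017-07-18 21:00:00",
            "2017-07-18 22:00:00",
            "2017-07-18 23:00:00",
        ],
        "wind_direction": ["W", "N", "W", "S", "W"],
        "relative_humidity": [88, 88, 90, 91, 93],
    }
)
df["dates"] = pd.to_datetime(df["dates"])
df["Date"] = df["dates"].dt.date

# Daily mean of the numeric column, keeping 'Date' as a column
# so it can be used as a merge key later.
group2 = df.groupby("Date", as_index=False)["relative_humidity"].mean()
print(group2)
```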

The last step is to merge them on the "Date" column, which gives the result you are looking for.
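
Putting the steps together, the merge might be sketched as follows. This is one possible reading of the answer rather than its exact code: the function name daily_summary is hypothetical, the column names are taken from the question, and reset_index() is used in place of as_index=False to get "Date" back as a column.

```python
import pandas as pd


def daily_summary(df):
    # Hypothetical helper: expects 'dates', 'wind_direction' and
    # 'relative_humidity' columns, as in the question's data.
    df = df.copy()
    df["Date"] = pd.to_datetime(df["dates"]).dt.date

    # Most frequent wind direction per day.
    group1 = (
        df.groupby("Date")["wind_direction"]
        .agg(lambda s: s.value_counts().index[0])
        .reset_index()
    )
    # Mean humidity per day.
    group2 = df.groupby("Date")["relative_humidity"].mean().reset_index()

    # Join the two per-day results on the shared 'Date' key.
    return pd.merge(group1, group2, on="Date")


sample = pd.DataFrame(
    {
        "dates": [
            "2017-07-18 19:00:00",
            "2017-07-18 20:00:00",
            "2017-07-18 21:00:00",
            "2017-07-18 22:00:00",
            "2017-07-18 23:00:00",
        ],
        "wind_direction": ["W", "N", "W", "S", "W"],
        "relative_humidity": [88, 88, 90, 91, 93],
    }
)
result = daily_summary(sample)
print(result)
```

Compared with the resample-based options, this version needs one groupby per column, but each step can be inspected separately.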
