Calculate monthly mean from daily data for each year

I have seen many answers on how to calculate the monthly mean from daily data across multiple years.

But what I want to do is to calculate the monthly mean from daily data for each year in my xarray separately. So, I want to end up with a mean for Jan 2020, Feb 2020 ... Dec 2024 for each lon/lat gridpoint.

My xarray object has the dimensions Frozen({'time': 1827, 'lon': 180, 'lat': 90}). I tried using var_resampled = var_diff.resample(time='1M').mean(), but this calculates the mean across all years (i.e. a single January mean for 2020-2024).

I also tried

    def mon_mean(x):
        return x.groupby('time.month').mean('time')

    # group by year, then apply the function:
    var_diff_mon = var_diff.groupby('time.year').apply(mon_mean)

This seems to do what I want, but I end up with different dimensions (i.e. "month" and "year" instead of the original "time" dimension).

Is there a different way to calculate the monthly mean from daily data for each year separately? Or is there a way for the groupby code above to retain a single time dimension, now holding one entry per year and month?

PS: I also tried "cdo monmean", but as far as I understand this also just gives me the monthly mean across all years.

Thanks!

Solution: I found a way using

    def mon_mean(x):
        return x.groupby('time.month').mean('time')

    # group by year, then apply the function:
    var_diff_mon = var_diff.groupby('time.year').apply(mon_mean)

and then using

    var_diff_mon.stack(time=("year", "month"))

to get my original time dimension back.
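Note that stacking "year" and "month" produces a (year, month) MultiIndex named time rather than a datetime coordinate. The following is a minimal sketch of one way to turn that into real timestamps, not necessarily how the asker finished the job. The toy var_diff, the two-year range, the name flat, and the choice to label each month by its first day are all assumptions made for illustration. It uses .map, the current name for the deprecated groupby .apply alias, and it assumes every year contains all twelve months (otherwise the stacked result would contain NaN entries for the missing year/month pairs).

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy stand-in for var_diff: two full years of daily data (assumed for illustration).
rng = np.random.default_rng(0)
time = pd.date_range("2020-01-01", "2021-12-31", freq="D")
var_diff = xr.DataArray(
    rng.standard_normal((len(time), 2, 2)),
    coords={"time": time, "lat": [0, 1], "lon": [0, 1]},
    dims=("time", "lat", "lon"),
)

def mon_mean(x):
    # Monthly mean within a single year's worth of data.
    return x.groupby("time.month").mean("time")

# Result has dims (year, month, lat, lon).
var_diff_mon = var_diff.groupby("time.year").map(mon_mean)

# Stack (year, month) into one "time" dimension backed by a MultiIndex.
stacked = var_diff_mon.stack(time=("year", "month")).transpose("time", "lat", "lon")

# Build one timestamp per (year, month) pair, labelled by the first day of the month.
dates = pd.to_datetime(
    pd.DataFrame(
        {"year": stacked["year"].values, "month": stacked["month"].values, "day": 1}
    )
)

# Rebuild a DataArray with a plain datetime "time" coordinate.
flat = xr.DataArray(
    stacked.values,
    coords={
        "time": dates.values,
        "lat": stacked["lat"].values,
        "lon": stacked["lon"].values,
    },
    dims=("time", "lat", "lon"),
)
print(flat["time"].values[:3])
```

On this toy data the result should match var_diff.resample(time="MS").mean(), which gets the same values in a single call.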

Is var_diff.resample(time='M') (or time='MS') doing what you expect?

Let's create a toy dataset like yours:

    import numpy as np
    import pandas as pd
    import xarray as xr

    # Three years of hourly timestamps on a 2x2 lat/lon grid
    dims = ('time', 'lat', 'lon')
    time = pd.date_range("2021-01-01T00", "2023-12-31T23", freq="H")
    lat = [0, 1]
    lon = [0, 1]
    coords = (time, lat, lon)

    # Random data wrapped in a DataArray, then converted to a Dataset
    ds = xr.DataArray(data=np.random.randn(len(time), len(lat), len(lon)), coords=coords, dims=dims).rename("my_var")
    ds = ds.to_dataset()
    ds

Let's resample it:

    ds.resample(time="MS").mean()

The dataset now has 36 time steps, one for each of the 36 months in the original dataset.
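To make the difference concrete, here is a small sketch contrasting resample with a plain groupby on "time.month". It uses daily rather than hourly data to stay light, and the names da, per_year_monthly, and climatology are made up for this example. As I understand it, resample keeps each calendar month of each year as its own bin, while groupby("time.month") pools the same month across all years.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Three years of daily toy data on a 2x2 grid (assumed for illustration).
rng = np.random.default_rng(0)
time = pd.date_range("2021-01-01", "2023-12-31", freq="D")
da = xr.DataArray(
    rng.standard_normal((len(time), 2, 2)),
    coords={"time": time, "lat": [0, 1], "lon": [0, 1]},
    dims=("time", "lat", "lon"),
    name="my_var",
)

# One mean per calendar month of each year, labelled by the month start ("MS").
per_year_monthly = da.resample(time="MS").mean()
print(per_year_monthly.sizes["time"])  # 36

# One mean per month of the year, pooled over all years.
climatology = da.groupby("time.month").mean("time")
print(climatology.sizes["month"])  # 12

# The first resampled step equals the mean over January 2021 only.
jan_2021 = da.sel(time="2021-01").mean("time")
np.testing.assert_allclose(per_year_monthly.isel(time=0).values, jan_2021.values)
```

If the resample output seems to average across years, it may be worth checking that the time coordinate really is a datetime index covering the full period. Also note that recent pandas versions prefer the "ME" alias over "M" for month-end labels, while "MS" (month start) is accepted by both old and new versions.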
