Create a 4D (model, time, lon, lat) netCDF file from multiple netCDF files using Python

Question

I am downloading climate data in netCDF format. For each variable (e.g. 'precipitation'), I need to merge 9 netCDF files, each belonging to a unique climate model. Each file has the same dimensions (time, lat, lon). How can I merge 9 3D netCDFs into one 4D netCDF? Ultimately, I want to calculate cumulative precipitation per month. Here's my current code:

variables = ['pr']         
scenarios = ['historical', 'ssp245']        #options ['historical', 'ssp126', 'ssp245', 'ssp370', 'ssp585']
models = ['UKESM1-0-LL', 'MRI-ESM2-0', 'MIROC6', 'MIROC-ES2L', 'IPSL-CM6A-LR',
         'GFDL-ESM4', 'FGOALS-g3', 'CNRM-ESM2-1', 'CanESM5']


save_folder = processing_fn / 'local_climate_assessment' / f'{variable}' / 'output'
if not os.path.exists(save_folder):
    os.makedirs(save_folder)

netcdfs = []

# Create one netcdf per model by merging annual netcdfs
for variable in variables:
    for scenario in scenarios:
        for model in models:
    

            source = processing_fn / 'local_climate_assessment' / f'{variable}' / f'{scenario}' / f'{model}'
            netcdf_fn = save_folder / f'{variable}_{scenario}_{model}.nc'
            
            if not os.path.exists(netcdf_fn):
        
                gdf_model = xr.open_mfdataset(str(source / '*.nc'), combine = 'nested', concat_dim="time", use_cftime=True)
                # rename_dict = {variable, f'{variable}_{scenario}_{model}'}
                # gdf_model.rename(rename_dict, inplace = True)
                gdf_model.to_netcdf(netcdf_fn)
                print(gdf_model.attrs['cmip6_source_id'])
                netcdfs.append(gdf_model)
                
            else:
                gdf_model = xr.open_mfdataset(netcdf_fn)
                netcdfs.append(gdf_model)

# Create one netcdf per variable by merging models
ds = xr.combine_nested(netcdfs, concat_dim = "time")
print(ds)
Out[33]: 
<xarray.Dataset>
Dimensions:  (time: 246095, lat: 47, lon: 50)
Coordinates:
  * time     (time) object 1981-01-01 12:00:00 ... 2060-12-31 12:00:00
  * lat      (lat) float64 31.62 31.88 32.12 32.38 ... 42.38 42.62 42.88 43.12
  * lon      (lon) float64 234.6 234.9 235.1 235.4 ... 246.1 246.4 246.6 246.9
Data variables:
    pr       (time, lat, lon) float32 dask.array<chunksize=(360, 47, 50), meta=np.ndarray>

The above code works, but I'm creating one big 3D netCDF instead of a 4D one that still contains the climate model names. The code below results in the following error:

a = ds.resample(time = 'M').sum()
ValueError: index must be monotonic for resampling

How can I create a 4D netCDF with the model names included, and resample it to get monthly sums?

Answer 1

I'd definitely recommend reading the xarray docs on combining data.

The concat_dim argument to combine_nested can be a list of dimensions along which you want to concatenate your data. You seem to be concatenating over variable, scenario, and model, not time. So passing "time" here together with a flat (1-D) list of datasets creates one long, repeating time series with no information about the dimensions you are actually concatenating over. That repeated time axis is not monotonic, which is why resample raises the ValueError.
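To make the problem concrete, here is a minimal sketch on synthetic data. The fake_model helper, the grid values, and the seeds are made up for illustration and are not from the question; only the combine_nested call mirrors the original code.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-ins for two climate models that share one daily time axis.
time = pd.date_range("2000-01-01", periods=60, freq="D")


def fake_model(seed):
    """Hypothetical helper: build a small (time, lat, lon) precipitation dataset."""
    rng = np.random.default_rng(seed)
    return xr.Dataset(
        {"pr": (("time", "lat", "lon"), rng.random((time.size, 2, 3)))},
        coords={
            "time": time,
            "lat": [31.625, 31.875],
            "lon": [234.625, 234.875, 235.125],
        },
    )


datasets = [fake_model(0), fake_model(1)]

# Same call pattern as in the question: a flat list concatenated along "time".
stacked = xr.combine_nested(datasets, concat_dim="time")

# The two series are glued end to end, so the time axis repeats itself.
print(stacked.sizes["time"])                            # 120
print(stacked.indexes["time"].is_monotonic_increasing)  # False
```

The result is still 3D, twice as long in time, and nothing in it records which rows came from which model.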

Instead, explicitly nest the datasets:

netcdfs = []
for variable in variables:
    netcdfs.append([])
    for scenario in scenarios:
        netcdfs[-1].append([])
        for model in models:
            ... # prep & read in your data
            netcdfs[-1][-1].append(gdf_model)

# use nested lists of datasets and an ordered list
# of coordinates matching the list of datasets
ds = xr.combine_nested(
    netcdfs,
    concat_dim=[
        pd.Index(variables, name="variable"),
        pd.Index(scenarios, name="scenario"),
        pd.Index(models, name="model"),
    ],
)
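The nested-list approach can be tried end to end with a small sketch on synthetic data. To keep it short it assumes a single variable and nests only two levels (scenario, then model); fake_dataset, the shortened model list, and the grid values are illustrative assumptions, not part of the original code.

```python
import numpy as np
import pandas as pd
import xarray as xr

scenarios = ["historical", "ssp245"]
models = ["MIROC6", "CanESM5", "GFDL-ESM4"]  # shortened list for the example
time = pd.date_range("2000-01-01", periods=10, freq="D")


def fake_dataset(seed):
    """Hypothetical helper: one (time, lat, lon) dataset per scenario/model pair."""
    rng = np.random.default_rng(seed)
    return xr.Dataset(
        {"pr": (("time", "lat", "lon"), rng.random((time.size, 2, 3)))},
        coords={
            "time": time,
            "lat": [31.625, 31.875],
            "lon": [234.625, 234.875, 235.125],
        },
    )


# Outer list follows `scenarios`, inner list follows `models`.
nested = [
    [fake_dataset(10 * i + j) for j in range(len(models))]
    for i in range(len(scenarios))
]

# One concat_dim entry per nesting level, outermost level first.
ds = xr.combine_nested(
    nested,
    concat_dim=[
        pd.Index(scenarios, name="scenario"),
        pd.Index(models, name="model"),
    ],
)

print(dict(ds.sizes))
picked = ds["pr"].sel(scenario="ssp245", model="CanESM5")
```

Each pd.Index supplies both the name of the new dimension and its labels, so a selection such as `picked` can be made by scenario and model name rather than by position.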

Alternatively, expand the dimensionality of each dataset first, then combine them using combine_by_coords:

netcdfs = []
for variable in variables:
    for scenario in scenarios:
        for model in models:
            ... # prep & read in your data
            # add coordinates
            gdf_model = gdf_model.expand_dims(
                variable=[variable],
                scenario=[scenario],
                model=[model],
            )

            netcdfs.append(gdf_model)

# auto-combine using your new coordinates
ds = xr.combine_by_coords(netcdfs)
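For the monthly totals asked about in the question, a sketch along the following lines may help. It uses synthetic data and only a model dimension; the shortened model list, grid values, and seeds are assumptions for illustration. It also uses the "MS" (month start) frequency instead of the question's 'M', because "MS" is accepted across pandas versions; the sums are the same, only the time labels sit at the start of each month.

```python
import numpy as np
import pandas as pd
import xarray as xr

models = ["MIROC6", "CanESM5", "GFDL-ESM4"]  # shortened list for the example
time = pd.date_range("2000-01-01", "2000-03-31", freq="D")  # three full months

per_model = []
for seed, model in enumerate(models):
    rng = np.random.default_rng(seed)
    ds_model = xr.Dataset(
        {"pr": (("time", "lat", "lon"), rng.random((time.size, 2, 3)))},
        coords={
            "time": time,
            "lat": [31.625, 31.875],
            "lon": [234.625, 234.875, 235.125],
        },
    )
    # Give each dataset a length-1 "model" dimension carrying its label.
    per_model.append(ds_model.expand_dims(model=[model]))

# combine_by_coords orders the pieces by their coordinate values.
ds = xr.combine_by_coords(per_model)

# The time axis is now shared by all models and monotonic, so resampling works.
monthly = ds.resample(time="MS").sum()

print(dict(ds.sizes))
print(dict(monthly.sizes))
```

Because time is no longer the concatenation dimension, each model keeps its own series and the monthly sum is computed per model, lat, and lon.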
