
Create one 4D (model, time, lon, lat) netcdf file from multiple netcdf files using Python

I am downloading climate data in netcdf format. For each variable (e.g. 'precipitation'), I need to merge 9 netcdfs, each belonging to a unique climate model. Each netcdf has the same dimensions (time, lat, lon). How can I merge 9 3D netcdfs into one 4D netcdf? Ultimately, I want to calculate cumulative precipitation per month. Here's my current code:

import os

import xarray as xr

# processing_fn is assumed to be a pathlib.Path defined earlier in the session
variables = ['pr']
scenarios = ['historical', 'ssp245']        #options ['historical', 'ssp126', 'ssp245', 'ssp370', 'ssp585']
models = ['UKESM1-0-LL', 'MRI-ESM2-0', 'MIROC6', 'MIROC-ES2L', 'IPSL-CM6A-LR',
         'GFDL-ESM4', 'FGOALS-g3', 'CNRM-ESM2-1', 'CanESM5']

netcdfs = []

# Create one netcdf per model by merging annual netcdfs
for variable in variables:
    # the output folder depends on the variable, so create it inside the loop
    save_folder = processing_fn / 'local_climate_assessment' / variable / 'output'
    os.makedirs(save_folder, exist_ok=True)

    for scenario in scenarios:
        for model in models:

            source = processing_fn / 'local_climate_assessment' / variable / scenario / model
            netcdf_fn = save_folder / f'{variable}_{scenario}_{model}.nc'
            
            if not os.path.exists(netcdf_fn):
        
                gdf_model = xr.open_mfdataset(str(source / '*.nc'), combine = 'nested', concat_dim="time", use_cftime=True)
                # rename_dict = {variable, f'{variable}_{scenario}_{model}'}
                # gdf_model.rename(rename_dict, inplace = True)
                gdf_model.to_netcdf(netcdf_fn)
                print(gdf_model.attrs['cmip6_source_id'])
                netcdfs.append(gdf_model)
                
            else:
                gdf_model = xr.open_mfdataset(netcdf_fn)
                netcdfs.append(gdf_model)

# Create one netcdf per variable by merging models
ds = xr.combine_nested(netcdfs, concat_dim = "time")
print(ds)
Out[33]: 
<xarray.Dataset>
Dimensions:  (time: 246095, lat: 47, lon: 50)
Coordinates:
  * time     (time) object 1981-01-01 12:00:00 ... 2060-12-31 12:00:00
  * lat      (lat) float64 31.62 31.88 32.12 32.38 ... 42.38 42.62 42.88 43.12
  * lon      (lon) float64 234.6 234.9 235.1 235.4 ... 246.1 246.4 246.6 246.9
Data variables:
    pr       (time, lat, lon) float32 dask.array<chunksize=(360, 47, 50), meta=np.ndarray>

The above code works, but it creates one big 3D netcdf instead of a 4D one that still contains the climate model names. The code below results in the following error:

a = ds.resample(time = 'M').sum()
ValueError: index must be monotonic for resampling

How can I create a 4D netcdf with the model names included, and resample it to get monthly sums?

I'd definitely recommend reading the xarray docs on combining data.

The concat_dim argument to combine_nested can be a list of dimensions along which you want to concatenate your data. You seem to be concatenating over variable, scenario, and model, not time. So passing "time" here and providing a 1-D list of datasets creates a repeating time series with no information about your concatenation dimensions. That repeated time index is not monotonic, which is why resample raises the ValueError.
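To illustrate the point above, here is a minimal sketch with synthetic data. The make_ds helper, the grid values, and the random "pr" values are made up for the demonstration and are not the asker's files. It shows that concatenating datasets with identical time axes along "time" simply repeats the time axis:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two synthetic "models" covering the same 60 days (made-up data).
time = pd.date_range("2000-01-01", periods=60, freq="D")


def make_ds(seed):
    """Hypothetical stand-in for one model's dataset."""
    rng = np.random.default_rng(seed)
    data = rng.random((time.size, 2, 3)).astype("float32")
    return xr.Dataset(
        {"pr": (("time", "lat", "lon"), data)},
        coords={
            "time": time,
            "lat": [31.625, 31.875],
            "lon": [234.625, 234.875, 235.125],
        },
    )


datasets = [make_ds(0), make_ds(1)]

# Same call pattern as in the question: a flat list, concatenated along time.
stacked = xr.combine_nested(datasets, concat_dim="time")

print(stacked.sizes["time"])                             # length of the stacked time axis
print(stacked.indexes["time"].is_monotonic_increasing)   # whether time still increases
```

The stacked time axis is twice as long as the original and jumps back to the first date halfway through, so any operation that needs a sorted time index, such as resample, cannot work on it.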

Instead, explicitly nest the datasets:

import pandas as pd
import xarray as xr

netcdfs = []
for variable in variables:
    netcdfs.append([])
    for scenario in scenarios:
        netcdfs[-1].append([])
        for model in models:
            ... # prep & read in your data
            netcdfs[-1][-1].append(gdf_model)

# use nested lists of datasets and an ordered list
# of coordinates matching the list of datasets
ds = xr.combine_nested(
    netcdfs,
    concat_dim=[
        pd.Index(variables, name="variable"),
        pd.Index(scenarios, name="scenario"),
        pd.Index(models, name="model"),
    ],
)
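As a self-contained sketch of the nested approach, the example below uses two levels (scenario and model) and random data. The fake_model_output helper is hypothetical and stands in for reading the real files with xr.open_mfdataset; the model names are a subset of those in the question.

```python
import numpy as np
import pandas as pd
import xarray as xr

scenarios = ["historical", "ssp245"]
models = ["MIROC6", "CanESM5", "GFDL-ESM4"]
time = pd.date_range("2000-01-01", periods=10, freq="D")


def fake_model_output(seed):
    """Hypothetical stand-in for xr.open_mfdataset(...); returns random 'pr' data."""
    rng = np.random.default_rng(seed)
    return xr.Dataset(
        {"pr": (("time", "lat", "lon"), rng.random((time.size, 2, 3)))},
        coords={
            "time": time,
            "lat": [31.625, 31.875],
            "lon": [234.625, 234.875, 235.125],
        },
    )


# Outer list: one entry per scenario; inner list: one dataset per model.
nested = [
    [fake_model_output(i * len(models) + j) for j in range(len(models))]
    for i in range(len(scenarios))
]

# One concat_dim entry per nesting level, outermost level first.
ds = xr.combine_nested(
    nested,
    concat_dim=[
        pd.Index(scenarios, name="scenario"),
        pd.Index(models, name="model"),
    ],
)

print(ds.sizes)
# Each model can now be selected by name.
subset = ds.sel(scenario="ssp245", model="CanESM5")["pr"]
print(subset.shape)
```

Because the new dimensions carry labelled coordinates, the time axis of each model stays intact and sorted.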

Alternatively, expand the dimensionality of each dataset first, then concatenate using combine_by_coords:

netcdfs = []
for variable in variables:
    for scenario in scenarios:
        for model in models:
            ... # prep & read in your data
            # add coordinates
            gdf_model = gdf_model.expand_dims(
                variable=[variable],
                scenario=[scenario],
                model=[model],
            )

            netcdfs.append(gdf_model)

# auto-combine using your new coordinates
ds = xr.combine_by_coords(netcdfs)
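The following sketch puts the second approach together with the monthly sum asked about in the question. It is only an illustration under assumptions: the data is synthetic (a constant 1 per day, so the expected sums are easy to check), the time axis is a pandas datetime index rather than cftime, and the "MS" (month start) frequency is used for the monthly bins.

```python
import numpy as np
import pandas as pd
import xarray as xr

models = ["MIROC6", "CanESM5"]
time = pd.date_range("2000-01-01", "2000-03-31", freq="D")

datasets = []
for model in models:
    # Synthetic stand-in for one model's file: precipitation of 1 every day.
    ds_model = xr.Dataset(
        {"pr": (("time", "lat", "lon"), np.ones((time.size, 2, 3)))},
        coords={
            "time": time,
            "lat": [31.625, 31.875],
            "lon": [234.625, 234.875, 235.125],
        },
    )
    # Add a length-1 "model" dimension labelled with the model name.
    datasets.append(ds_model.expand_dims(model=[model]))

# Combine along the new "model" coordinate.
ds = xr.combine_by_coords(datasets)

# Time is unique within each model, so resampling now works.
monthly = ds.resample(time="MS").sum()

cell = monthly["pr"].sel(model="MIROC6").isel(lat=0, lon=0)
print(cell.values)
```

With one unit of precipitation per day, each monthly sum equals the number of days in that month, which makes it easy to confirm that the resampling is done per model and not across models.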
