How can I subset climate data before downloading it with Python?

Question

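My typical workflow is to download large datasets (netCDF) and then subset them to a single latitude/longitude (grid point). However, I often need only a single grid point for a particular variable, such as air temperature or precipitation, and I would like to subset large datasets (e.g. CMIP6) efficiently before downloading, so that the download is small. So far I have tried esgf-pyclient, but extracting one variable at a single grid point (for 1850 - 2100, ~91,675 days/rows of data) can take more than an hour. That slowness defeats the purpose of subsetting before downloading. Bandwidth is not the problem, since my download speed (Ethernet) is > 1 Gbps. Any suggestions or alternative workflows would be greatly appreciated!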

The code I use with esgf-pyclient:

from pyesgf.search import SearchConnection
import xarray as xr

# Query the ESGF search API; distrib=True searches across federated nodes
conn = SearchConnection('https://esgf-data.dkrz.de/esg-search', distrib=True)

ctx = conn.new_context(
    product='input',
    project='ISIMIP3b',
    # model='GFDL-ESM4',
    experiment='historical',
    variable='tasAdjust',  # also of interest: tasminAdjust, tasmaxAdjust, prAdjust
    time_frequency='day',
    data_node='esg.pik-potsdam.de',
)
print(ctx.hit_count)  # number of matching datasets

# Take the first matching dataset and list its files
result = ctx.search()[0]
print(result.dataset_id)
files = result.file_context().search()

# Open the first file over OPeNDAP and keep only the nearest grid point
ds = xr.open_dataset(files[0].opendap_url).sel(lat=32.298583, lon=-97.78538710, method="nearest")

The desired output is a single column (data vector) of 91,675 rows for the requested grid point (latitude/longitude).
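For reference, the `method="nearest"` lookup used above amounts to finding the positional index whose coordinate is closest to the requested value. The sketch below illustrates that mapping in plain Python. The grid it builds (0.5-degree cell centres, latitude running north to south, longitude west to east) is a hypothetical assumption for illustration and may not match the layout of the actual files, and `nearest_index` is an illustrative helper, not part of xarray.

```python
def nearest_index(coords, value):
    """Return the position of the entry in coords that is closest to value."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - value))


# Hypothetical regular 0.5-degree grid of cell centres (an assumption,
# not read from the remote files).
lats = [89.75 - 0.5 * i for i in range(360)]    # north to south
lons = [-179.75 + 0.5 * j for j in range(720)]  # west to east

lat_idx = nearest_index(lats, 32.298583)
lon_idx = nearest_index(lons, -97.78538710)

print(lat_idx, lats[lat_idx])  # 115 32.25
print(lon_idx, lons[lon_idx])  # 164 -97.75
```

On such a grid, positional selection with `isel(lat=115, lon=164)` would pick the same cell as `sel(lat=32.298583, lon=-97.78538710, method="nearest")`. Passing coordinate values in degrees to `isel` would select a different cell, because `isel` interprets its arguments as integer positions.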

Answer 1

This seems to work faster:

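import xarray as xr

folder = 'https://esg.pik-potsdam.de/thredds/dodsC/isimip_dataroot/isimip3b/input/clim_atm_sim/W5E5-ISIMIP3BASD2-5-0/MRI-ESM2-0/ssp370/tasAdjust/daily/v20210512/mri-esm2-0_r1i1p1f1_w5e5_ssp370_tasAdjust_global_daily_'

# Open each decade lazily over OPeNDAP and keep a single grid point.
# sel(..., method="nearest") matches by coordinate value in degrees
# (isel would read 31 and -99 as integer positions instead).
remote_data1 = xr.open_dataset(folder + '2081_2090.nc', decode_times=False).sel(lat=31, lon=-99, method="nearest")
remote_data2 = xr.open_dataset(folder + '2091_2100.nc', decode_times=False).sel(lat=31, lon=-99, method="nearest")

# Concatenate the two decades along time, in chronological order
ds_all = xr.concat([remote_data1, remote_data2],
                   dim='time', join='override', data_vars='minimal',
                   coords='minimal', compat='override')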
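The same pattern can be extended to more than two files. The following is a minimal sketch, not a tested recipe against the server: `period_urls` and `open_point` are hypothetical helper names, the `<start>_<end>.nc` suffix pattern is inferred from the two file names above, and the example prefix is a placeholder rather than a real endpoint. The periods available for a given dataset would need to be checked against the actual catalogue.

```python
def period_urls(prefix, periods):
    """Build one URL per (start_year, end_year) period, in chronological order.

    Assumes file names end in '<start>_<end>.nc', as in the two files above.
    """
    return [f"{prefix}{start}_{end}.nc" for start, end in sorted(periods)]


def open_point(urls, lat, lon):
    """Open each URL lazily, keep the nearest grid point, and join along time."""
    import xarray as xr  # imported here so period_urls works without xarray

    points = [
        xr.open_dataset(url, decode_times=False).sel(lat=lat, lon=lon, method="nearest")
        for url in urls
    ]
    return xr.concat(points, dim="time")


# Placeholder prefix for illustration only (not a real server).
example_prefix = "https://example.org/dodsC/model_tasAdjust_global_daily_"
urls = period_urls(example_prefix, [(2091, 2100), (2081, 2090)])
print(urls)
```

Calling `open_point(urls, lat=31, lon=-99)` with real OPeNDAP URLs would then return a single dataset for that grid point, and something like `ds_all['tasAdjust'].to_series()` would give the single-column vector described in the question.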
