How to read a large NetCDF dataset without using for loops - Python

Question

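Good morning. I am having trouble reading a large netCDF file in Python that contains meteorological data. I have to go through the data to assemble rows and then insert them into a database, but going through it with for loops to assemble the rows takes far too long. I know there must be a more efficient way to do the same thing. At the moment I access the data with for loops; the code is below: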

 import netCDF4 as nc

 # pathFile and file are defined elsewhere in the asker's code
 content = nc.Dataset(pathFile+file)
 XLONG, XLAT = content.variables["XLONG"], content.variables["XLAT"]
 Times = content.variables["Times"]  # Hours, stored as byte strings (b'...')
 RAINC = content.variables["RAINC"]  # Rain
 Q2 = content.variables["Q2"]  # Specific humidity
 T2 = content.variables["T2"]  # Temperature
 U10 = content.variables["U10"]  # Zonal wind
 V10 = content.variables["V10"]  # Meridional wind
 SWDOWN = content.variables["SWDOWN"]  # Incoming radiation
 PSFC = content.variables["PSFC"]  # Surface pressure
 SST = content.variables["SST"]  # Sea surface temperature
 CLDFRA = content.variables["CLDFRA"]  # Cloud fraction

 # position, idRun, idTime, dataProcess and the helper functions are defined elsewhere
 for c2 in range(len(XLONG[0])):
     for c3 in range(len(XLONG[0][c2])):
         position += 1
         for hour in range(len(Times)):
             dateH = getDatetimeInit(dateFormatFile.hour) if hour == 0 else getDatetimeForHour(hour, dateFormatFile.hour)
             hourUTC = getHourUTC(hour)

             RAINH = str(RAINC[hour][0][c2][c3])
             Q2H = str(Q2[hour][0][c2][c3])
             T2H = str(convertKelvinToCelsius(T2[hour][0][c2][c3]))
             U10H = str(U10[hour][0][c2][c3])
             V10H = str(V10[hour][0][c2][c3])
             SWDOWNH = str(SWDOWN[hour][0][c2][c3])
             PSFCH = str(PSFC[hour][0][c2][c3])
             SSTH = str(SST[hour][0][c2][c3])
             CLDFRAH = str(CLDFRA[hour][0][c2][c3])

             rowData = [idRun, functions.IDMODEL, idTime, position, dateH.year, dateH.month, dateH.day, dateH.hour, RAINH, Q2H, T2H, U10H, V10H, SWDOWNH, PSFCH, SSTH, CLDFRAH]
             dataProcess.append(rowData)

Answer 1

I would use NumPy. Let us assume you have a netCDF file with 2 variables, "t2" and "slp". You can then vectorize the data with the following code:

#!/usr/bin/env ipython
# ---------------------
import numpy as np
from netCDF4 import Dataset
# ---------------------
filein = 'test.nc'
ncin = Dataset(filein)
tair = ncin.variables['t2'][:]
slp = ncin.variables['slp'][:]
ncin.close()
# -------------------------
# flatten each variable into a single column
tairseries = np.reshape(tair, (np.size(tair), 1))
slpseries = np.reshape(slp, (np.size(slp), 1))
# --------------------------
## if you want characters:
#tairseries = np.array([str(val) for val in tairseries])
#slpseries = np.array([str(val) for val in slpseries])
# --------------------------
rowdata = np.concatenate((tairseries, slpseries), axis=1)
# if you want characters, do this in the end:
row_asstrings = [[str(vv) for vv in val] for val in rowdata]
# ---------------------------

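However, I have a feeling that using strings is not a good idea. In my example, converting the numeric arrays to strings took a long time, which is why I did not do it before the concatenation.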
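If strings really are required, one option is to keep the rows numeric until the very end and then let NumPy convert the whole array in one call with `astype(str)`. The following is a minimal sketch on synthetic data; the values and the array length are assumptions for illustration only, and whether it beats the list comprehension on your data is worth timing yourself.

```python
import numpy as np

# Synthetic stand-ins for two flattened variables (hypothetical values).
rng = np.random.default_rng(0)
tairseries = rng.uniform(250.0, 310.0, size=(1000, 1))
slpseries = rng.uniform(980.0, 1040.0, size=(1000, 1))

# Keep everything numeric while assembling the rows.
rowdata = np.concatenate((tairseries, slpseries), axis=1)

# Convert once, at the very end, in a single NumPy call.
row_asstrings = rowdata.astype(str)

# Alternative: plain Python floats in nested lists, with no strings at all.
row_aslists = rowdata.tolist()
```

Many database drivers accept numeric parameters directly, in which case `rowdata.tolist()` may be all that is needed and the string conversion can be skipped.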

If you also want some time/location information, you can do it like this:

#!/usr/bin/env ipython
# ---------------------
import numpy as np
from netCDF4 import Dataset
# ---------------------
filein = 'test.nc'
ncin = Dataset(filein)
xin = ncin.variables['lon'][:]
yin = ncin.variables['lat'][:]
timein = ncin.variables['time'][:]
tair = ncin.variables['t2'][:]
slp = ncin.variables['slp'][:]
ncin.close()
# -------------------------
# flatten each variable into a single column
tairseries = np.reshape(tair, (np.size(tair), 1))
slpseries = np.reshape(slp, (np.size(slp), 1))
# --------------------------
## if you want characters:
#tairseries = np.array([str(val) for val in tairseries])
#slpseries = np.array([str(val) for val in slpseries])
# --------------------------
rowdata = np.concatenate((tairseries, slpseries), axis=1)
# if you want characters, do this in the end:
#row_asstrings = [[str(vv) for vv in val] for val in rowdata]
# ---------------------------
# =========================================================
nx = np.size(xin)
ny = np.size(yin)
ntime = np.size(timein)
# expand lon, lat and time to the full (ntime, ny, nx) shape of the variables
xm, ym = np.meshgrid(xin, yin)
xmt = np.tile(xm, (ntime, 1, 1))
ymt = np.tile(ym, (ntime, 1, 1))
timem = np.tile(timein[:, np.newaxis, np.newaxis], (1, ny, nx))
# to make sure that the array sizes match, I am using the size of one of the variables
xvec = np.reshape(xmt, (np.size(tair), 1))
yvec = np.reshape(ymt, (np.size(tair), 1))
timevec = np.reshape(timem, (np.size(tair), 1))
rowdata = np.concatenate((xvec, yvec, timevec, tairseries, slpseries), axis=1)
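To see that the coordinate columns line up with the flattened values, the same tiling can be tried on a tiny grid. This is only a sketch: the coordinates and the synthetic field are made-up values, and it assumes the variable has dimensions (time, lat, lon) with 1-D `lon` and `lat` arrays.

```python
import numpy as np

# Hypothetical tiny grid so the result can be inspected by eye.
xin = np.array([10.0, 11.0, 12.0])        # lon, nx = 3
yin = np.array([50.0, 51.0])              # lat, ny = 2
timein = np.array([0.0, 1.0, 2.0, 3.0])   # time, ntime = 4
nx, ny, ntime = xin.size, yin.size, timein.size

# Synthetic field whose value is its own position in the flattened array.
tair = np.arange(ntime * ny * nx, dtype=float).reshape(ntime, ny, nx)

xm, ym = np.meshgrid(xin, yin)  # both have shape (ny, nx)
xvec = np.tile(xm, (ntime, 1, 1)).reshape(-1, 1)
yvec = np.tile(ym, (ntime, 1, 1)).reshape(-1, 1)
timevec = np.tile(timein[:, np.newaxis, np.newaxis], (1, ny, nx)).reshape(-1, 1)
rowdata = np.concatenate((xvec, yvec, timevec, tair.reshape(-1, 1)), axis=1)

# Row index of cell (t, j, i) under row-major flattening: time varies
# slowest, then lat, then lon.
t, j, i = 2, 1, 0
k = (t * ny + j) * nx + i
print(rowdata[k])  # lon, lat, time and value for that cell
```

Because the flattening is row-major, consecutive rows walk along longitude first, then latitude, then time.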

In any case, with variables of shape (744,150,150), vectorizing 2 variables takes less than 2 seconds.
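The loops in the question produce rows in a different order: grid position varies slowest and hour fastest. A rough sketch of how to get that order without loops is below. It assumes the variables have shape (ntime, 1, ny, nx), which is inferred from the `T2[hour][0][c2][c3]` indexing, and that `convertKelvinToCelsius` simply subtracts 273.15. The data is synthetic and only one variable is shown.

```python
import numpy as np

# Hypothetical dimensions; a real file would be far larger.
ntime, ny, nx = 4, 3, 5

# Synthetic stand-in for a variable indexed as T2[hour][0][c2][c3].
# With netCDF4 this would be read once, e.g. content.variables["T2"][:].
rng = np.random.default_rng(1)
T2 = rng.uniform(270.0, 300.0, size=(ntime, 1, ny, nx))

# Drop the length-1 axis and convert units for the whole array at once,
# then move time to the last axis so that hour varies fastest.
t2_celsius = (T2[:, 0, :, :] - 273.15).transpose(1, 2, 0)  # (ny, nx, ntime)

# position counts grid cells from 1 in row-major order, like `position += 1`.
position = np.arange(1, ny * nx + 1).reshape(ny, nx, 1)
position = np.broadcast_to(position, (ny, nx, ntime))
hour = np.broadcast_to(np.arange(ntime), (ny, nx, ntime))

# One row per (position, hour) pair, in the same order as the nested loops.
rows = np.column_stack((position.ravel(), hour.ravel(), t2_celsius.ravel()))
```

Each `variable[hour][0][c2][c3]` access on a netCDF4 variable goes back to the file, so reading every variable once with `[:]` and working on the in-memory arrays is likely where most of the time is saved. The remaining variables can be added as further columns in the same way.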

Solution 1
0 Accepted 2021-01-10 16:52:27
