Reshaping and combining data from netCDF in Python

Question

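I am currently using xarray in Python to read netCDF files containing 3-hourly temperature (t2m) data. The data has the shape (time: 2920, latitude: 189, longitude: 521), i.e. (2920, 189, 521), and represents one year of data. I have 30 of these files, each about 2 GB.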

longitude (longitude) float32         -170.0 -169.8 ... -40.25 -40.0
latitude  (latitude)  float32         82.0 81.75 81.5 ... 35.5 35.25 35.0
time      (time)      datetime64[ns]  1979-01-01T01:00:00 ... 1979-12-...

I want to reshape this data into a format that can be passed to scikit-learn's

sklearn.model_selection.train_test_split

That is, for each file/year I want to produce the following DataFrame:

index   time                  lat   lon       t2m
0       1979-01-01T00:00:00   35    -170      270
1       1979-01-01T00:00:00   35    -169.75   269
2       1979-01-01T00:00:00   35    -169.5    271
...
n-1     1979-12-31T21:00:00   82    -40.25    241
n       1979-12-31T21:00:00   82    -40       244

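Note that there will be 521 rows with lat=35 before moving on to the next latitude value. After going through all 189 latitude values, we move on to the next time step and repeat until done.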
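The row order described above is exactly a row-major (C-order) flatten of the (time, lat, lon) array, so the long table can also be built directly with NumPy. A minimal sketch, assuming the coordinate vectors and the 3-D values are already in memory (for example taken from ds["t2m"].values and the coordinate arrays) and that latitude is in ascending order; the function name to_long_frame and the toy numbers are hypothetical.

```python
import numpy as np
import pandas as pd


def to_long_frame(times, lats, lons, values):
    """Flatten a (time, lat, lon) array into one row per time step and grid point.

    Longitude varies fastest, then latitude, then time, which is exactly the
    order of a row-major (C-order) flatten of `values`.
    """
    values = np.asarray(values)
    n_time, n_lat, n_lon = values.shape
    if (len(times), len(lats), len(lons)) != values.shape:
        raise ValueError("coordinate lengths do not match the array shape")
    return pd.DataFrame({
        # each time stamp covers a block of n_lat * n_lon consecutive rows
        "time": np.repeat(np.asarray(times), n_lat * n_lon),
        # each latitude covers n_lon consecutive rows; the pattern restarts every time step
        "lat": np.tile(np.repeat(np.asarray(lats), n_lon), n_time),
        # the full longitude vector is cycled once per (time, lat) pair
        "lon": np.tile(np.asarray(lons), n_time * n_lat),
        # C-order ravel walks the last axis (longitude) fastest
        "t2m": values.ravel(order="C"),
    })


# Toy example (made-up values): 2 time steps, 3 latitudes, 4 longitudes
times = pd.to_datetime(["1979-01-01T00:00", "1979-01-01T03:00"])
lats = np.array([35.0, 35.25, 35.5])
lons = np.array([-170.0, -169.75, -169.5, -169.25])
values = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
df = to_long_frame(times, lats, lons, values)
print(df.head())
```

If latitude is stored descending (82 → 35) as in the coordinate listing, reverse both the vector and the array axis first, e.g. lats[::-1] together with values[:, ::-1, :].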

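I think there is a way to get what I want with some combination of melting and reshaping the xarray Dataset (ds), but I haven't found anything that works yet. Any input would be appreciated.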

Answer 1

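This should be doable with xarray's built-in methods, as shown below; the chain may include more steps than you strictly need. One thing to watch out for when converting an xarray Dataset to a DataFrame: if the coordinates have bounds (variables such as time_bnds with an extra bnds dimension), values can be repeated once per bound, but the code below should handle this.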

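# keep only t2m, which drops bounds variables (e.g. time_bnds) and their "bnds" dimension,
# so no duplicate rows are introduced by bnds
df = (ds[["t2m"]]
      # order latitude ascending (35 ... 82), as in the desired output
      .sortby("latitude")
      # convert to dataframe; the last dimension in dim_order (longitude) varies fastest
      .to_dataframe(dim_order=["time", "latitude", "longitude"])
      # convert time and latitude/longitude from the index to columns
      .reset_index()
      # use the short column names from the question
      .rename(columns={"latitude": "lat", "longitude": "lon"})
      # only select what you want, in case extra coordinates were carried along
      .loc[:, ["time", "lat", "lon", "t2m"]]
      # renumber the rows 0..n-1 without keeping the old index as a column
      .reset_index(drop=True)
      )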
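To check the pipeline without the 2 GB files, here is a minimal sketch that wraps it in a function and runs it on a small synthetic Dataset shaped like the question's, with latitude stored descending. It assumes the variable and dimensions are named t2m, time, latitude and longitude as in the coordinate listing; the names year_to_frame and iter_year_frames and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr


def year_to_frame(ds: xr.Dataset) -> pd.DataFrame:
    """Long-format (time, lat, lon, t2m) table for one yearly Dataset."""
    return (ds[["t2m"]]
            .sortby("latitude")
            .to_dataframe(dim_order=["time", "latitude", "longitude"])
            .reset_index()
            .rename(columns={"latitude": "lat", "longitude": "lon"})
            .loc[:, ["time", "lat", "lon", "t2m"]]
            .reset_index(drop=True))


def iter_year_frames(paths):
    """Yield (path, DataFrame) one file at a time so only one year is held in memory."""
    for path in paths:
        with xr.open_dataset(path) as ds:
            yield path, year_to_frame(ds)


# Small synthetic Dataset (made-up values), latitude stored descending like the real files
times = pd.to_datetime(["1979-01-01T00:00", "1979-01-01T03:00"])
lats = np.array([35.5, 35.25, 35.0], dtype=np.float32)
lons = np.array([-170.0, -169.75], dtype=np.float32)
data = np.random.default_rng(0).normal(270, 5, size=(2, 3, 2)).astype(np.float32)
ds = xr.Dataset(
    {"t2m": (("time", "latitude", "longitude"), data)},
    coords={"time": times, "latitude": lats, "longitude": lons},
)
df = year_to_frame(ds)
print(df)
```

For the 30 yearly files, iter_year_frames processes one file at a time. Each full year flattens to 2920 × 189 × 521 = 287,529,480 rows, so holding all 30 years in a single DataFrame is unlikely to fit in memory on a typical machine.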

(Answer 1: score 0, posted 2022-08-26 12:28:28)
