
How to reshape xarray dataset by collapsing coordinate

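I currently have a dataset that, when opened with xarray, contains three coordinates: x, y, band. The band coordinate holds temperature and dewpoint, each at 4 different time intervals, so there are 8 bands in total. Is there a way to reshape this so that I have x, y, band, time, such that the band coordinate has length 2 and the time coordinate has length 4?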

I thought I could add a new coordinate called time and then add the bands in, but

ds = ds.assign_coords(time=[1,2,3,4])

returns ValueError: cannot add coordinates with new dimensions to a DataArray
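The error message refers to a DataArray, so ds here is presumably a DataArray rather than a Dataset. As a minimal sketch (the random array below is a hypothetical stand-in, not the asker's data), assign_coords can label a dimension the array already has, but it cannot introduce a new time dimension on its own:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the asker's data: 8 bands on a 4 x 4 grid.
da = xr.DataArray(np.random.random((4, 4, 8)), dims=["x", "y", "band"])

# Labelling an existing dimension works: 8 labels for the 8 bands.
labelled = da.assign_coords(band=np.arange(1, 9))

# A coordinate along a dimension the array does not have is rejected.
try:
    da.assign_coords(time=[1, 2, 3, 4])
    raised = False
except ValueError:
    raised = True
```

This is why the data itself has to be reshaped or unstacked before a time dimension can exist.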

You can reassign the "band" coordinate to a MultiIndex:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: import xarray as xr

In [4]: da = xr.DataArray(np.random.random((4, 4, 8)), dims=['x', 'y', 'band'])

In [5]: da.coords['band'] = pd.MultiIndex.from_arrays(
   ...:     [
   ...:         [1, 1, 1, 1, 2, 2, 2, 2],
   ...:         pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01'] * 2),
   ...:     ],
   ...:     names=['band_stacked', 'time'],
   ...: )

In [6]: da
Out[6]:
<xarray.DataArray (x: 4, y: 4, band: 8)>
array([[[2.55228052e-01, 6.71680777e-01, 8.76158643e-01, 5.23808010e-01,
         8.56941412e-01, 2.75757101e-01, 7.88877551e-02, 1.54739786e-02],
        [3.70350510e-01, 1.90604842e-02, 2.17871931e-01, 9.40704074e-01,
         4.28769745e-02, 9.24407375e-01, 2.81715762e-01, 9.12889594e-01],
        [7.36529770e-02, 1.53507827e-01, 2.83341417e-01, 3.00687140e-01,
         7.41822972e-01, 6.82413237e-01, 7.92126231e-01, 4.84821281e-01],
        [5.24897891e-01, 4.69537663e-01, 2.47668326e-01, 7.56147251e-02,
         6.27767921e-01, 2.70630355e-01, 5.44669493e-01, 3.53063860e-01]],
...
       [[1.56513994e-02, 8.49568142e-01, 3.67268562e-01, 7.28406400e-01,
         2.82383223e-01, 5.00901504e-01, 9.99643260e-01, 1.16446139e-01],
        [9.98980637e-01, 2.45060112e-02, 8.12423749e-01, 4.49895624e-01,
         6.64880037e-01, 8.73506549e-01, 1.79186788e-01, 1.94347924e-01],
        [6.32000394e-01, 7.60414128e-01, 4.90153658e-01, 3.40693056e-01,
         5.19820559e-01, 4.49398587e-01, 1.90339730e-01, 6.38101614e-02],
        [7.64102189e-01, 6.79961676e-01, 7.63165470e-01, 6.23766131e-02,
         5.62677420e-01, 3.85784911e-01, 4.43436365e-01, 2.44385584e-01]]])
Coordinates:
  * band          (band) MultiIndex
  - band_stacked  (band) int64 1 1 1 1 2 2 2 2
  - time          (band) datetime64[ns] 2020-01-01 2021-01-01 ... 2023-01-01
Dimensions without coordinates: x, y

Then you can expand the dimensionality by unstacking:

In [7]: da.unstack('band').rename({'band_stacked': 'band'})
Out[7]:
<xarray.DataArray (x: 4, y: 4, band: 2, time: 4)>
array([[[[2.55228052e-01, 6.71680777e-01, 8.76158643e-01,
          5.23808010e-01],
         [8.56941412e-01, 2.75757101e-01, 7.88877551e-02,
          1.54739786e-02]],
...
        [[7.64102189e-01, 6.79961676e-01, 7.63165470e-01,
          6.23766131e-02],
         [5.62677420e-01, 3.85784911e-01, 4.43436365e-01,
          2.44385584e-01]]]])
Coordinates:
  * band     (band) int64 1 2
  * time     (time) datetime64[ns] 2020-01-01 2021-01-01 2022-01-01 2023-01-01
Dimensions without coordinates: x, y
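For reference, the MultiIndex approach can be written as one self-contained script. This is a sketch under a few assumptions: the data is a deterministic stand-in built with np.arange, and the hasattr branch is my own addition, since newer xarray releases warn when a pandas MultiIndex is assigned directly as a coordinate and offer xr.Coordinates.from_pandas_multiindex as the explicit route.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.to_datetime(["2020-01-01", "2021-01-01", "2022-01-01", "2023-01-01"])

# Stand-in data: 8 bands ordered as band 1 at all 4 times, then band 2.
da = xr.DataArray(
    np.arange(4 * 4 * 8, dtype=float).reshape(4, 4, 8),
    dims=["x", "y", "band"],
)

# Same (band_stacked, time) pairs as the from_arrays call above.
midx = pd.MultiIndex.from_product([[1, 2], times], names=["band_stacked", "time"])

if hasattr(xr, "Coordinates"):
    # Explicit construction, available in newer xarray releases.
    stacked = da.assign_coords(xr.Coordinates.from_pandas_multiindex(midx, "band"))
else:
    # Older releases wrap the MultiIndex implicitly.
    stacked = da.assign_coords(band=midx)

# Unstacking replaces "band" with one dimension per MultiIndex level.
unstacked = stacked.unstack("band").rename({"band_stacked": "band"})
```

The level is named band_stacked only to avoid clashing with the existing band dimension; once that dimension has been unstacked, the level can be renamed to band.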

Another, more manual option is to reshape in numpy and create a new DataArray. Note that this manual reshape is much faster for larger arrays:

In [8]: reshaped = xr.DataArray(
   ...:     da.data.reshape((4, 4, 2, 4)),
   ...:     dims=['x', 'y', 'band', 'time'],
   ...:     coords={
   ...:         'time': pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01']),
   ...:         'band': [1, 2],
   ...:     },
   ...: )
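The reshape above relies on the 8 bands being ordered as band 1 at all four times followed by band 2 at all four times. The sketch below takes the sizes from the array instead of hard-coding them, and also shows the interleaved case. The interleaved ordering is purely an assumption for illustration; check how the bands are actually laid out in your file.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.to_datetime(["2020-01-01", "2021-01-01", "2022-01-01", "2023-01-01"])
n_band, n_time = 2, len(times)

# Stand-in data: along the last axis the values at [0, 0, :] are 0..7.
da = xr.DataArray(np.arange(4 * 4 * 8).reshape(4, 4, 8), dims=["x", "y", "band"])
nx, ny = da.sizes["x"], da.sizes["y"]

# Case 1: bands ordered (b1 t0..t3, b2 t0..t3), as in the answer.
band_major = xr.DataArray(
    da.data.reshape(nx, ny, n_band, n_time),
    dims=["x", "y", "band", "time"],
    coords={"band": [1, 2], "time": times},
)

# Case 2 (assumed ordering): bands interleaved (b1 t0, b2 t0, b1 t1, ...).
# Split the axis as (time, band), then transpose into the desired order.
time_major = xr.DataArray(
    da.data.reshape(nx, ny, n_time, n_band),
    dims=["x", "y", "time", "band"],
    coords={"band": [1, 2], "time": times},
).transpose("x", "y", "band", "time")
```

In both cases the new axes must multiply back to the original band length (2 * 4 = 8), otherwise the reshape raises an error.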

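Note that if your data is chunked (and assuming you'd like to keep it that way), your options are a bit more limited - see the dask docs on reshaping dask arrays. The first (MultiIndexing unstack) approach does work with dask arrays as long as the arrays are not chunked along the unstacked dimension. See this question for an example.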
