
xarray equivalent of np.reshape

I have a 3D array (10x10x3) which, for some reason, is saved as a 2D xr.DataArray (100x3). It looks a bit like this:

import numpy as np
import xarray as xr

data = xr.DataArray(np.random.randn(100, 3),
                    dims=('ct', 'x'),
                    coords={'ct': range(100)})

c = [x%10 for x in range(100)]
t = [1234+x//10 for x in range(100)]

c and t are the coordinates that are bundled together in ct.

In the past I have solved the issue of separating the two dimensions as follows:

t_x_c,x = data.shape
nc = 10
data = np.reshape(data.values,(t_x_c//nc,nc, x))

But this relies on a number of assumptions about the data structure that may not hold in the near future (e.g. c and t may not be as regular as in my example).
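To make that assumption concrete, here is a minimal NumPy-only sketch with made-up values (the array contents and the reversed row order are illustrative assumptions, not part of the real data). It shows that np.reshape places rows purely by position, so the result is only meaningful while the rows arrive in the regular c-fastest order:

```python
import numpy as np

# Labels as in the example above: c cycles fastest, t changes every 10 rows.
c = np.arange(100) % 10
t = 1234 + np.arange(100) // 10
values = np.arange(300, dtype=float).reshape(100, 3)  # made-up data

# Regular layout: row k lands at cube[k // 10, k % 10], i.e. cube[t_idx, c_idx].
cube = values.reshape(10, 10, 3)
row = 37  # the row labelled t == 1237, c == 7
lands_correctly = np.array_equal(cube[3, 7], values[row])

# Same rows in reverse order: the reshape still succeeds without complaint,
# but position [3, 7] no longer holds the row labelled (t=1237, c=7).
order = np.arange(100)[::-1]
cube_rev = values[order].reshape(10, 10, 3)
still_correct = np.array_equal(cube_rev[3, 7], values[row])

print(lands_correctly, still_correct)
```

The second reshape raises no error, which is what makes the positional approach fragile.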

I have managed to assign c and t as additional coordinates to the array:

data2 = data.assign_coords(
    coords={"c": ("ct", c),
            "t": ("ct", t)},
)

but I would like to promote them to dimensions of the array. How would I do that?

You want to use a combination of the .set_index() and .unstack() methods.

Let's break it down.

First, I create the dummy array at the stage where "c" and "t" are already coordinates:

c, t = [arr.flatten() for arr in np.meshgrid(range(10), range(1234, 1234+10))]

da = xr.DataArray( 
    np.random.randn(100, 3), 
    dims=('ct', 'x'), 
    coords={ 
        'c': ('ct', c), 
        't': ('ct', t) 
    }
)

Then, use set_index() to create a MultiIndex combining the "c" and "t" coordinates:

>>> da.set_index(ct=("c", "t"))                                                                  
<xarray.DataArray (ct: 100, x: 3)>
[...]
Coordinates:
  * ct       (ct) MultiIndex
  - c        (ct) int64 0 1 2 3 4 5 6 7 8 9 0 1 2 ... 
  - t        (ct) int64 1234 1234 1234 1234 1234 ...
Dimensions without coordinates: x
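As a quick check of what set_index() produced, the following sketch (assuming the same dummy array as above, rebuilt here so the snippet is self-contained) looks at the level names of the new index and selects a single row by its "c" and "t" labels rather than by its position along "ct":

```python
import numpy as np
import xarray as xr

# Same dummy array as above: "c" and "t" are non-index coordinates along "ct".
c, t = [arr.flatten() for arr in np.meshgrid(range(10), range(1234, 1244))]
da = xr.DataArray(
    np.random.randn(100, 3),
    dims=("ct", "x"),
    coords={"c": ("ct", c), "t": ("ct", t)},
)

indexed = da.set_index(ct=("c", "t"))

# The "ct" index is now a pandas MultiIndex whose levels are named "c" and "t".
level_names = list(indexed.indexes["ct"].names)

# Label-based selection on both levels picks exactly one row of the original array.
picked = indexed.sel(c=3, t=1236)

print(level_names, picked.dims)
```

With c varying fastest, the labels (c=3, t=1236) correspond to row 23 of the original 100x3 array.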

Then, use unstack() to turn the "c" and "t" levels of the "ct" MultiIndex into dimensions:

>>> da.set_index(ct=("c", "t")).unstack("ct")
<xarray.DataArray (x: 3, c: 10, t: 10)>
Coordinates:
  * c        (c) int64 0 1 2 3 4 5 6 7 8 9
  * t        (t) int64 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243
Dimensions without coordinates: x

>>> da.set_index(ct=("c", "t")).unstack("ct").dims
('x', 'c', 't')

However, as you can see, .unstack() puts the unstacked dimensions last, so you may eventually want to transpose:

>>> da.set_index(ct=("c", "t")).unstack("ct").transpose("c", "t", "x").dims                      
('c', 't', 'x')
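The following sketch puts the steps together and compares the result with the np.reshape approach from the question. The helper name promote and the shuffled and truncated test arrays are assumptions made for illustration. Because unstack() places values by label, the result should not depend on row order, and combinations that are absent from the data come back as NaN:

```python
import numpy as np
import xarray as xr

values = np.random.randn(100, 3)
c, t = [arr.flatten() for arr in np.meshgrid(range(10), range(1234, 1244))]
da = xr.DataArray(
    values,
    dims=("ct", "x"),
    coords={"c": ("ct", c), "t": ("ct", t)},
)


def promote(arr):
    """Hypothetical helper: turn the "c" and "t" coordinates into dimensions."""
    return arr.set_index(ct=("c", "t")).unstack("ct").transpose("t", "c", "x")


# Regular case: identical to the positional reshape used in the question.
cube = promote(da)
matches_reshape = np.array_equal(cube.values, np.reshape(values, (10, 10, 3)))

# Shuffled rows: values are placed by label, so the result is unchanged.
shuffled = da.isel(ct=np.random.permutation(100))
same_after_shuffle = np.array_equal(promote(shuffled).values, cube.values)

# Missing rows: the last 5 (c, t) combinations are dropped and become NaN.
partial = da.isel(ct=slice(0, 95))
n_missing = int(promote(partial).isnull().sum())

print(cube.dims, matches_reshape, same_after_shuffle, n_missing)
```

The transpose order ("t", "c", "x") is chosen here only so that the comparison with np.reshape lines up; any other order works the same way.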

One alternative is to generate c and t coordinates of length 100, as you started to do, and create a MultiIndex from them. However, this should not be necessary: providing only the desired coordinate values for c and t (of lengths 10 and 10 respectively in this case) should be enough. This answer presents two alternatives that are already available in other SO answers and GitHub issues. The relevant code is included here, but for details on the implementations the original sources should be consulted.

The answer to this other question gives an example of reshaping using pure xarray methods with the following code:

reshaped_ds = ds.assign_coords(
    c=np.arange(10), t=np.arange(1234, 1244)
).stack(
    aux_dim=("c", "t")
).reset_index(
    "ct", drop=True
).rename(
    ct="aux_dim"
).unstack("aux_dim")

Note that this only works with datasets and would therefore require ds = data.to_dataset(name="aux_name"). After reshaping, it can be converted back to a DataArray with ds.aux_name.
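The round trip between DataArray and Dataset mentioned above can be sketched on its own as follows, assuming the data array from the question. The variable name "aux_name" is arbitrary:

```python
import numpy as np
import xarray as xr

data = xr.DataArray(
    np.random.randn(100, 3),
    dims=("ct", "x"),
    coords={"ct": range(100)},
)

# Wrap the DataArray in a Dataset under an arbitrary variable name ...
ds = data.to_dataset(name="aux_name")

# ... and pull it back out afterwards (ds["aux_name"] is equivalent).
back = ds.aux_name

print(type(ds).__name__, back.name, back.equals(data))
```

The only difference after the round trip is that the returned DataArray carries the name "aux_name".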

Another alternative is to generate the MultiIndex with pandas instead of having xarray create it with assign_coords + stack, as shown in this GitHub issue. This alternative is tailored to DataArrays, and it even integrates the transposition so that the reshaped dimensions preserve the original order. For completeness, here is the code proposed in said issue to reshape DataArrays:

import numpy as np
import pandas as pd


def xr_reshape(A, dim, newdims, coords):
    """Reshape DataArray A to convert its dimension dim into sub-dimensions given by
    newdims and the corresponding coords.
    Example: Ar = xr_reshape(A, 'time', ['year', 'month'], [(2017, 2018), np.arange(12)])"""

    # Create a pandas MultiIndex from these labels
    ind = pd.MultiIndex.from_product(coords, names=newdims)

    # Replace the index of dimension `dim` in a copy of the DataArray with this new index
    A1 = A.copy()

    A1.coords[dim] = ind

    # Convert multiindex to individual dims using DataArray.unstack().
    # This changes dimension order! The new dimensions are at the end.
    A1 = A1.unstack(dim)

    # Permute to restore dimensions
    i = A.dims.index(dim)
    dims = list(A1.dims)

    for d in newdims[::-1]:
        dims.insert(i, d)

    for d in newdims:
        _ = dims.pop(-1)


    return A1.transpose(*dims)
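For comparison, here is a sketch of a variant that follows the same idea but goes through assign_coords + set_index + unstack instead of assigning the pandas MultiIndex object directly. The function name reshape_dim and its size check are assumptions made for illustration; this is not the code from the issue:

```python
import numpy as np
import pandas as pd
import xarray as xr


def reshape_dim(da, dim, newdims, coords):
    """Hypothetical variant of xr_reshape built on set_index + unstack."""
    # The product of the new coordinates fixes the order of the rows along `dim`:
    # the first name in `newdims` varies slowest, the last one fastest.
    index = pd.MultiIndex.from_product(coords, names=newdims)
    if index.size != da.sizes[dim]:
        raise ValueError("product of coords does not match the size of dim")

    level_coords = {
        name: (dim, index.get_level_values(name).to_numpy()) for name in newdims
    }
    out = (
        da.drop_vars(dim, errors="ignore")  # drop any existing coordinate on `dim`
        .assign_coords(level_coords)
        .set_index({dim: list(newdims)})
        .unstack(dim)
    )

    # Put the new dimensions where `dim` used to be.
    i = da.dims.index(dim)
    order = list(da.dims[:i]) + list(newdims) + list(da.dims[i + 1:])
    return out.transpose(*order)


data = xr.DataArray(
    np.random.randn(100, 3),
    dims=("ct", "x"),
    coords={"ct": range(100)},
)
reshaped = reshape_dim(data, "ct", ["t", "c"], [np.arange(1234, 1244), np.arange(10)])
print(reshaped.dims, reshaped.shape)
```

Listing "t" before "c" matches the layout in the question, where c varies fastest along ct.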
