xarray: reshaping data, splitting a dimension

Question

I have a Dataset in xarray with the following dimensions:

Dimensions:      (subject: 30, session: 5, time: 45000)
Coordinates:
  * subject      (subject) object '110' '112' '114' '117' ...
  * session      (session) object 'week1' 'week2' 'week3' ...
  * time         (time) timedelta64[ns] 00:00:00 00:00:00.040000 ...

I want to split each trial (subject/session combo) into smaller time segments, for example into 3 segments of 15000 values each. The resulting dimensions might look as follows:

(subject: 30, session: 5, segment: 3, time: 15000)

I've searched and tried a lot of things but have not succeeded. How can this be done?

One of the things I've been trying, which seems to be close, is creating a new MultiIndex and unstacking it.

import numpy as np
import xarray as xr

segment_data = np.repeat(range(3),len(ds.time)//3)
segment = xr.Variable(dims='time',data=segment_data)
newtime_data = np.tile(ds.time[:len(ds.time)//3],3)
newtime = xr.Variable(dims='time',data=newtime_data)
dsr = ds.assign_coords(segment=segment,newtime=newtime)
dsr = dsr.set_index(segment='segment',newtime='newtime')
dsr = dsr.stack(fragment=['segment','newtime'])

However, that last line takes a huge amount of memory and seems to create a dimension fragment: len(ds.time)**2, which doesn't seem right. I'm also not sure what I would have to do after this (unstack('fragment')?).

Edit: Some more attempts have brought me here:

x = np.repeat(range(3),15000)
y = np.tile(ds.time[:len(ds.time)//3],3)
dsr = (ds.assign_coords(segment=x,time2=y)
      .set_index(fragment=['segment','time2'])
      .unstack('fragment'))

Which gives this:

(subject: 30, segment: 3, session: 5, time: 45000, time2: 15000)

This seems close, but it's not quite there, since every time2 point now has 45000 values when it should have a single value:

dsr.isel(subject=0,segment=0,session=0,time2=0)
# (time: 45000)

Edit: I finally found a way to do it, see my answer. Further suggestions welcome!

Answer 1

First make sure you have the labels for the two new dimensions. In this case, as follows:

x = range(3) # 3 segments
y = ds.time[:len(ds.time)//3] # the first 1/3rd of the time labels

Then create a pandas MultiIndex from these labels*.

import pandas as pd

ind = pd.MultiIndex.from_product((x,y),names=('segment','new_time'))
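
To see why these labels line up with the original time axis, here is a minimal sketch using only the standard library. It assumes that from_product enumerates pairs in the same order as itertools.product (first level varies slowest), so position i on the old axis gets the label (i // seg_len, i % seg_len). The sizes below are small hypothetical stand-ins for 3 and 15000.

```python
from itertools import product

n_segments = 3   # number of segments (as in the question)
seg_len = 4      # points per segment (hypothetical; 15000 in the question)

# Segment-major ordering: the first level varies slowest, the second fastest.
labels = list(product(range(n_segments), range(seg_len)))

# Position i along the original axis maps to (segment, offset) = divmod(i, seg_len).
mapping_ok = all(
    labels[i] == divmod(i, seg_len) for i in range(n_segments * seg_len)
)

print(labels[:6])
print(mapping_ok)
```

This is why each segment ends up holding a contiguous block of the original samples.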

Finally, replace the time index in the Dataset with this new index, and then unstack its levels to create the two required dimensions.

dsr = ds.assign(time=ind).unstack('time')

You may want to use rename to rename the new dimension:

dsr = dsr.rename({'new_time':'time'})

Resulting dimensions:

(subject: 30, segment: 3, session: 5, time: 15000)

The only thing that's off now is the order of the dimensions (ideally segment and session should be swapped). I thought transpose would help here, but "although the order of dimensions on each array will change, the dataset dimensions themselves will remain in fixed (sorted) order."** So I'll probably live with it like this.
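
As the quoted sentence says, transpose does reorder the dimensions of each data variable, even if the Dataset summary keeps its own order. A minimal sketch on a toy Dataset (the variable name signal and the small sizes are hypothetical, not from the question):

```python
import numpy as np
import xarray as xr

# Toy Dataset whose single variable has the dimension order produced by unstack.
ds_toy = xr.Dataset(
    {"signal": (("subject", "segment", "session", "time"), np.zeros((2, 3, 2, 4)))}
)

# Reorder the dimensions of every data variable in the Dataset.
ds_t = ds_toy.transpose("subject", "session", "segment", "time")

print(ds_t["signal"].dims)
print(ds_t["signal"].shape)
```

So if the order matters for a computation, checking the dims of the individual variables is more informative than the Dataset repr.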

* Note that you won't be able to use the name of the dimension you want to split, so we have 'new_time' here. An unnecessary limitation of assign?

** Another limitation that I can't explain.
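
A different route, not the one used in the answer above, is to reshape the underlying NumPy array and rebuild the object with the dimension order you want. The sketch below does this for a single DataArray; it assumes the time axis is last, that its length divides evenly by the number of segments, and that each segment can reuse the first segment's time labels. All names and sizes are hypothetical.

```python
import numpy as np
import xarray as xr

n_seg = 3  # number of segments to split the time axis into

# Toy stand-in for one variable of the Dataset: 2 subjects, 2 sessions, 6 samples.
da = xr.DataArray(
    np.arange(2 * 2 * 6).reshape(2, 2, 6),
    dims=("subject", "session", "time"),
    coords={
        "subject": ["110", "112"],
        "session": ["week1", "week2"],
        "time": np.arange(6) * 40,  # stand-in for the timedelta labels
    },
)

seg_len = da.sizes["time"] // n_seg

# Split the last axis into (segment, time); C order keeps each segment contiguous.
values = da.values.reshape(da.sizes["subject"], da.sizes["session"], n_seg, seg_len)

reshaped = xr.DataArray(
    values,
    dims=("subject", "session", "segment", "time"),
    coords={
        "subject": da["subject"].values,
        "session": da["session"].values,
        "segment": np.arange(n_seg),
        "time": da["time"].values[:seg_len],  # labels of the first segment
    },
)

print(reshaped.dims)
print(reshaped.shape)
```

For a Dataset with several variables, the same reshape would have to be applied to each variable, so the MultiIndex approach may still be more convenient there.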

Answer 1: 2, accepted, 2017-03-25 15:34:58
