Construct single numpy array from smaller arrays of different sizes
I have an array of values, x. Given 'start' and 'stop' indices, I need to construct an array y using sub-arrays of x.
import numpy as np
x = np.arange(20)
start = np.array([2, 8, 15])
stop = np.array([5, 10, 20])
nsubarray = len(start)
Where I would like y to be:
y = array([ 2, 3, 4, 8, 9, 15, 16, 17, 18, 19])
(In practice the arrays I am using are much larger.)
One way to construct y is using a list comprehension, but the list needs to be flattened afterwards:
import itertools as it
y = [x[start[i]:stop[i]] for i in range(nsubarray)]
y = np.fromiter(it.chain.from_iterable(y), dtype=int)
I found that it is actually faster to use a for-loop:
y = np.empty(sum(stop - start), dtype=int)
a = 0
for i in range(nsubarray):
    b = a + stop[i] - start[i]
    y[a:b] = x[start[i]:stop[i]]
    a = b
I was wondering if anyone knows of a way to optimize this? Thank you very much!
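For reference, on the small sample above both variants produce the same array. This is just a sanity check assembled from the snippets already shown, not new functionality:

```python
import itertools as it
import numpy as np

x = np.arange(20)
start = np.array([2, 8, 15])
stop = np.array([5, 10, 20])
nsubarray = len(start)

# List-comprehension version, flattened with fromiter + chain
chunks = [x[start[i]:stop[i]] for i in range(nsubarray)]
y1 = np.fromiter(it.chain.from_iterable(chunks), dtype=int)

# Explicit for-loop version, writing into a preallocated array
y2 = np.empty((stop - start).sum(), dtype=int)
a = 0
for i in range(nsubarray):
    b = a + stop[i] - start[i]
    y2[a:b] = x[start[i]:stop[i]]
    a = b

print(y1)  # [ 2  3  4  8  9 15 16 17 18 19]
```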
EDIT
The following tests the timings of all the approaches:
import numpy as np
import numpy.random as rd
import itertools as it
def get_chunks(arr, start, stop):
    rng = stop - start
    rng = rng[rng != 0]  # Need to add this in case of zero-sized ranges
    np.cumsum(rng, out=rng)
    inds = np.ones(rng[-1], dtype=np.intp)  # np.int is removed in modern NumPy
    inds[rng[:-1]] = start[1:] - stop[:-1] + 1
    inds[0] = start[0]
    np.cumsum(inds, out=inds)
    return np.take(arr, inds)
def for_loop(arr, start, stop):
    y = np.empty(sum(stop - start), dtype=int)
    a = 0
    for i in range(len(start)):  # use len(start) rather than the global nsubarray
        b = a + stop[i] - start[i]
        y[a:b] = arr[start[i]:stop[i]]
        a = b
    return y
xmax = int(1e6)  # randint needs an integer bound
nsubarray = 100000
x = np.arange(xmax)
start = rd.randint(0, xmax - 10, nsubarray)
stop = start + 10
Which results in:
In [379]: %timeit np.hstack([x[i:j] for i,j in it.izip(start, stop)])
1 loops, best of 3: 410 ms per loop
In [380]: %timeit for_loop(x, start, stop)
1 loops, best of 3: 281 ms per loop
In [381]: %timeit np.concatenate([x[i:j] for i,j in it.izip(start, stop)])
10 loops, best of 3: 97.8 ms per loop
In [382]: %timeit get_chunks(x, start, stop)
100 loops, best of 3: 16.6 ms per loop
This is a bit complicated, but quite fast. Basically, we build the index list with vector addition and then use np.take instead of any Python loops:
def get_chunks(arr, start, stop):
    rng = stop - start
    rng = rng[rng != 0]  # Need to add this in case of zero-sized ranges
    np.cumsum(rng, out=rng)
    inds = np.ones(rng[-1], dtype=np.intp)  # np.int is removed in modern NumPy
    inds[rng[:-1]] = start[1:] - stop[:-1] + 1
    inds[0] = start[0]
    np.cumsum(inds, out=inds)
    return np.take(arr, inds)
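To see why this works, here is the index construction traced on the small example from the question (a sketch of the intermediate values get_chunks computes internally; the zero-length-range filter is skipped since no range here is empty):

```python
import numpy as np

x = np.arange(20)
start = np.array([2, 8, 15])
stop = np.array([5, 10, 20])

# Chunk lengths; their running sum marks where each chunk ends in the output
rng = stop - start                            # [3 2 5]
np.cumsum(rng, out=rng)                       # [3 5 10]

# Mostly ones: a step of +1 walks through a chunk...
inds = np.ones(rng[-1], dtype=np.intp)
# ...except at chunk boundaries, where the step jumps to the next start
inds[rng[:-1]] = start[1:] - stop[:-1] + 1    # inds[3] = 4, inds[5] = 6
inds[0] = start[0]                            # the first index is start[0]
np.cumsum(inds, out=inds)                     # [2 3 4 8 9 15 16 17 18 19]

print(np.take(x, inds))
```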
Check that it is returning the correct result:
xmax = int(1e6)
nsubarray = 100000
x = np.arange(xmax)
start = np.random.randint(0, xmax - 10, nsubarray)
stop = start + np.random.randint(1, 10, nsubarray)
old = np.concatenate([x[b:e] for b, e in zip(start, stop)])
new = get_chunks(x, start, stop)
np.allclose(old,new)
True
Some timings:
%timeit np.hstack([x[i:j] for i,j in zip(start, stop)])
1 loops, best of 3: 354 ms per loop
%timeit np.concatenate([x[b:e] for b, e in izip(start, stop)])
10 loops, best of 3: 119 ms per loop
%timeit get_chunks(x, start, stop)
100 loops, best of 3: 7.59 ms per loop
Maybe use zip, np.arange and np.hstack:
np.hstack([np.arange(i, j) for i,j in zip(start, stop)])
This is almost 3 times faster than the loop for me; almost all of the time difference comes from replacing fromiter with concatenate:
import numpy as np

y = [x[b:e] for b, e in zip(start, stop)]
y = np.concatenate(y)
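Applied to the sample arrays from the question, this gives the expected result:

```python
import numpy as np

x = np.arange(20)
start = np.array([2, 8, 15])
stop = np.array([5, 10, 20])

# Slice out each chunk, then join them in a single concatenate call
y = np.concatenate([x[b:e] for b, e in zip(start, stop)])
print(y)  # [ 2  3  4  8  9 15 16 17 18 19]
```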
Would it be okay using slices instead of np.arrays?
import numpy as np

x = np.arange(10)
start = slice(2, 8)
stop = slice(5, 10)
print(np.concatenate((x[start], x[stop])))