
Taking multiple slices of numpy 1d array from given indices, copying result into 2d array

New to Python. The code snippet below defines a numpy 1d array called randomWalk. Given indices (which can be interpreted as start dates and end dates, both of which may vary from item to item), I want to take multiple slices from that 1d array and arrange the results in a 2d array of a given shape.

I am trying to vectorize this. I was able to select the slices I wanted from the 1d array using np.r_, but failed to store them in the format I require for the output: a 2d array with rows representing items and columns representing time from min(startDates) to max(endDates).
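A minimal illustration of the np.r_ difficulty described above, using hypothetical toy data (values, starts and ends are stand-ins, not the arrays from the question). It only shows that concatenating slices with np.r_ yields one flat 1d result, so the row boundaries between items are lost:

```python
import numpy as np

# Hypothetical toy data, not the values from the question
values = np.arange(10, 20)      # stands in for randomWalk
starts = np.array([1, 3])       # stands in for startDates
ends = np.array([4, 5])         # stands in for endDates

# np.r_ accepts slice objects and concatenates the index ranges they describe
idx = np.r_[tuple(slice(s, e) for s, e in zip(starts, ends))]
flat = values[idx]

print(idx)    # indices of the first slice followed by those of the second
print(flat)   # a single 1d array; which element belongs to which item is gone
```

Because the slices have different lengths, the flat result cannot simply be reshaped into an items-by-periods array.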

Below is the (ugly) code that works.

import numpy as np

numItems = 20
numPeriods = 12

# Data
randomWalk = np.random.normal(loc = 0.0, scale = 0.05, size = (numPeriods,))
startDates = np.random.randint(low = 1, high = 5, size = numItems)
endDates = np.random.randint(low = 5, high = numPeriods + 1, size = numItems)
stochasticItems = np.random.choice([False, True], size=(numItems,), p = [0.9, 0.1])

# Result needs to be in this shape (code snippet is designed to capture that only
# a relatively small fraction of resultMatrix's elements will differ from unity) 
resultMatrix = np.ones((numItems, numPeriods))

# Desired result (obtained via brute force)
for i in range(numItems):
    if stochasticItems[i]:
        resultMatrix[i, startDates[i]:endDates[i]] = np.cumprod(
            randomWalk[startDates[i]:endDates[i]] + 1.0)

Inspired by @mozway's answer, convert the irregular slices into a regular mask array:

>>> # build all arrays with np.random.seed(0)
>>> x = np.arange(numPeriods)
>>> mask = (startDates[:, None] <= x) & (endDates[:, None] > x)
>>> result = np.where(mask & stochasticItems[:, None], np.where(mask, randomWalk + 1, 1).cumprod(-1), 1)
>>> np.allclose(result, resultMatrix)
True
>>> result
array([[1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.0489369 , 1.16646468, 1.2753867 ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ]])
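
To see why the outer np.where is needed, here is a small sketch with hypothetical toy inputs (growth, starts, ends and active are illustrative names, chosen so the arithmetic is easy to follow by hand). The running product keeps its final value after a window ends, so positions outside the window have to be reset to 1:

```python
import numpy as np

# Hypothetical toy inputs, not the data from the question
growth = np.array([2.0, 3.0, 5.0, 7.0])   # plays the role of randomWalk + 1
starts = np.array([1, 0])                  # plays the role of startDates
ends = np.array([3, 2])                    # plays the role of endDates
active = np.array([True, False])           # plays the role of stochasticItems

cols = np.arange(growth.size)
mask = (starts[:, None] <= cols) & (cols < ends[:, None])

# Step 1: use the neutral factor 1 outside each window, then take a row-wise
# running product. Row 0 becomes [1, 3, 15, 15]: the last value carries over.
running = np.where(mask, growth, 1.0).cumprod(axis=-1)

# Step 2: keep the running product only inside the window of active rows.
# Row 0 becomes [1, 3, 15, 1]; the inactive row 1 is all ones.
result = np.where(mask & active[:, None], running, 1.0)
print(result)
```

The price of this approach is that the cumulative product is computed for every row, including rows where stochasticItems is False.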

If vectorization is the goal, Pig's answer achieves it. If vectorization itself does not matter (as the OP mentioned in the comments, the aim is to improve performance), I suggest using the numba library to accelerate the code. We can write a numba equivalent of np.cumprod and accelerate it with numba's no-python JIT:

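import numba as nb
import numpy as np


@nb.njit
def nb_cumprod(arr):
    # Running product of a non-empty 1d array (loop equivalent of np.cumprod)
    y = np.empty_like(arr)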
    y[0] = arr[0]
    for i in range(1, arr.shape[0]):
        y[i] = arr[i] * y[i-1]
    return y


@nb.njit
def nb_(numItems, numPeriods, stochasticItems, startDates, endDates, randomWalk):
    resultMatrix = np.ones((numItems, numPeriods))

    for i in range(numItems):
        if stochasticItems[i]:
            resultMatrix[i, startDates[i]:endDates[i]] = nb_cumprod(randomWalk[startDates[i]:endDates[i]] + 1.0)
    return resultMatrix

In some of my benchmarks, this code ran about 10 times faster than the OP's loop.
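
Speed-ups like this depend on array sizes and hardware, so it is worth measuring on your own data. Below is a rough timing sketch, not the benchmark used above: the sizes, the seed and the helper names loop_version and mask_version are assumptions for illustration, and it compares only the plain loop with the mask-based version so that it runs without numba installed.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
num_items, num_periods = 2000, 12   # sizes are assumptions for illustration
random_walk = rng.normal(0.0, 0.05, size=num_periods)
start_dates = rng.integers(1, 5, size=num_items)
end_dates = rng.integers(5, num_periods + 1, size=num_items)
stochastic = rng.random(num_items) < 0.1

def loop_version():
    # Same logic as the brute-force loop in the question
    out = np.ones((num_items, num_periods))
    for i in range(num_items):
        if stochastic[i]:
            s, e = start_dates[i], end_dates[i]
            out[i, s:e] = np.cumprod(random_walk[s:e] + 1.0)
    return out

def mask_version():
    # Same logic as the mask-based answer
    x = np.arange(num_periods)
    mask = (start_dates[:, None] <= x) & (end_dates[:, None] > x)
    running = np.where(mask, random_walk + 1.0, 1.0).cumprod(-1)
    return np.where(mask & stochastic[:, None], running, 1.0)

# Check that both versions agree before comparing their speed
assert np.allclose(loop_version(), mask_version())

for fn in (loop_version, mask_version):
    best = min(timeit.repeat(fn, number=20, repeat=3))
    print(f"{fn.__name__}: {best / 20 * 1e3:.3f} ms per call")
```

To include the numba function in the comparison, call it once before timing so that the JIT compilation on the first call is not counted.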

