Take multiple slices of a numpy 1D array at given indices and copy the results into a 2D array

Question

New to Python. The code snippet below defines a numpy 1D array called randomWalk. Given indices (which can be interpreted as start dates and end dates, both of which may vary from item to item), I want to take multiple slices from that 1D array randomWalk and arrange the results in a 2D array of a given shape.

I am trying to vectorize this. I was able to select the slices I wanted from the 1D array using np.r_, but failed to store them in the format I need for the output: a 2D array with rows representing items and columns representing time from min(startDates) to max(endDates).
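The placement step described above can be sketched as follows, using small hypothetical inputs rather than the question's random data. np.r_ concatenates the selected index ranges (it accepts slice objects), np.repeat builds a matching row index for every selected element, and fancy indexing scatters the flat selection into the 2D array. This sketch only covers placement; the per-slice cumulative product from the question is not applied here.

```python
import numpy as np

# Small hypothetical inputs standing in for the question's arrays
randomWalk = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
startDates = np.array([1, 0, 2])
endDates = np.array([4, 2, 6])

# np.r_ turns each slice into an index range and concatenates them
cols = np.r_[tuple(slice(s, e) for s, e in zip(startDates, endDates))]

# Row index for every selected element: item i repeated (end - start) times
rows = np.repeat(np.arange(len(startDates)), endDates - startDates)

# Scatter the flat selection into a 2D array; untouched cells stay at 1
out = np.ones((len(startDates), len(randomWalk)))
out[rows, cols] = randomWalk[cols]
```

Because rows and cols have the same length, each selected value lands in its own item's row at its original time column.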

Below is the (ugly) code that works.

import numpy as np

numItems = 20
numPeriods = 12

# Data
randomWalk = np.random.normal(loc = 0.0, scale = 0.05, size = (numPeriods,))
startDates = np.random.randint(low = 1, high = 5, size = numItems)
endDates = np.random.randint(low = 5, high = numPeriods + 1, size = numItems)
stochasticItems = np.random.choice([False, True], size=(numItems,), p = [0.9, 0.1])

# Result needs to be in this shape (the snippet is designed to reflect that only
# a relatively small fraction of resultMatrix's elements will differ from unity)
resultMatrix = np.ones((numItems, numPeriods))

# Desired result (obtained via brute force)
for i in range(numItems):
    if stochasticItems[i]:
        resultMatrix[i, startDates[i]:endDates[i]] = np.cumprod(
            randomWalk[startDates[i]:endDates[i]] + 1.0)
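Written as a formula, the brute-force loop above fills each cell as follows. The symbols are shorthand introduced here, not names from the code: $s_i$ = startDates[i], $e_i$ = endDates[i], $w_k$ = randomWalk[k], and $R$ = resultMatrix.

```latex
R_{i,t} =
\begin{cases}
  \displaystyle\prod_{k=s_i}^{t} (1 + w_k) & \text{if item } i \text{ is stochastic and } s_i \le t < e_i,\\[4pt]
  1 & \text{otherwise.}
\end{cases}
```

The product for item $i$ only starts accumulating at column $s_i$, which is why the task reduces to a row-wise cumulative product once every factor outside $[s_i, e_i)$ is replaced by 1.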

Answer 1

Inspired by @mozway's answer, convert the irregular slices into a regular mask array:

>>> # build all arrays with np.random.seed(0)
>>> x = np.arange(numPeriods)
>>> mask = (startDates[:, None] <= x) & (endDates[:, None] > x)
>>> result = np.where(mask & stochasticItems[:, None], np.where(mask, randomWalk + 1, 1).cumprod(-1), 1)
>>> np.allclose(result, resultMatrix)
True
>>> result
array([[1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.0489369 , 1.16646468, 1.2753867 ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ]])
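To make the mask trick easier to follow, here is the same idea on small, hand-picked inputs (hypothetical values, not the seeded random data above). Broadcasting startDates[:, None] of shape (items, 1) against x of shape (periods,) yields an (items, periods) boolean mask. Setting every factor outside an item's window to 1 means a plain row-wise cumprod only accumulates from that item's start date onwards.

```python
import numpy as np

# Small hypothetical inputs (not the question's random data)
numPeriods = 5
randomWalk = np.array([0.1, -0.1, 0.2, 0.0, 0.5])
startDates = np.array([1, 0])
endDates = np.array([4, 2])
stochasticItems = np.array([True, False])

x = np.arange(numPeriods)
# (items, 1) vs (periods,) broadcasts to an (items, periods) window mask
mask = (startDates[:, None] <= x) & (endDates[:, None] > x)

# Factor is 1 outside each window, so cumprod starts at startDates[i]
factors = np.where(mask, randomWalk + 1, 1)
result = np.where(mask & stochasticItems[:, None], factors.cumprod(axis=-1), 1)

# Cross-check against the brute-force loop from the question
expected = np.ones_like(result)
for i in range(len(startDates)):
    if stochasticItems[i]:
        s, e = startDates[i], endDates[i]
        expected[i, s:e] = np.cumprod(randomWalk[s:e] + 1.0)
```

Here the first row becomes [1, 0.9, 1.08, 1.08, 1] and the second, non-stochastic row stays all ones. The outer np.where is still needed because the cumprod keeps carrying the last product past endDates[i].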

Answer 2

If vectorization is the goal, it has already been done in Pig's answer. If it does not matter (as the OP mentioned in the comments, the aim is to improve performance), I suggest using the numba library to accelerate the code. We can write numba code equivalent to np.cumprod and accelerate it with numba's no-python JIT:

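import numba as nb
import numpy as np


@nb.njit
def nb_cumprod(arr):
    # Loop equivalent of np.cumprod for a 1D array
    y = np.empty_like(arr)
    if arr.shape[0] == 0:
        return y  # nothing to accumulate for an empty slice
    y[0] = arr[0]
    for i in range(1, arr.shape[0]):
        y[i] = arr[i] * y[i-1]
    return y


@nb.njit
def nb_(numItems, numPeriods, stochasticItems, startDates, endDates, randomWalk):
    resultMatrix = np.ones((numItems, numPeriods))

    for i in range(numItems):
        if stochasticItems[i]:
            # Same per-item slice and cumulative product as the OP's loop
            resultMatrix[i, startDates[i]:endDates[i]] = nb_cumprod(randomWalk[startDates[i]:endDates[i]] + 1.0)
    return resultMatrix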

In some of my benchmarks, this code ran about 10 times faster than the OP's.

Answer 1: score 0, 2022-09-06 05:01:09

Answer 2: score 0, 2022-09-06 08:10:07