Python Numpy - Create 2d array where length is based on 1D array

Sorry for the confusing title, but I'm not sure how to make it more concise. Here are my requirements:

arr1 = np.array([3,5,9,1])
arr2 = ?(arr1)

arr2 would then be:

[
[0,1,2,0,0,0,0,0,0],
[0,1,2,3,4,0,0,0,0],
[0,1,2,3,4,5,6,7,8],
[0,0,0,0,0,0,0,0,0]
]

It doesn't need to vary based on the max; the shape is known in advance. So to start, I've been able to get an array of zeros with that shape:

arr2 = np.zeros((len(arr1),max_len))

And then of course I could do a for loop over arr1 like this:

for i, element in enumerate(arr1):
    arr2[i,0:element] = np.arange(element)

but that would likely take a long time, since both dimensions here are rather large (arr1 has a few million rows, max_len is around 500). Is there a clean, optimized way to do this in numpy?

Building on a 'padding' idea posted by @Divakar some years ago:

In [161]: res = np.arange(9)[None,:].repeat(4,0)
In [162]: res[res>=arr1[:,None]] = 0
In [163]: res
Out[163]: 
array([[0, 1, 2, 0, 0, 0, 0, 0, 0],
       [0, 1, 2, 3, 4, 0, 0, 0, 0],
       [0, 1, 2, 3, 4, 5, 6, 7, 8],
       [0, 0, 0, 0, 0, 0, 0, 0, 0]])
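
The same masking idea can also be written without the explicit `repeat`, letting broadcasting build the 2D result directly. This is only a minimal sketch and not part of the original answer: the function name `ranges_padded` is hypothetical, and `max_len` is assumed to be known in advance, as stated in the question.

```python
import numpy as np

def ranges_padded(lengths, max_len):
    """Row i holds 0..lengths[i]-1, followed by zeros up to max_len columns.

    `ranges_padded` is a hypothetical helper name used for illustration.
    """
    lengths = np.asarray(lengths)
    cols = np.arange(max_len)            # shape (max_len,)
    mask = cols < lengths[:, None]       # broadcasts to shape (n, max_len)
    return np.where(mask, cols, 0)       # keep the column index where valid, else 0

arr1 = np.array([3, 5, 9, 1])
arr2 = ranges_padded(arr1, 9)
print(arr2)
```

With a few million rows and around 500 columns the result has billions of elements, so it may be worth casting `cols` to a smaller dtype such as `np.int16` (large enough for values below 500) instead of the default integer type.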

I am adding a slight variation on @hpaulj's answer because you mentioned that max_len is around 500 and you have millions of rows. In this case, you can precompute a 500 by 500 matrix containing all possible rows and index into it using arr1:

import numpy as np
np.random.seed(0)

max_len = 500
arr = np.random.randint(0, max_len, size=10**5)

# Generate all unique rows first, then select rows by fancy indexing.
# This can be faster if max_len << len(arr).
# Measured time: 53 ms
template = np.tril(np.arange(max_len)[None,:].repeat(max_len,0), k=-1)
res = template[arr,:]

# Masking approach from the previous answer, for comparison.
# Measured time: 173 ms
res1 = np.arange(max_len)[None,:].repeat(arr.size,0)
res1[res1>=arr[:,None]] = 0

assert (res == res1).all()
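
One detail to watch: the benchmark above draws values from `0` to `max_len - 1`, so a template with `max_len` rows is enough. In the question's example, however, a length can equal `max_len` itself (the value `9` with nine columns), which would index past the last row. A minimal sketch of a template with one extra row follows; `make_template` is a hypothetical helper name, not something from the original answer.

```python
import numpy as np

def make_template(max_len):
    """Build a (max_len + 1, max_len) lookup table.

    Row n holds 0..n-1 followed by zeros, for every n from 0 to max_len.
    `make_template` is a hypothetical name used for illustration.
    """
    cols = np.arange(max_len)                 # column indices
    n = np.arange(max_len + 1)[:, None]       # one row per possible length
    return np.where(cols < n, cols, 0)

template = make_template(9)
arr1 = np.array([3, 5, 9, 1])
arr2 = template[arr1]                         # fancy indexing picks one row per length
print(arr2)
```

The first `max_len` rows of this table match the `np.tril(..., k=-1)` template above; only the final full-length row is new.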

Try this:

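import itertools
import numpy as np

arr1 = np.array([3, 5, 9, 1])

# One range per row; zip_longest pads the shorter ones with 0 up to max(arr1).
l = map(range, arr1)
# column_stack expects a sequence, so materialize the iterator as a list first.
arr2 = np.column_stack(list(itertools.zip_longest(*l, fillvalue=0)))
print(arr2)
# [[0 1 2 0 0 0 0 0 0]
#  [0 1 2 3 4 0 0 0 0]
#  [0 1 2 3 4 5 6 7 8]
#  [0 0 0 0 0 0 0 0 0]]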
