How to split a numpy array and perform certain actions on the split arrays [Python]

Only part of this question has been asked before ([1] [2]), which explained how to split numpy arrays. I am quite new to Python. I have an array containing 262144 items and want to split it into small arrays of length 512, sort them individually and sum up their first five values, but I am unsure how to proceed beyond this line:

np.array_split(vector, 512)

How do I call and analyse each array? Would it be a good idea to keep using numpy arrays, or should I switch to a dictionary instead?

Splitting as such won't be an efficient solution. Instead, we could reshape, which effectively creates the subarrays as rows of a 2D array. These would be views into the input array, so there is no additional memory requirement. Then we would get the argsort indices, select the first five indices per row, index the array with them, and finally sum those values for the desired output.

Thus, we would have an implementation like so:

N = 512 # Number of elements in each split array
M = 5   # Number of smallest elements to sum in each subarray

b = a.reshape(-1,N)
out = b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
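
For readers who want to run this end to end, here is a minimal self-contained sketch of the same idea. The function name `sum_smallest_per_chunk`, the random input, and the comparison loop are my own for illustration and not part of the original answer; the sketch assumes the array length is an exact multiple of the chunk length.

```python
import numpy as np

def sum_smallest_per_chunk(a, n, m):
    """Sum the m smallest values in each consecutive chunk of length n.

    Hypothetical helper; assumes a.size is a multiple of n.
    """
    b = a.reshape(-1, n)                    # each row is one chunk
    idx = b.argsort(axis=1)[:, :m]          # column indices of the m smallest per row
    rows = np.arange(b.shape[0])[:, None]   # row indices, broadcast against idx
    return b[rows, idx].sum(axis=1)

rng = np.random.default_rng(0)
a = rng.integers(11, 99, size=512 * 512)

result = sum_smallest_per_chunk(a, 512, 5)

# Plain loop over the pieces, used only as a cross-check
expected = np.array([np.sort(chunk)[:5].sum() for chunk in np.array_split(a, 512)])

# Reshaping a contiguous array returns a view, not a copy
is_view = np.shares_memory(a, a.reshape(-1, 512))

print(result.shape, np.array_equal(result, expected), is_view)
```

The cross-check loop is slower but easier to read, which makes it a reasonable reference when changing the vectorized version.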

Step-by-step sample run:

In [217]: a   # Input array
Out[217]: array([45, 19, 71, 53, 20, 33, 31, 20, 41, 19, 38, 31, 86, 34])

In [218]: N = 7 # 512 for original case, 7 for sample

In [219]: M = 5

# Reshape into a 2D array with N columns (one row per subarray)
In [220]: b = a.reshape(-1,N)

In [224]: b
Out[224]: 
array([[45, 19, 71, 53, 20, 33, 31],
       [20, 41, 19, 38, 31, 86, 34]])

# Get argsort indices per row
In [225]: b.argsort(1)
Out[225]: 
array([[1, 4, 6, 5, 0, 3, 2],
       [2, 0, 4, 6, 3, 1, 5]])

# Select first M ones
In [226]: b.argsort(1)[:,:M]
Out[226]: 
array([[1, 4, 6, 5, 0],
       [2, 0, 4, 6, 3]])

# Use fancy-indexing to select those M ones per row
In [227]: b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]]
Out[227]: 
array([[19, 20, 31, 33, 45],
       [19, 20, 31, 34, 38]])

# Finally sum along each row
In [228]: b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
Out[228]: array([148, 142])
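
The fancy-indexing step can also be spelled with `np.take_along_axis`, which exists in NumPy 1.15 and later. This is an alternative I am adding, not something the original answer used; it is sketched here on the same sample array:

```python
import numpy as np

a = np.array([45, 19, 71, 53, 20, 33, 31, 20, 41, 19, 38, 31, 86, 34])
N, M = 7, 5

b = a.reshape(-1, N)
idx = b.argsort(axis=1)[:, :M]                 # indices of the M smallest per row
smallest = np.take_along_axis(b, idx, axis=1)  # same values as b[rows, idx]
out = smallest.sum(axis=1)
print(out)  # [148 142], matching the sample run
```

This avoids building the `np.arange(...)[:, None]` row index by hand, which some readers may find easier to follow.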

Performance boost with np.argpartition:

out = b[np.arange(b.shape[0])[:,None], np.argpartition(b,M,axis=1)[:,:M]].sum(1)
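
Why this works: with `kth=M`, `np.argpartition` guarantees that the first M positions of each row hold the M smallest values, though in no particular order, and order does not matter for a sum. The sketch below checks that on made-up random data. The `np.partition` line, which returns the values directly instead of indices, is my addition rather than part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(42)
N, M = 512, 5
b = rng.integers(11, 99, size=N * N).reshape(-1, N)
rows = np.arange(b.shape[0])[:, None]

# Full sort of every row, then keep the first M indices
by_argsort = b[rows, b.argsort(axis=1)[:, :M]].sum(axis=1)

# Partial ordering only: the M smallest end up in the first M slots
by_argpartition = b[rows, np.argpartition(b, M, axis=1)[:, :M]].sum(axis=1)

# Same partial ordering, but on the values themselves (no fancy indexing)
by_partition = np.partition(b, M, axis=1)[:, :M].sum(axis=1)

print(np.array_equal(by_argsort, by_argpartition),
      np.array_equal(by_argsort, by_partition))
```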

Runtime test:

In [236]: a = np.random.randint(11,99,(512*512))

In [237]: N = 512

In [238]: M = 5

In [239]: b = a.reshape(-1,N)

In [240]: %timeit b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
100 loops, best of 3: 14.2 ms per loop

In [241]: %timeit b[np.arange(b.shape[0])[:,None], \
                np.argpartition(b,M,axis=1)[:,:M]].sum(1)
100 loops, best of 3: 3.57 ms per loop

A more detailed version of doing what you want:

import numpy as np

vector = np.random.rand(262144)

# 512 pieces of 512 elements each
splits = np.array_split(vector, 512)

sums = []
for split in splits:
    # sort the piece in place
    split.sort()
    # take the five smallest values
    subSplit = split[:5]
    # build the sum
    splitSum = sum(subSplit)
    # add it to the result list
    sums.append(splitSum)

print(np.array(sums).shape)

This gives the same output as @Divakar's solution.
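
One thing to be aware of with the loop: `split.sort()` sorts in place, and as far as I can tell the pieces returned by `np.array_split` are views of a 1D input, so the loop reorders `vector` itself. If the original order matters, a variant using `np.sort`, which returns a sorted copy, leaves the input alone. The following is a sketch of that variant with a seeded random input of my choosing, compared against a reshape-based computation:

```python
import numpy as np

rng = np.random.default_rng(1)
vector = rng.random(262144)
original = vector.copy()   # kept only to show the input is not modified

# np.sort returns a sorted copy, so `vector` stays in its original order
sums = np.array([np.sort(split)[:5].sum()
                 for split in np.array_split(vector, 512)])

# Reshape-based result for comparison (also works on a sorted copy)
reshaped = np.sort(vector.reshape(-1, 512), axis=1)[:, :5].sum(axis=1)

print(sums.shape, np.allclose(sums, reshaped), np.array_equal(vector, original))
```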
