NumbaPro - Smartest way to sort a 2d array and then sum over entries of same key

In my program I have an array with several million entries like this:

arr=[(1,0.5), (4,0.2), (321, 0.01), (2, 0.042), (1, 0.01), ...]

I could instead make two arrays with the same order (instead of an array of tuples) if that helps.

To sort this array I know I can use radix sort, so that it has this structure:

arr_sorted = [(1,0.5), (1,0.01), (2,0.042), ...]
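
For reference, the sort step can be sketched on the CPU with NumPy, using the two-array layout mentioned above. This is only an illustration of the intended result, not NumbaPro's GPU radix sort; the sample data is the visible part of `arr`.

```python
import numpy as np

# Visible part of the example array, split into keys and values.
keys = np.array([1, 4, 321, 2, 1], dtype=np.int64)
values = np.array([0.5, 0.2, 0.01, 0.042, 0.01])

# A stable sort keeps entries with equal keys in their original order.
order = np.argsort(keys, kind="stable")
keys_sorted = keys[order]
values_sorted = values[order]
```

Applying the same permutation `order` to both arrays keeps each value attached to its key.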

Now I want to sum all the values in the array that have the key 1, then all that have the key 2, and so on. The results should be written into a new array like this:

arr_summed = [(1, 0.51), (2,0.42), ...]
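
A minimal CPU-side sketch of this "sum per run of equal keys" step, assuming the data is already sorted by key and stored as two NumPy arrays (the names `starts`, `unique_keys` and `sums` are only illustrative):

```python
import numpy as np

# Already sorted by key; sample data only.
keys_sorted = np.array([1, 1, 2, 4, 321])
values_sorted = np.array([0.5, 0.01, 0.042, 0.2, 0.01])

# Mark the first entry of every run of equal keys.
is_start = np.concatenate(([True], keys_sorted[1:] != keys_sorted[:-1]))
starts = np.flatnonzero(is_start)

unique_keys = keys_sorted[starts]
# Sum each segment values_sorted[starts[i]:starts[i + 1]].
sums = np.add.reduceat(values_sorted, starts)
```

This is the usual "reduce by key" pattern: find the segment boundaries, then do one reduction per segment.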

Obviously this array would be much smaller, although still on the order of 100,000 entries. Now my question is: what's the best parallel approach to this problem in CUDA? I am using NumbaPro.

Edit for clarity

I would have two arrays instead of a list of tuples, like this:

keys = [1, 2, 5, 2, 6, 4, 4, 65, 3215, 1, .....]
values = [0.1, 0.4, 0.123, 0.01, 0.23, 0.1, 0.1, 0.4 ...]

They are initially NumPy arrays that get copied to the device.

What I want is to reduce them by key and, if possible, set the values for missing keys (for example, if the key 3 doesn't appear in the array) to zero.

So I would want it to become:

keys = [1, 2, 3, 4, 5, 6, 7, 8, ...]
values = [0.11, 0.41, 0, 0.2, ...] # <- Summed by key

I know beforehand how big the final array will be.
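
Because the output size is known and missing keys should be zero, the result is a dense array indexed by key, which needs no sort at all. A host-side NumPy sketch, assuming non-negative integer keys and a hypothetical known size `n_keys`; only the first eight key/value pairs shown above are used, so key 1 sums to 0.1 here rather than 0.11:

```python
import numpy as np

# First eight pairs from the example; the remaining entries are elided there.
keys = np.array([1, 2, 5, 2, 6, 4, 4, 65])
values = np.array([0.1, 0.4, 0.123, 0.01, 0.23, 0.1, 0.1, 0.4])

n_keys = 66  # assumed known in advance: largest key + 1

# summed[k] is the sum of all values with key k; absent keys stay 0.0.
summed = np.bincount(keys, weights=values, minlength=n_keys)
```

Index 0 of `summed` corresponds to key 0, so the keys starting at 1 are `summed[1:]`.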

I don't know Numba, but in plain Python:

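arr = [(1, 0.5), (4, 0.2), (321, 0.01), (2, 0.042), (1, 0.01)]  # sample; the real array is much longer
indexmax = max(k for k, _ in arr)  # largest key (known beforehand in the question)
res = [0.0] * (indexmax + 1)  # res[k] is the sum for key k; missing keys stay 0.0
for k, v in arr:
    res[k] += v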
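
The same accumulation can be written without an explicit loop on NumPy arrays. This is a sketch of a vectorised equivalent, not a CUDA implementation; the array names are illustrative.

```python
import numpy as np

keys = np.array([1, 4, 321, 2, 1])
values = np.array([0.5, 0.2, 0.01, 0.042, 0.01])

res = np.zeros(keys.max() + 1)
# np.add.at accumulates repeated indices; `res[keys] += values` would
# apply only one update per repeated key.
np.add.at(res, keys, values)
```

The repeated-index issue is the same one a GPU version has to deal with: with one thread per entry, several threads may add to the same `res[k]`, so the `+=` would have to be an atomic add (Numba's CUDA target provides `cuda.atomic.add` for this).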
