Combining counts from multiple numpy.unique results

Question

I have the results of several calls to numpy.unique(a, return_counts=True) and unfortunately no longer have access to the original arrays. I want to combine these results into one array holding the unique values and one array holding the corresponding counts. I do not want to rebuild the arrays with np.repeat(), since the data is too big for my RAM.

I also found Python's collections.Counter, but since I'm using the results as NumPy arrays, I would prefer to stay "within" NumPy (unless you would advise otherwise).

Is there an efficient way to solve this problem?

I want something like this, but without using np.repeat():

import numpy as np

multi_unique_values = np.array([[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]])
multi_unique_counts = np.array([[2,2,2,2],[1,2,3,1],[1,1,2,3],[1,2,2,1]])

values_ravel = multi_unique_values.ravel()
counts_ravel = multi_unique_counts.ravel()

np.unique(np.repeat(values_ravel,counts_ravel), return_counts=True)

> (array([1, 2, 3, 4]), array([5, 7, 9, 7]))

I can achieve my desired result using a for-loop, but I'm looking for a (much) faster way!

all_unique_values, indices_ = np.unique(values_ravel, return_inverse=True)

all_unique_counts = np.zeros(all_unique_values.shape)

for count_index, unique_index in enumerate(indices_):
    all_unique_counts[unique_index] += counts_ravel[count_index]
    
(all_unique_values, all_unique_counts)
> (array([1, 2, 3, 4]), array([5., 7., 9., 7.]))

Answer 1

You can apply np.unique with return_inverse=True to get the array of all unique values and, at the same time, the position of each input item in that sorted array. You can then accumulate the counts at those positions to get the merged count for each unique value.

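# Flatten first so the inverse index is 1-D and lines up with the raveled counts.
all_unique_values, index = np.unique(multi_unique_values.ravel(), return_inverse=True)
all_unique_counts = np.zeros(all_unique_values.size, np.int64)
# Unbuffered in-place add: repeated indices are all accumulated.
np.add.at(all_unique_counts, index, multi_unique_counts.ravel())
all_unique_counts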
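
As a possible alternative to np.add.at, the same accumulation can be sketched with np.bincount and its weights argument, which is often reported to be faster on large inputs (worth benchmarking on your own data). The helper name merge_unique_counts and its list-of-pairs input are assumptions made for illustration and are not part of the answer above. The sketch also covers the case where the individual results do not share the same set of values.

```python
import numpy as np

def merge_unique_counts(results):
    """Merge several (values, counts) pairs as returned by
    np.unique(a, return_counts=True). Hypothetical helper name."""
    # Flatten everything into two aligned 1-D arrays.
    values = np.concatenate([np.ravel(v) for v, _ in results])
    counts = np.concatenate([np.ravel(c) for _, c in results])

    # inverse[i] is the position of values[i] among the sorted unique values.
    merged_values, inverse = np.unique(values, return_inverse=True)

    # Sum the counts that share a position. bincount returns float64
    # when weights are given, so cast back to integers afterwards.
    merged_counts = np.bincount(
        inverse, weights=counts, minlength=merged_values.size
    )
    return merged_values, merged_counts.astype(np.int64)

# Example input: three results with partly different value sets.
results = [
    (np.array([1, 2, 3, 4]), np.array([2, 2, 2, 2])),
    (np.array([1, 2, 3]), np.array([1, 2, 3])),
    (np.array([2, 4, 7]), np.array([1, 3, 5])),
]
merged_values, merged_counts = merge_unique_counts(results)
print(merged_values)  # [1 2 3 4 7]
print(merged_counts)  # [3 5 5 5 5]
```

Because np.bincount accumulates the weights in float64, the sums are exact only up to 2**53. For larger totals, np.add.at on an int64 array as shown above is the safer choice.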

Solution 1
1 Accepted 2022-03-29 21:42:18
