Counting occurrences of arrays in a multidimensional array in Python

Question

I have an array of the following type:

a = array([[1,1,1],
           [1,1,1],
           [1,1,1],
           [2,2,2],
           [2,2,2],
           [2,2,2],
           [3,3,0],
           [3,3,0],
           [3,3,0]])

I want to count the number of occurrences of each distinct row, for example

[1,1,1]:3, [2,2,2]:3, and [3,3,0]: 3

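How can I do this in Python? Is it possible without using a for loop and counting into a dictionary? It has to be fast and should take less than about 0.1 seconds. I looked at Counter, numpy bincount, and so on, but those work on individual elements, not on arrays.

Thanks.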

Answer 1

If you don't mind mapping the rows to tuples just to get the counts, you can use a Counter dict. On my machine with python3 it runs in 28.5 µs, which is well below your threshold:

In [5]: timeit Counter(map(tuple, a))
10000 loops, best of 3: 28.5 µs per loop

In [6]: c = Counter(map(tuple, a))

In [7]: c
Out[7]: Counter({(2, 2, 2): 3, (1, 1, 1): 3, (3, 3, 0): 3})

Answer 2

collections.Counter can do this conveniently, almost exactly as in the example given.

>>> from collections import Counter
>>> c = Counter()
>>> for x in a:
...   c[tuple(x)] += 1
...
>>> c
Counter({(2, 2, 2): 3, (1, 1, 1): 3, (3, 3, 0): 3})

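This converts each sub-list to a tuple, which can be used as a dictionary key because tuples are immutable. Lists are mutable and therefore cannot be used as dictionary keys.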
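To make the hashability point concrete, here is a minimal sketch. The variable names (row, counts, list_error, array_error) are only illustrative and are not part of the original answer. It shows that a tuple built from a row is accepted as a dictionary key, while a list or a NumPy array raises TypeError.

```python
import numpy as np

row = np.array([1, 1, 1])
counts = {}

# A tuple built from the row is hashable, so it works as a dictionary key.
counts[tuple(row)] = 1

# A list is mutable and therefore unhashable.
try:
    counts[list(row)] = 1
    list_error = None
except TypeError as exc:
    list_error = exc

# A NumPy array is unhashable as well, which is why the rows of `a`
# cannot be passed to Counter directly.
try:
    counts[row] = 1
    array_error = None
except TypeError as exc:
    array_error = exc

print(counts)
print(type(list_error).__name__, type(array_error).__name__)
```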

Why do you want to avoid using a for loop?

Similar to @padraic-cunningham's much cooler answer:

>>> Counter(tuple(x) for x in a)
Counter({(2, 2, 2): 3, (1, 1, 1): 3, (3, 3, 0): 3})
>>> Counter(map(tuple, a))
Counter({(2, 2, 2): 3, (1, 1, 1): 3, (3, 3, 0): 3})

Answer 3

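You can convert those rows into a 1D array with np.ravel_multi_index, treating the elements of each row as a multi-dimensional index. Then np.unique gives us the start position of each unique row, and its optional return_counts argument gives us the counts. So the implementation would look something like this -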

def unique_rows_counts(a):

    # Calculate linear indices using rows from a
    lidx = np.ravel_multi_index(a.T,a.max(0)+1 )

    # Get the unique indices and their counts
    _, unq_idx, counts = np.unique(lidx, return_index = True, return_counts=True)

    # return the unique groups from a and their respective counts
    return a[unq_idx], counts

Sample run -

In [64]: a
Out[64]: 
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [2, 2, 2],
       [2, 2, 2],
       [2, 2, 2],
       [3, 3, 0],
       [3, 3, 0],
       [3, 3, 0]])

In [65]: unqrows, counts = unique_rows_counts(a)

In [66]: unqrows
Out[66]: 
array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 0]])
In [67]: counts
Out[67]: array([3, 3, 3])
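To see what the intermediate step produces, the following sketch prints the linear indices for the sample array. It is only an illustration: the names dims, lidx, unq and recovered are mine, and the round trip through np.unravel_index is an addition that the answer does not use. As far as I know, np.ravel_multi_index expects non-negative integers that fit inside dims, so this approach presumably needs an offset for arrays with negative entries and does not apply to floats.

```python
import numpy as np

a = np.array([[1, 1, 1]] * 3 + [[2, 2, 2]] * 3 + [[3, 3, 0]] * 3)

# Size of the virtual grid along each column: one more than the column maximum.
dims = tuple(a.max(axis=0) + 1)          # (4, 4, 3)

# Each row (i, j, k) becomes the single integer i*4*3 + j*3 + k.
lidx = np.ravel_multi_index(a.T, dims)
print(lidx)                              # [16 16 16 32 32 32 45 45 45]

# Counting equal integers is now a plain 1D problem.
unq, counts = np.unique(lidx, return_counts=True)

# The linear indices can be mapped back to the original rows.
recovered = np.column_stack(np.unravel_index(unq, dims))
print(recovered)
print(counts)
```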

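Benchmarking

Assuming you are okay with either numpy arrays or collections as output, the solutions provided so far can be benchmarked like so -

Function definitions: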

import numpy as np
from collections import Counter

def unique_rows_counts(a):
    lidx = np.ravel_multi_index(a.T,a.max(0)+1 )
    _, unq_idx, counts = np.unique(lidx, return_index = True, return_counts=True)
    return a[unq_idx], counts

def map_Counter(a):
    return Counter(map(tuple, a))    

def forloop_Counter(a):      
    c = Counter()
    for x in a:
        c[tuple(x)] += 1
    return c

Timings:

In [53]: a = np.random.randint(0,4,(10000,5))

In [54]: %timeit map_Counter(a)
10 loops, best of 3: 31.7 ms per loop

In [55]: %timeit forloop_Counter(a)
10 loops, best of 3: 45.4 ms per loop

In [56]: %timeit unique_rows_counts(a)
1000 loops, best of 3: 1.72 ms per loop
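For anyone who wants to rerun the comparison outside IPython, here is a rough sketch using the standard timeit module. The seeded generator, the as_counter conversion and the repeat count of 3 are my own choices rather than part of the benchmark above, and the absolute numbers will differ by machine. The sketch also checks that the vectorized function and the Counter version agree on the counts.

```python
import timeit
from collections import Counter

import numpy as np


def unique_rows_counts(a):
    # Encode each row as one integer, then count equal integers.
    lidx = np.ravel_multi_index(a.T, a.max(0) + 1)
    _, unq_idx, counts = np.unique(lidx, return_index=True, return_counts=True)
    return a[unq_idx], counts


def map_counter(a):
    return Counter(map(tuple, a))


# Same shape and value range as the benchmark input, but seeded for repeatability.
rng = np.random.default_rng(0)
a = rng.integers(0, 4, size=(10000, 5))

# Convert the array result to a Counter so the two outputs can be compared.
rows, counts = unique_rows_counts(a)
as_counter = Counter({tuple(r): int(c) for r, c in zip(rows.tolist(), counts)})
same = as_counter == map_counter(a)
print("results agree:", same)

for func in (map_counter, unique_rows_counts):
    seconds = timeit.timeit(lambda: func(a), number=3) / 3
    print(f"{func.__name__}: {seconds * 1e3:.2f} ms per call")
```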

Answer 4

The numpy_indexed package (disclaimer: I am its author) contains efficient vectorized functionality for this kind of operation:

import numpy_indexed as npi
unique_rows, row_count = npi.count(a, axis=0)

Note that this works for arrays of any dimension or datatype.

Answer 5

Since numpy-1.13.0, np.unique can be used with the axis parameter:

>>> np.unique(a, axis=0, return_counts=True)

(array([[1, 1, 1],
        [2, 2, 2],
        [3, 3, 0]]), array([3, 3, 3]))
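If the row-to-count mapping from the question is what you need, the two returned arrays can be zipped into a dictionary. This is a small sketch on top of the call above; the names rows, counts, result, b and result_b are illustrative. The second array is an added example suggesting that, unlike the linear-index approach, this call also handles negative entries.

```python
import numpy as np

a = np.array([[1, 1, 1]] * 3 + [[2, 2, 2]] * 3 + [[3, 3, 0]] * 3)

# Unique rows and how often each one occurs.
rows, counts = np.unique(a, axis=0, return_counts=True)

# Tuples of plain Python ints make convenient dictionary keys.
result = {tuple(r): int(c) for r, c in zip(rows.tolist(), counts)}
print(result)   # {(1, 1, 1): 3, (2, 2, 2): 3, (3, 3, 0): 3}

# Rows with negative values are handled the same way.
b = np.array([[-1, 0], [-1, 0], [2, 5]])
rows_b, counts_b = np.unique(b, axis=0, return_counts=True)
result_b = {tuple(r): int(c) for r, c in zip(rows_b.tolist(), counts_b)}
print(result_b)
```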

5 answers

Answer 1 2 2015-10-20 11:24:58
Answer 2 2 2015-10-20 11:31:11
Answer 3 2 2015-10-20 11:40:02
Answer 4 1 2016-10-05 06:24:14
Answer 5 1 2017-10-19 21:05:33
