
Counting occurrences of an int in a list quickly

I have a function that generates 1024 random-ish ints within some known range (0 to N). I want to count the number of occurrences of each number I see.

Normally I would do something like:

a = np.zeros(N+1)
for number in get_numbers():
  a[number] += 1

The problem is that this is somewhat slow, since all the accumulation is done in Python and not in a nice numpy function. Normally I wouldn't care about speed, but this is done in an inner loop and the time really adds up.

I'd rather do something like:

a = np.zeros(N+1)
nums = get_numbers()
a[nums] = a[nums]+1

but if there are duplicates in nums (and there could be, though the number of repeats ought to be low), then the indices with duplicates only get counted once. Is there a faster way to do this in numpy?
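The behaviour described above can be reproduced in a few lines. This is a minimal sketch using a small hand-picked nums array (hypothetical data, not the output of get_numbers()). For comparison it also uses np.add.at, NumPy's unbuffered in-place add, which does accumulate repeated indices.

```python
import numpy as np

N = 5
nums = np.array([1, 3, 3, 3, 5])  # hypothetical sample with a repeated value

# Fancy-indexed assignment: a repeated index is effectively written once,
# so the value 3 ends up with a count of 1 instead of 3.
buffered = np.zeros(N + 1, dtype=int)
buffered[nums] = buffered[nums] + 1

# np.add.at performs an unbuffered add, so every occurrence is accumulated.
unbuffered = np.zeros(N + 1, dtype=int)
np.add.at(unbuffered, nums, 1)

print(buffered)    # [0 1 0 1 0 1]
print(unbuffered)  # [0 1 0 3 0 1]
```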

Use np.unique with return_counts=True:

a = np.array(list('aaaabbbccd'))

u, c = np.unique(a, return_counts=True)

np.column_stack([u, c])

array([['a', '4'],
       ['b', '3'],
       ['c', '2'],
       ['d', '1']], 
      dtype='<U21')

You can use:

np.bincount(get_numbers(), minlength=N+1)

Example:

N = 5
numbers = np.random.randint(N, size=10)
numbers
# array([2, 0, 4, 0, 0, 4, 2, 1, 2, 0])

Results using bincount:

np.bincount(numbers, minlength=N+1)
# array([4, 1, 3, 0, 2, 0])

Results using a for loop:

a = np.zeros(N+1)
for number in numbers:
    a[number] += 1

a
# array([ 4.,  1.,  3.,  0.,  2.,  0.])

Timing:

N = 20
numbers = np.random.randint(N, size=1000)

def for_loop():
    a = np.zeros(N+1)
    for number in numbers:
        a[number] += 1
    return a

def np_unique():
    a = np.zeros(N+1)
    u, c = np.unique(numbers, return_counts=True)
    a[u] = c
    return a

%timeit np.bincount(numbers, minlength=N+1)
# The slowest run took 6.46 times longer than the fastest. This could mean that an intermediate result is being cached.
# 100000 loops, best of 3: 2.59 µs per loop

%timeit for_loop()
# 1000 loops, best of 3: 426 µs per loop

%timeit np_unique()
# The slowest run took 4.08 times longer than the fastest. This could mean that an intermediate result is being cached.
# 10000 loops, best of 3: 30.6 µs per loop

Checking results:

(np_unique() == np.bincount(numbers, minlength=N+1)).all()
# True

(for_loop() == np.bincount(numbers, minlength=N+1)).all()
# True
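The %timeit lines above only work inside IPython or Jupyter. As a rough sketch, a similar comparison can be run as a plain script with the standard timeit module. The function names, the seed, and the repetition count below are illustrative choices, and the measured times will vary by machine.

```python
import timeit
import numpy as np

N = 20
rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
numbers = rng.integers(0, N, size=1000)

def with_bincount():
    return np.bincount(numbers, minlength=N + 1)

def with_unique():
    counts = np.zeros(N + 1, dtype=int)
    values, freq = np.unique(numbers, return_counts=True)
    counts[values] = freq
    return counts

# Both approaches should agree before their timings are compared.
assert np.array_equal(with_bincount(), with_unique())

repeats = 1000
for fn in (with_bincount, with_unique):
    seconds = timeit.timeit(fn, number=repeats)
    print(f"{fn.__name__}: {seconds / repeats * 1e6:.2f} us per call")
```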

You can also use Counter from the collections module:

import numpy as np
from collections import Counter

a = np.random.randint(0, 10, 20)
c = Counter(a)
list(c.items())
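A Counter only holds the values that actually occur. If a dense count per value in 0..N is needed, as in the question, one possible sketch is to look up every value in turn. The sample array and N below are made up for illustration.

```python
from collections import Counter
import numpy as np

N = 9
a = np.array([5, 8, 4, 2, 6, 8, 2, 4, 1, 3])  # hypothetical sample in 0..N

# tolist() converts NumPy scalars to plain Python ints before counting.
c = Counter(a.tolist())

# A Counter returns 0 for missing keys, so absent values get a zero count.
dense = [c[value] for value in range(N + 1)]
print(dense)  # [0, 1, 2, 1, 2, 1, 1, 0, 2, 0]
```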

You can use scipy's itemfreq (note that scipy.stats.itemfreq was deprecated in SciPy 1.0 in favour of np.unique with return_counts=True). If you don't want to rely on scipy, define one of your own:

def itemfreq(a):
    items, inv = np.unique(a, return_inverse=True)
    freq = np.bincount(inv)
    return np.array([items, freq]).T

To count the number of occurrences of each value in an array you would do:

In [0]: a = np.random.randint(0, 10, 20); a
Out[0]: array([5, 8, 4, 2, 6, 8, 2, 4, 1, 3, 9, 6, 9, 2, 0, 5, 8, 8, 8, 0])
In [1]: itemfreq(a)
Out[1]: 
array([[0, 2],
       [1, 1],
       [2, 3],
       [3, 1],
       [4, 2],
       [5, 2],
       [6, 2],
       [8, 5],
       [9, 2]])

Since numpy 1.9 the unique function has a built-in return_counts keyword:

In [2]: np.unique(a,return_counts=True)
Out[2]: (array([0, 1, 2, 3, 4, 5, 6, 8, 9]), array([2, 1, 3, 1, 2, 2, 2, 5, 2]))

Note that numpy.unique returns the values sorted.
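One consequence worth illustrating: np.unique only reports values that occur, so the counts array cannot be indexed by value directly (7 is absent from the sample above). A small sketch, reusing that sample, of two ways to get a count for any value in 0..9:

```python
import numpy as np

a = np.array([5, 8, 4, 2, 6, 8, 2, 4, 1, 3, 9, 6, 9, 2, 0, 5, 8, 8, 8, 0])

values, counts = np.unique(a, return_counts=True)

# Map each observed value to its count; missing values fall back to 0.
lookup = dict(zip(values.tolist(), counts.tolist()))
print(lookup.get(7, 0))  # 0

# bincount with minlength gives a dense array indexed by value.
print(np.bincount(a, minlength=10))  # [2 1 3 1 2 2 2 0 5 2]
```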


 