Counting occurrences of an int in a list quickly
I have a function that generates 1024 randomish ints within some known range (0 to N). I want to count the number of occurrences of each number I see.
Normally I would do something like:
a = np.zeros(N+1)
for number in get_numbers():
    a[number] += 1
The problem is that this is somewhat slow, since all the accumulation is done in Python and not in a nice numpy function. Normally I wouldn't care about speed, but this is done in an inner loop and the time really adds up.
I'd rather do something like:
a = np.zeros(N+1)
nums = get_numbers()
a[nums] = a[nums]+1
but if there are duplicates in nums (and there could be, though the number of repeats ought to be low) then the indices with duplicates only get counted once. Is there a faster way to do this in numpy?
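The duplicate-index behaviour described above can be reproduced with a small sketch. The values in nums are made up for illustration. The second half uses np.add.at, which applies the addition once per index occurrence; it is shown only to make the contrast visible, not as a claim that it is the fastest option.

```python
import numpy as np

N = 5
nums = np.array([1, 3, 3, 3, 5])  # illustrative data: 3 appears three times

# Buffered fancy-indexed assignment: the right-hand side is computed first,
# then written back, so the repeated index 3 ends up incremented only once.
a = np.zeros(N + 1, dtype=int)
a[nums] = a[nums] + 1
print(a)  # [0 1 0 1 0 1]

# np.add.at performs the addition unbuffered, once per index occurrence.
b = np.zeros(N + 1, dtype=int)
np.add.at(b, nums, 1)
print(b)  # [0 1 0 3 0 1]
```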
Use np.unique with return_counts=True
a = np.array(list('aaaabbbccd'))
u, c = np.unique(a, return_counts=True)
np.column_stack([u, c])
array([['a', '4'],
['b', '3'],
['c', '2'],
['d', '1']],
dtype='<U21')
You can use:
np.bincount(get_numbers(), minlength=N+1)
Example:
N = 5
numbers = np.random.randint(N, size=10)
numbers
# array([2, 0, 4, 0, 0, 4, 2, 1, 2, 0])
Results using bincount:
np.bincount(numbers, minlength=N+1)
# array([4, 1, 3, 0, 2, 0])
Results using a for loop:
a = np.zeros(N+1)
for number in numbers:
    a[number] += 1
a
# array([ 4., 1., 3., 0., 2., 0.])
Timing:
N = 20
numbers = np.random.randint(N, size=1000)
def for_loop():
    a = np.zeros(N+1)
    for number in numbers:
        a[number] += 1
    return a
def np_unique():
    a = np.zeros(N+1)
    u, c = np.unique(numbers, return_counts=True)
    a[u] = c
    return a
%timeit np.bincount(numbers, minlength=N+1)
# The slowest run took 6.46 times longer than the fastest. This could mean that an intermediate result is being cached.
# 100000 loops, best of 3: 2.59 µs per loop
%timeit for_loop()
# 1000 loops, best of 3: 426 µs per loop
%timeit np_unique()
# The slowest run took 4.08 times longer than the fastest. This could mean that an intermediate result is being cached.
# 10000 loops, best of 3: 30.6 µs per loop
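The %timeit lines above only work inside IPython. As a rough sketch, the same comparison can be run as a plain script with the standard timeit module; the wrapper name np_bincount and the repeat/number values are choices made here, and the absolute timings will differ from machine to machine.

```python
import timeit
import numpy as np

N = 20
numbers = np.random.randint(N, size=1000)

def for_loop():
    a = np.zeros(N + 1)
    for number in numbers:
        a[number] += 1
    return a

def np_unique():
    a = np.zeros(N + 1)
    u, c = np.unique(numbers, return_counts=True)
    a[u] = c
    return a

def np_bincount():
    # Thin wrapper so all three approaches can be timed the same way
    return np.bincount(numbers, minlength=N + 1)

for func in (np_bincount, np_unique, for_loop):
    # Best of three runs, each run making 100 calls
    best = min(timeit.repeat(func, repeat=3, number=100)) / 100
    print(f"{func.__name__}: {best * 1e6:.1f} us per call")
```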
Checking results:
(np_unique() == np.bincount(numbers, minlength=N+1)).all()
# True
(for_loop() == np.bincount(numbers, minlength=N+1)).all()
# True
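Since the question mentions that the counting happens in an inner loop, here is a minimal sketch of accumulating bincount results across many batches. The get_numbers function below is a hypothetical stand-in for the asker's generator, and the batch count of 100 is arbitrary.

```python
import numpy as np

N = 20
rng = np.random.default_rng(0)

def get_numbers():
    # Hypothetical stand-in for the asker's function: 1024 ints in [0, N]
    return rng.integers(0, N + 1, size=1024)

total = np.zeros(N + 1, dtype=np.int64)
for _ in range(100):  # plays the role of the inner loop from the question
    # minlength=N+1 keeps every batch result the same length as total
    total += np.bincount(get_numbers(), minlength=N + 1)

print(total.sum())  # 102400, i.e. 100 batches of 1024 numbers
```

Without minlength, a batch that happens not to contain the value N would return a shorter array and the in-place addition would fail.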
You can also use Counter from the collections module:
import numpy as np
from collections import Counter
a = np.random.randint(0, 10, 20)
c = Counter(a)
list(c.items())
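A Counter only holds the values that actually occur. As a small sketch (the input array and N are made up for illustration), it can be turned into the dense 0-to-N count array the question asks for, because looking up a missing key in a Counter returns 0:

```python
from collections import Counter
import numpy as np

N = 9
a = np.array([3, 1, 3, 7, 3, 1, 9])  # illustrative data

# tolist() converts to plain Python ints, so the Counter keys are ints
c = Counter(a.tolist())
print(c.most_common(2))  # [(3, 3), (1, 2)]

# Missing keys count as 0, which makes a dense array easy to build
dense = np.array([c[i] for i in range(N + 1)])
print(dense)  # [0 2 0 3 0 0 0 1 0 1]
```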
You can use scipy's itemfreq (since deprecated in SciPy in favor of np.unique with return_counts=True). If you don't want to rely on scipy, define one of your own:
def itemfreq(a):
    items, inv = np.unique(a, return_inverse=True)
    freq = np.bincount(inv)
    return np.array([items, freq]).T
To count the number of occurrences in an array you would do:
In [0]: a = np.random.randint(0,10,20); a
Out[0]: array([5, 8, 4, 2, 6, 8, 2, 4, 1, 3, 9, 6, 9, 2, 0, 5, 8, 8, 8, 0])
In [1]: itemfreq(a)
Out[1]:
array([[0, 2],
[1, 1],
[2, 3],
[3, 1],
[4, 2],
[5, 2],
[6, 2],
[8, 5],
[9, 2]])
Since numpy 1.9 there is a built-in return_counts keyword for the unique function:
In [2]: np.unique(a,return_counts=True)
Out[2]: (array([0, 1, 2, 3, 4, 5, 6, 8, 9]), array([2, 1, 3, 1, 2, 2, 2, 5, 2]))
Note that numpy.unique returns sorted values.
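If the order of first appearance matters more than sorted order, one possible sketch is to also request return_index and reorder by it. The input array here is made up for illustration.

```python
import numpy as np

a = np.array([5, 8, 4, 2, 8, 5, 8])  # illustrative data

# u is sorted; first[i] is the index where u[i] first occurs in a
u, first, c = np.unique(a, return_index=True, return_counts=True)
print(u, c)  # [2 4 5 8] [1 1 2 3]

# Reorder the unique values and their counts by first occurrence
order = np.argsort(first)
print(u[order], c[order])  # [5 8 4 2] [2 3 1 1]
```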