
Numpy find number of occurrences in a 2D array

Is there a numpy function to count the number of occurrences of a certain value in a 2D numpy array? E.g.

np.random.random((3,3))

array([[ 0.68878371,  0.2511641 ,  0.05677177],
       [ 0.97784099,  0.96051717,  0.83723156],
       [ 0.49460617,  0.24623311,  0.86396798]])

How do I find the number of times 0.83723156 occurs in this array?

import numpy as np

arr = np.random.random((3,3))
# boolean mask: True where an element equals the target value exactly
condition = arr == 0.83723156
# count the True elements (for a fresh random array this is almost always 0)
np.count_nonzero(condition)

condition is a boolean array with the same shape as arr, indicating whether each element satisfies the condition. np.count_nonzero counts how many nonzero elements an array contains; for booleans, that is the number of True values.
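A small deterministic check of the explanation above. The 2×3 array is a made-up example (not from the question) whose values are exact binary fractions, so == is reliable here. It shows that the comparison produces a boolean array of the same shape, and that np.count_nonzero and summing the mask give the same count.

```python
import numpy as np

# Hypothetical example array containing the target value twice
arr = np.array([[0.5, 0.25, 0.5],
                [0.75, 1.0, 0.125]])

# Element-wise comparison produces a boolean array with the same shape as arr
condition = arr == 0.5
print(condition.dtype, condition.shape)  # bool (2, 3)

# count_nonzero treats True as nonzero, so it counts the matches
n_matches = np.count_nonzero(condition)
print(n_matches)  # 2

# Summing the mask gives the same count, because True sums as 1
print(condition.sum())  # 2
```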

To allow for floating-point inaccuracy, you could compare against a tolerance instead:

condition = np.fabs(arr - 0.83723156) < 0.001
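A sketch of the tolerance approach, assuming an absolute tolerance of 1e-3 (as in the line above) suits the scale of your data. The values 0.1 + 0.2 and 0.3 are chosen because exact comparison fails on them; np.abs is used, which behaves like np.fabs for real-valued input.

```python
import numpy as np

target = 0.3
tol = 1e-3  # assumed tolerance; pick one that suits the scale of your data

# 0.1 + 0.2 is not stored exactly as 0.3, so == misses it
arr = np.array([[0.1 + 0.2, 0.5],
                [0.3, 0.2999]])

exact = np.count_nonzero(arr == target)
approx = np.count_nonzero(np.abs(arr - target) < tol)

print(exact)   # 1  (only the literal 0.3 matches)
print(approx)  # 3  (0.1 + 0.2, 0.3 and 0.2999 are all within 1e-3)
```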

For floating-point arrays, np.isclose is a much better option than either comparing for exact equality or defining a custom range.

>>> a = np.array([[ 0.68878371,  0.2511641 ,  0.05677177],
                  [ 0.97784099,  0.96051717,  0.83723156],
                  [ 0.49460617,  0.24623311,  0.86396798]])

>>> np.isclose(a, 0.83723156).sum()
1
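np.isclose also accepts rtol and atol keyword arguments. The sketch below, on the same 3×3 array, sets rtol=0 so that only the absolute tolerance applies; the tolerances 1e-3 and 0.05 and the rounded target 0.837 are illustrative assumptions.

```python
import numpy as np

a = np.array([[0.68878371, 0.2511641, 0.05677177],
              [0.97784099, 0.96051717, 0.83723156],
              [0.49460617, 0.24623311, 0.86396798]])

# Default tolerances: only the stored 0.83723156 matches
print(np.isclose(a, 0.83723156).sum())  # 1

# Purely absolute tolerance of 1e-3: a rounded target still matches
print(np.isclose(a, 0.837, rtol=0, atol=1e-3).sum())  # 1

# Widen the window to 0.05: 0.86396798 (about 0.027 away) also counts
print(np.isclose(a, 0.837, rtol=0, atol=0.05).sum())  # 2
```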

Note that most real numbers cannot be represented exactly in binary floating point, which is why np.isclose works where == doesn't:

>>> (0.1 + 0.2) == 0.3
False

Instead:

>>> np.isclose(0.1 + 0.2, 0.3)
True
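To see why, print the sum with full precision. This is standard Python float behaviour (IEEE 754 double precision), not something specific to NumPy.

```python
import numpy as np

s = 0.1 + 0.2
print(repr(s))             # 0.30000000000000004
print(s - 0.3)             # a tiny nonzero difference, on the order of 1e-17
print(s == 0.3)            # False
print(np.isclose(s, 0.3))  # True: the difference is far below the default tolerance
```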

To count the number of times x appears in any array, you can simply sum the boolean array that results from a == x:

>>> import numpy
>>> col = numpy.arange(3)
>>> cols = numpy.tile(col, 3)
>>> (cols == 1).sum()
3
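Because the mask keeps the array's shape, the same sum can be taken along an axis to get per-row or per-column counts. A minimal sketch; reshaping the tiled array into a 3×3 grid is an addition for illustration, since numpy.tile(col, 3) on its own is one-dimensional.

```python
import numpy as np

col = np.arange(3)
# np.tile(col, 3) gives [0 1 2 0 1 2 0 1 2]; reshape it into a 3x3 grid
grid = np.tile(col, 3).reshape(3, 3)

mask = grid == 1
print(mask.sum())        # 3        (total occurrences)
print(mask.sum(axis=0))  # [0 3 0]  (per column)
print(mask.sum(axis=1))  # [1 1 1]  (per row)
```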

It should go without saying, but I'll say it anyway: this is not very useful with floating-point numbers unless you specify a range, like so:

>>> a = numpy.random.random((3, 3))
>>> ((a > 0.5) & (a < 0.75)).sum()
2
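The random example above gives a different count on every run. A deterministic sketch with a hypothetical fixed array makes the behaviour easier to check. The parentheses around each comparison are required, because & binds more tightly than > and < in Python.

```python
import numpy as np

# Hypothetical fixed array so the result is reproducible
a = np.array([[0.10, 0.55, 0.80],
              [0.50, 0.74, 0.75],
              [0.30, 0.60, 0.95]])

# Strict bounds: 0.5 and 0.75 themselves are excluded
strict = ((a > 0.5) & (a < 0.75)).sum()
print(strict)  # 3  (0.55, 0.74, 0.60)

# Half-open interval [0.5, 0.75): include the lower bound
half_open = np.count_nonzero((a >= 0.5) & (a < 0.75))
print(half_open)  # 4  (adds 0.50)
```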

This general principle works for all sorts of tests. For example, to count the floating-point values that are integral:

>>> a = numpy.random.random((3, 3)) * 10
>>> a
array([[ 7.33955747,  0.89195947,  4.70725211],
       [ 6.63686955,  5.98693505,  4.47567936],
       [ 1.36965745,  5.01869306,  5.89245242]])
>>> a.astype(int)
array([[7, 0, 4],
       [6, 5, 4],
       [1, 5, 5]])
>>> (a == a.astype(int)).sum()
0
>>> a[1, 1] = 8
>>> (a == a.astype(int)).sum()
1
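astype(int) truncates toward zero, which works for this check, but the cast can misbehave for NaN, infinity, or values too large for the integer type. A sketch of an alternative under those assumptions compares each value with its np.floor instead, which stays in floating point; np.isfinite explicitly excludes NaN and infinities. The array is a hypothetical example.

```python
import numpy as np

# Hypothetical array with a large value, a negative integral value, NaN and inf
a = np.array([[7.0, 0.5, -3.0],
              [1e20, np.nan, np.inf],
              [2.25, 5.0, -0.5]])

# floor() keeps the float dtype, so 1e20 (beyond the int64 range) is handled;
# isfinite() is needed because floor(inf) == inf would otherwise count as integral
integral = np.isfinite(a) & (a == np.floor(a))

print(integral.sum())  # 4  (7.0, -3.0, 1e20, 5.0)
```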

You can also use np.isclose() as described by Imanol Luengo, depending on your goal. But often it's more useful to know whether values fall within a range than whether they are arbitrarily close to some arbitrary value.

The problem with isclose is that its default tolerances (rtol and atol) are arbitrary, and its results are not always obvious or easy to predict. To cope with the pitfalls of floating-point arithmetic, it does even more floating-point arithmetic! A simple range is much easier to reason about precisely. (This reflects a more general principle: first, do the simplest thing that could possibly work.)
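To make the tolerance concrete: the NumPy documentation states that np.isclose(a, b) tests absolute(a - b) <= (atol + rtol * absolute(b)), with defaults rtol=1e-05 and atol=1e-08. The sketch below re-implements that formula to show that it matches, and illustrates one consequence of the default atol: near zero, values that differ by a factor of two are still reported as close. manual_isclose is a hypothetical helper written only for this comparison (it ignores NaN handling).

```python
import numpy as np

def manual_isclose(a, b, rtol=1e-05, atol=1e-08):
    """Hypothetical re-implementation of the documented isclose test (no NaN handling)."""
    return np.abs(a - b) <= (atol + rtol * np.abs(b))

a = np.array([1e-9, 1.0, 1000.0])
b = np.array([2e-9, 1.0 + 1e-6, 1000.02])

print(np.isclose(a, b))      # [ True  True False]
print(manual_isclose(a, b))  # same result

# With the default atol, 1e-9 and 2e-9 count as "close" although one is twice the other
print(np.isclose(1e-9, 2e-9))          # True
print(np.isclose(1e-9, 2e-9, atol=0))  # False
```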

Still, isclose and its cousin allclose have their uses. I usually use them to check whether a whole array is very similar to another whole array, which doesn't seem to be what you're asking.
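For the whole-array comparison mentioned here, np.allclose(a, b) returns a single boolean, equivalent to np.all(np.isclose(a, b)). A sketch using hypothetical perturbations of the question's array:

```python
import numpy as np

a = np.array([[0.68878371, 0.2511641, 0.05677177],
              [0.97784099, 0.96051717, 0.83723156],
              [0.49460617, 0.24623311, 0.86396798]])

tiny_noise = a + 1e-9  # far below the default tolerances
big_noise = a + 1e-3   # larger than atol + rtol * |b| for every element here

print(np.allclose(a, tiny_noise))  # True
print(np.allclose(a, big_noise))   # False
print(np.allclose(a, big_noise) == np.all(np.isclose(a, big_noise)))  # True
```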

In case it's useful to anyone: for very large 2D arrays, if you want to count how many times every element appears in the entire array, you can flatten the array into a list and then count the occurrences of each element:

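from itertools import chain
from collections import Counter

import numpy as np

# Example input; replace with your own large 2D array
arr = np.array([[1, 2, 2],
                [3, 1, 2]])
x = 2  # the value to look up

# Flatten the rows into one list of elements
flatten_arr = list(chain.from_iterable(arr))
# Map each distinct element to the number of times it appears
dico_nodeid_appearence = Counter(flatten_arr)
# How many times x appeared in arr
dico_nodeid_appearence[x]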
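For large NumPy arrays, a vectorized alternative to building a Python Counter is np.unique(..., return_counts=True), which flattens the input and returns the sorted distinct values together with their counts. This is a different technique from the answer above, sketched on a hypothetical small integer array; for floating-point data the usual caveat about exact equality still applies.

```python
import numpy as np

# Hypothetical example; np.unique flattens multi-dimensional input by default
arr = np.array([[3, 1, 2],
                [2, 3, 3],
                [1, 3, 2]])

values, counts = np.unique(arr, return_counts=True)
print(values)  # [1 2 3]
print(counts)  # [2 3 4]

# Build a dict of value -> count, similar to the Counter above
occurrences = dict(zip(values.tolist(), counts.tolist()))
print(occurrences[3])  # 4
```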

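Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM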