How do I count the occurrence of a certain item in an ndarray?

How do I count the number of 0s and 1s in the following array?

y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

y.count(0) gives:

'numpy.ndarray' object has no attribute 'count'

Using numpy.unique:

import numpy
a = numpy.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
unique, counts = numpy.unique(a, return_counts=True)

>>> dict(zip(unique, counts))
{0: 7, 1: 4, 2: 1, 3: 2, 4: 1}
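
If only a single value is of interest, the numpy.unique result can be wrapped in a small helper. This is a minimal sketch; the name count_value is hypothetical and not part of NumPy, and it returns 0 for values that never occur.

```python
import numpy as np


def count_value(arr, value):
    """Count occurrences of value in arr via numpy.unique (hypothetical helper)."""
    unique, counts = np.unique(arr, return_counts=True)
    # tolist() turns NumPy scalars into plain Python numbers for the dict keys
    lookup = dict(zip(unique.tolist(), counts.tolist()))
    return lookup.get(value, 0)


a = np.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
print(count_value(a, 0))  # 7
print(count_value(a, 5))  # 0, because 5 does not occur in a
```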

Non-numpy method using collections.Counter:

import collections, numpy
a = numpy.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
counter = collections.Counter(a)

>>> counter
Counter({0: 7, 1: 4, 3: 2, 2: 1, 4: 1})

What about using numpy.count_nonzero, something like:

>>> import numpy as np
>>> y = np.array([1, 2, 2, 2, 2, 0, 2, 3, 3, 3, 0, 0, 2, 2, 0])

>>> np.count_nonzero(y == 1)
1
>>> np.count_nonzero(y == 2)
7
>>> np.count_nonzero(y == 3)
3

Personally, I'd go for (y == 0).sum() and (y == 1).sum().

E.g.:

import numpy as np
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
num_zeros = (y == 0).sum()
num_ones = (y == 1).sum()
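
The reason this works is that the comparison produces a boolean array, and summing it treats each True as 1. A short sketch that makes the intermediate mask visible (the variable name mask is just for illustration):

```python
import numpy as np

y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

mask = (y == 0)                  # boolean array, True where the element equals 0
num_zeros = int(mask.sum())      # each True contributes 1 to the sum
num_ones = int((y == 1).sum())

print(mask.dtype, num_zeros, num_ones)  # bool 8 4
```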

For your case you could also look into numpy.bincount:

In [56]: a = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

In [57]: np.bincount(a)
Out[57]: array([8, 4])  #count of zeros is at index 0 : 8
                        #count of ones is at index 1 : 4
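
Note that numpy.bincount only accepts non-negative integers, and the result is only as long as the largest value seen plus one. If a slot is needed for a value that may be absent, minlength can reserve it. A small sketch, where minlength=3 is an arbitrary choice for illustration:

```python
import numpy as np

a = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

# minlength=3 guarantees an entry for the value 2 even though it never occurs
counts = np.bincount(a, minlength=3)

print(counts)     # [8 4 0]
print(counts[2])  # 0
```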
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

If you know that they are just 0 and 1:

np.sum(y)

gives you the number of ones. np.sum(1-y) gives the zeroes.

For slightly more generality, if you want to count zeros versus nonzero values (where the nonzero values may be 2 or 3 rather than 1):

np.count_nonzero(y)

gives the number of nonzero elements.

But if you need something more complicated, I don't think numpy will provide a nice count option. In that case, go to collections:

import collections
collections.Counter(y)
> Counter({0: 8, 1: 4})

This behaves like a dict:

collections.Counter(y)[0]
> 8
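
One practical difference from a plain dict is that a Counter returns 0 for keys it has never seen instead of raising KeyError. A minimal sketch; converting with tolist() first is my own choice here so that the keys are plain Python ints:

```python
import collections

import numpy as np

y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

counter = collections.Counter(y.tolist())  # keys are built-in ints

print(counter[0])              # 8
print(counter[7])              # 0, missing keys count as zero
print(counter.most_common(1))  # [(0, 8)]
```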

Convert your array y to a list l and then do l.count(1) and l.count(0):

>>> y = numpy.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
>>> l = list(y)
>>> l.count(1)
4
>>> l.count(0)
8 

If you know exactly which number you're looking for, you can use the following:

lst = np.array([1,1,2,3,3,6,6,6,3,2,1])
(lst == 2).sum()

returns how many times 2 occurs in your array.

Filter and use len

Using len could be another option.

A = np.array([1,0,1,0,1,0,1])

Say we want the number of occurrences of 0.

A[A==0]  # Return the array where item is 0, array([0, 0, 0])

Now, wrap it with len.

len(A[A==0])  # 3
len(A[A==1])  # 4
len(A[A==7])  # 0, because there isn't such item.

Honestly I find it easiest to convert to a pandas Series or DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({'data':np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])})
print(df['data'].value_counts())

Or this nice one-liner suggested by Robert Muil:

pd.Series([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]).value_counts()

No one suggested using numpy.bincount(input, minlength) with minlength = np.size(input), but it seems to be a good solution, and definitely the fastest:

In [1]: choices = np.random.randint(0, 100, 10000)

In [2]: %timeit [ np.sum(choices == k) for k in range(min(choices), max(choices)+1) ]
100 loops, best of 3: 2.67 ms per loop

In [3]: %timeit np.unique(choices, return_counts=True)
1000 loops, best of 3: 388 µs per loop

In [4]: %timeit np.bincount(choices, minlength=np.size(choices))
100000 loops, best of 3: 16.3 µs per loop

That's a crazy speedup between numpy.unique(x, return_counts=True) and numpy.bincount(x, minlength=np.max(x))!
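
To reproduce this kind of comparison outside IPython, something like the following sketch could be used. The fixed seed, the use of timeit with 200 repetitions, and minlength=100 (the number of possible values, rather than the array size used above) are my own assumptions; absolute timings will differ between machines.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)          # fixed seed, assumed for repeatability
choices = rng.integers(0, 100, 10000)   # integers in [0, 100)

# Three ways of counting every value in range(100)
via_loop = np.array([np.sum(choices == k) for k in range(100)])
values, via_unique = np.unique(choices, return_counts=True)
via_bincount = np.bincount(choices, minlength=100)

# The approaches agree; np.unique only reports values that actually occur
assert np.array_equal(via_loop, via_bincount)
assert np.array_equal(via_unique, via_bincount[values])

for label, stmt in [
    ("unique", lambda: np.unique(choices, return_counts=True)),
    ("bincount", lambda: np.bincount(choices, minlength=100)),
]:
    seconds = timeit.timeit(stmt, number=200)
    print(f"{label}: {seconds / 200 * 1e6:.1f} µs per call")
```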

I'd use np.where:

how_many_0 = len(np.where(a==0.)[0])
how_many_1 = len(np.where(a==1.)[0])

y.tolist().count(val)

with val 0 or 1.

Since a Python list has a native count method, converting to a list before calling it is a simple solution.
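
Both y.tolist() and list(y) give a list whose count method works here; the difference is the element type. A small sketch illustrating this (variable names are for illustration only):

```python
import numpy as np

y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

as_python = y.tolist()  # elements become built-in int objects
as_numpy = list(y)      # elements stay NumPy integer scalars

print(as_python.count(0), as_numpy.count(0))  # 8 8
print(isinstance(as_python[0], int))          # True
print(isinstance(as_numpy[0], np.integer))    # True
```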

To count the number of occurrences, you can use np.unique(array, return_counts=True):

In [75]: boo = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
 
# use bool value `True` or equivalently `1`
In [77]: uniq, cnts = np.unique(boo, return_counts=1)
In [81]: uniq
Out[81]: array([0, 1])   #unique elements in input array are: 0, 1

In [82]: cnts
Out[82]: array([8, 4])   # 0 occurs 8 times, 1 occurs 4 times

If you are interested in the fastest execution, you know in advance which value(s) to look for, and your array is 1D (or you are interested in the result on the flattened array, in which case the input of the function should be np.ravel(arr) rather than just arr), then Numba is your friend:

import numba as nb


@nb.jit
def count_nb(arr, value):
    result = 0
    for x in arr:
        if x == value:
            result += 1
    return result

or, for very large arrays where parallelization may be beneficial:

@nb.jit(parallel=True)
def count_nbp(arr, value):
    result = 0
    for i in nb.prange(arr.size):
        if arr[i] == value:
            result += 1
    return result

Benchmarking these against np.count_nonzero() (which also has the problem of creating a temporary array, which may be avoided) and an np.unique()-based solution:

import numpy as np


def count_np(arr, value):
    return np.count_nonzero(arr == value)


def count_np2(arr, value):
    # count every distinct value, then look up the one requested
    uniques, counts = np.unique(arr, return_counts=True)
    counter = dict(zip(uniques, counts))
    return counter[value] if value in counter else 0

for input generated with:

def gen_input(n, a=0, b=100):
    return np.random.randint(a, b, n)

the following plots are obtained (the second row of plots is a zoom on the faster approaches):

[Figures: bm_full, bm_zoom]

This shows that the Numba-based solutions are noticeably faster than the NumPy counterparts and that, for very large inputs, the parallel approach is faster than the naive one.

Full code available here.

Yet another simple solution might be to use numpy.count_nonzero():

import numpy as np
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
y_nonzero_num = np.count_nonzero(y==1)
y_zero_num = np.count_nonzero(y==0)
y_nonzero_num
4
y_zero_num
8

Don't let the name mislead you: if you use it with a boolean array, as in the example, it will do the trick.

Try this:

a = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
list(a).count(1)

Take advantage of the methods offered by a Series:

>>> import pandas as pd
>>> y = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]
>>> pd.Series(y).value_counts()
0    8
1    4
dtype: int64

You can use a dictionary comprehension to create a neat one-liner. More about dictionary comprehensions can be found here.

>>> counts = {int(value): list(y).count(value) for value in set(y)}
>>> print(counts)
{0: 8, 1: 4}

This will create a dictionary with the values in your ndarray as keys and the counts of those values as the corresponding dictionary values.

This will work whenever you want to count occurrences of a value in arrays of this format.

It involves one more step, but a more flexible solution, which also works for 2D arrays and more complicated filters, is to create a boolean mask and then use .sum() on the mask.

>>> y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
>>> mask = y == 0
>>> mask.sum()
8
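
To illustrate the 2D and compound-filter cases mentioned in this answer, here is a minimal sketch. The array grid and the chosen conditions are made up for the example:

```python
import numpy as np

grid = np.array([[0, 1, 2],
                 [2, 0, 1],
                 [0, 0, 2]])

print((grid == 0).sum())                # 4, zeros in the whole array
print((grid == 0).sum(axis=0))          # [2 2 0], zeros per column
print(((grid > 0) & (grid < 2)).sum())  # 2, elements strictly between 0 and 2
```

Combining masks with & and | requires parentheses around each comparison, because those operators bind more tightly than the comparisons.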

A general and simple answer would be:

numpy.sum(MyArray==x)   # sum of the boolean array marking where MyArray equals x

which results in this full code as an example:

import numpy
MyArray=numpy.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])  # array we want to search in
x=0   # the value I want to count (can be iterator, in a list, etc.)
numpy.sum(MyArray==x)   # number of occurrences of x in MyArray

Now, if MyArray has multiple dimensions and you want to count the occurrences of a given row of values (called the pattern hereafter):

MyArray=numpy.array([[6, 1],[4, 5],[0, 7],[5, 1],[2, 5],[1, 2],[3, 2],[0, 2],[2, 5],[5, 1],[3, 0]])
x=numpy.array([5,1])   # the value I want to count (can be iterator, in a list, etc.)
temp = numpy.ascontiguousarray(MyArray).view(numpy.dtype((numpy.void, MyArray.dtype.itemsize * MyArray.shape[1])))  # convert the 2d-array into an array of analyzable patterns
xt=numpy.ascontiguousarray(x).view(numpy.dtype((numpy.void, x.dtype.itemsize * x.shape[0])))  # convert what you search into one analyzable pattern
numpy.sum(temp==xt)  # count of the searched pattern in the list of patterns
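
For a 2D array of rows like this one, an alternative that avoids the void-dtype view is to compare with broadcasting and require every column to match. This is a different technique from the one above, sketched under the assumption that the pattern has the same length as a row; the names rows, pattern and matches are illustrative.

```python
import numpy as np

rows = np.array([[6, 1], [4, 5], [0, 7], [5, 1], [2, 5], [1, 2],
                 [3, 2], [0, 2], [2, 5], [5, 1], [3, 0]])
pattern = np.array([5, 1])

# Broadcasting compares pattern against each row; all(axis=1) keeps full matches
matches = (rows == pattern).all(axis=1)

print(int(matches.sum()))       # 2
print(np.flatnonzero(matches))  # [3 9], the indices of the matching rows
```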

You have a special array with only 1 and 0 here. So a trick is to use

np.mean(x)

which gives you the fraction of 1s in your array (multiply by 100 for a percentage). Alternatively, use

np.sum(x)
np.sum(1-x)

to get the absolute number of 1s and 0s in your array.
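
A short sketch of these three expressions on the array from the question, assuming the array really contains only 0s and 1s:

```python
import numpy as np

x = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

fraction_of_ones = np.mean(x)     # 4 ones out of 12 elements
num_ones = int(np.sum(x))
num_zeros = int(np.sum(1 - x))    # 1 - x flips 0s and 1s

print(num_ones, num_zeros)  # 4 8
```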

dict(zip(*numpy.unique(y, return_counts=True)))

Just copied Seppo Enarvi's comment here, which deserves to be a proper answer.

This can be done easily with the following method:

y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
y.tolist().count(1)

Since your ndarray contains only 0 and 1, you can use sum() to get the number of 1s and len()-sum() to get the number of 0s.

num_of_ones = sum(array)
num_of_zeros = len(array)-sum(array)
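
The built-in sum() iterates over the array element by element in Python. A variant of the same idea that stays inside NumPy might look like this sketch, which again assumes a 1D array holding only 0s and 1s:

```python
import numpy as np

array = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

num_of_ones = int(np.count_nonzero(array))  # every nonzero entry is a 1 here
num_of_zeros = array.size - num_of_ones     # the rest must be 0s

print(num_of_ones, num_of_zeros)  # 4 8
```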

For generic entries:

x = np.array([11, 2, 3, 5, 3, 2, 16, 10, 10, 3, 11, 4, 5, 16, 3, 11, 4])
n = {i:len([j for j in np.where(x==i)[0]]) for i in set(x)}
ix = {i:[j for j in np.where(x==i)[0]] for i in set(x)}

This will output the counts (n):

{2: 2, 3: 4, 4: 2, 5: 2, 10: 2, 11: 3, 16: 2}

And the indices (ix):

{2: [1, 5],
3: [2, 4, 9, 14],
4: [11, 16],
5: [3, 12],
10: [7, 8],
11: [0, 10, 15],
16: [6, 13]}

If you don't want to use numpy or the collections module, you can use a dictionary:

d = dict()
a = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]
for item in a:
    try:
        d[item]+=1
    except KeyError:
        d[item]=1

Result:

>>> d
{0: 8, 1: 4}

Of course you can also use an if/else statement. I think Counter does almost the same thing, but this is more transparent.
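
As a sketch of the same loop without try/except, dict.get with a default of 0 can replace the KeyError handling:

```python
a = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]

d = {}
for item in a:
    # get() returns 0 the first time an item is seen
    d[item] = d.get(item, 0) + 1

print(d)  # {0: 8, 1: 4}
```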

Here is something with which you can count the number of occurrences of a particular number, based on your code:

count_of_zero=list(y[y==0]).count(0)

print(count_of_zero)

# The comparison y==0 yields boolean values; the elements where it is True (those equal to 0) are selected and then counted.

If you are dealing with very large arrays, using generators could be an option. The nice thing here is that this approach works fine for both arrays and lists, and you don't need any additional package. Additionally, you are not using that much memory.

my_array = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
sum(1 for val in my_array if val==0)
Out: 8

This function returns the number of occurrences of a value in a 2D array:

def count(array,variable):
    number = 0
    for i in range(array.shape[0]):
        for j in range(array.shape[1]):
            if array[i,j] == variable:
                number += 1
    return number
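
The function above indexes with [i, j], so it only handles 2D input. A sketch of a variant that works for any number of dimensions iterates over the flat view instead; the name count_any_shape is hypothetical.

```python
import numpy as np


def count_any_shape(array, value):
    """Count occurrences of value in an array of any dimensionality."""
    number = 0
    # .flat iterates over every element regardless of the array's shape
    for item in np.asarray(array).flat:
        if item == value:
            number += 1
    return number


print(count_any_shape(np.array([0, 0, 1, 0]), 0))      # 3
print(count_any_shape(np.array([[0, 1], [1, 1]]), 1))  # 3
```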

The simplest approach (do comment if it is unnecessary):

import numpy as np
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
count_0, count_1 = 0, 0
for i in y:
    if i == 0:
        count_0 += 1
    if i == 1:
        count_1 += 1
count_0, count_1

NumPy has a function for this, numpy.histogram. Just a small hack: pass bin edges that separate the integer values. The edges must increase monotonically, so the unsorted array y itself cannot be passed as bins.

numpy.histogram(y, bins=numpy.arange(y.max() + 2))

The output is two arrays: the first holds the counts, the second holds the bin edges.

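NumPy has no top-level numpy.count function, so np.count(a, 1) raises AttributeError. The closest equivalent is numpy.count_nonzero on a comparison:

a = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

np.count_nonzero(a == 1)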

