
Python list to np.array of counts

Suppose we have a vector of size N=1000 and we are given the list [1,1,2,2,2,100].

I'd like to generate an np.array (or pd.Series) of size 1000 where v[n] is the number of times n appears in the list. In our example, v[1] = 2, v[2] = 3, v[100] = 1, v[42] = 0.

How can I do that with numpy/pandas elegantly?

If you have a list mylist, you can get an array of counts mycount:

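import numpy as np

N = 1000
x = np.array(mylist)  # mylist must contain only non-negative integers
mycount = np.bincount(x, minlength=N)  # mycount[n] is the number of times n occurs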

np.bincount counts how many times each non-negative integer value occurs in the array, so mycount[n] is the number of occurrences of n. The minlength=N argument pads the result with zeros so that it has at least N entries. You can find more information in the NumPy documentation for bincount.
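
As a minimal sketch, here is the same call applied to the list from the question. The variable name longer is only illustrative. Note that minlength is a lower bound, not a cap: if the list contains a value of N or more, the result is longer than N.

```python
import numpy as np

N = 1000
mylist = [1, 1, 2, 2, 2, 100]  # example list from the question

# np.bincount requires non-negative integers
mycount = np.bincount(np.array(mylist), minlength=N)

print(mycount.shape)                                       # (1000,)
print(mycount[1], mycount[2], mycount[100], mycount[42])   # 2 3 1 0

# minlength is a minimum: a value >= N makes the result longer than N
longer = np.bincount(np.array([1, 1500]), minlength=N)
print(longer.shape)                                        # (1501,)
```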

Python's standard library has a class for counting occurrences, collections.Counter, which can be used without invoking numpy or pandas if desired:

from collections import Counter
a = [1,1,2,2,2,100]
cnts = Counter(a)
print(cnts)
# Counter({2: 3, 1: 2, 100: 1})

You can convert this to a list with a list comprehension:

N = 100
cnts_list = [cnts.get(i, 0) for i in range(N+1)]
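
If a NumPy array of size N is wanted rather than a list, one possible sketch is to fill a zero array from the counter's items. The names v and N follow the question; the loop assumes every value in the list is an integer in range(N).

```python
from collections import Counter

import numpy as np

N = 1000
a = [1, 1, 2, 2, 2, 100]
cnts = Counter(a)

# Start from zeros so values that never occur keep a count of 0
v = np.zeros(N, dtype=int)
for value, count in cnts.items():
    v[value] = count  # assumes 0 <= value < N

print(v[1], v[2], v[100], v[42])  # 2 3 1 0
```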

Use Series.value_counts with Series.reindex to add the values that do not occur:

import pandas as pd

a = [1,1,2,2,2,100]

N = 100
a = pd.Series(a).value_counts().reindex(range(N+1), fill_value=0)
print(a)
0      0
1      2
2      3
3      0
4      0
      ..
96     0
97     0
98     0
99     0
100    1
Length: 101, dtype: int64
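
For the size-1000 vector asked for in the question, the same idea can be sketched with range(N) as the new index, and .to_numpy() then gives a plain array. The names counts and v are illustrative. Keep in mind that reindex silently drops any value outside range(N).

```python
import pandas as pd

N = 1000
a = [1, 1, 2, 2, 2, 100]

# value_counts indexes the counts by value; reindex adds the missing
# values 0..N-1 with a count of 0 and puts them in ascending order
counts = pd.Series(a).value_counts().reindex(range(N), fill_value=0)
v = counts.to_numpy()

print(len(v))                      # 1000
print(v[1], v[2], v[100], v[42])   # 2 3 1 0
```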

You can use np.unique as well:

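import numpy as np

N = 1000
result = np.zeros(N, dtype=int)  # integer counts; values that never occur stay 0
# idx holds the distinct values, val how often each one occurs
idx, val = np.unique([1,1,2,2,2,100], return_counts=True)
result[idx] = val
print(result[:5])
# [0 2 3 0 0]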

More information: https://numpy.org/doc/stable/reference/generated/numpy.unique.html

You can use a Series with groupby:

import pandas as pd
my_list = [1,1,1,2,2,2,2,3,4,8,1000,8,8,5,5,6]

my_Serie = pd.Series(my_list)
v = my_Serie.groupby(my_list).count().to_dict()
print(v)

{1: 3, 2: 4, 3: 1, 4: 1, 5: 2, 6: 1, 8: 3, 1000: 1}
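
The dictionary only contains values that actually occur, so a lookup for any other n needs a default. As a rough sketch (the name dense is hypothetical), the counts can be read with .get or expanded into a list covering 0 to max(my_list):

```python
import pandas as pd

my_list = [1,1,1,2,2,2,2,3,4,8,1000,8,8,5,5,6]
v = pd.Series(my_list).groupby(my_list).count().to_dict()

# Values that never occur are absent from the dict, so supply a default of 0
print(v.get(2, 0))   # 4
print(v.get(42, 0))  # 0

# Dense list of counts for every n from 0 to max(my_list) inclusive
dense = [v.get(n, 0) for n in range(max(my_list) + 1)]
print(len(dense))    # 1001
```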
