numpy - How to count occurrences of items in a nested list by index?

Question

Hi, I want to be able to count the occurrences of items from my list at each index of a nested list.

That is, if my list is

keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
        'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']

and my nested list looks like:

[['Three' 'One' 'Ten']
 ['Three' 'Five' 'Nine']
 ['Two' 'Five' 'Three']
 ['Two' 'Three' 'Eight']
 ['One' 'Three' 'Nine']]

What I want to know is how many times 'One' occurs at index 0, and so on for each item.

I am using numpy arrays to build the lists and am creating the output from weighted random choices. I want to be able to run the test over, say, 1000 lists and count the index occurrences to determine how changes I make elsewhere in my program affect the end result.

I have found examples such as https://stackoverflow.com/a/10741692/461887

import numpy as np
x = np.array([1,1,1,2,2,2,5,25,1,1])
y = np.bincount(x)
ii = np.nonzero(y)[0]
list(zip(ii.tolist(), y[ii].tolist()))  # zip is lazy in Python 3, so wrap it in list()
# [(1, 5), (2, 3), (5, 1), (25, 1)]

But this does not appear to work with nested lists. I have also been looking under indexing in the numpy cookbook, and at histogram and digitize in the example list, but I just can't seem to find a function that does this.

Updated to include example output:

Assuming a nested list 100 rows deep:

{'One': 19, 'Two': 16, 'Three': 19, 'Four': 11, 'Five': 7, 'Six': 8, 'Seven': 4, 'Eight': 3,
            'Nine': 5, 'Ten': 1, 'Eleven': 2, 'Twelve': 1, 'Thirteen': 1, 'Fourteen': 3, 'Fifteen': 0}

Or, as in treddy's example:

array([19, 16, 19, 11, 7, 8, 4, 3, 5, 1, 2, 1, 1, 3, 0])

Answer 1

It would help to add the output you expect for your example, but for now it looks like collections.Counter will do the job:

>>> data = [['Three','One','Ten'],
...  ['Three','Five','Nine'],
...  ['Two','Five','Three'],
...  ['Two','Three','Eight'],
...  ['One','Three','Nine']]
... 
>>> 
>>> from collections import Counter
>>> [Counter(x) for x in data]
[Counter({'Three': 1, 'Ten': 1, 'One': 1}), Counter({'Nine': 1, 'Five': 1, 'Three': 1}), Counter({'Five': 1, 'Two': 1, 'Three': 1}), Counter({'Eight': 1, 'Two': 1, 'Three': 1}), Counter({'Nine': 1, 'Three': 1, 'One': 1})]

Update:

Now that you have given the desired output, I think the idea is to flatten the list, use Counter to count occurrences, and then create a dictionary (or an OrderedDict if order matters to you):

>>> from collections import Counter, OrderedDict
>>> c = Counter(e for l in data for e in l)
>>> c
Counter({'Three': 5, 'Two': 2, 'Nine': 2, 'Five': 2, 'One': 2, 'Ten': 1, 'Eight': 1})

or, if you need only the first entry in each list:

>>> c = Counter(l[0] for l in data)
>>> c
Counter({'Three': 2, 'Two': 2, 'One': 1})

Simple dictionary:

>>> {x:c[x] for x in keys} 
{
    'Twelve': 0, 'Seven': 0,
    'Ten': 1, 'Fourteen': 0,
    'Nine': 2, 'Six': 0,
    'Three': 5, 'Two': 2,
    'Four': 0, 'Eleven': 0,
    'Five': 2, 'Thirteen': 0,
    'Eight': 1, 'One': 2, 'Fifteen': 0
}

or OrderedDict:

>>> OrderedDict((x, c[x]) for x in keys)
OrderedDict([('One', 2), ('Two', 2), ('Three', 5), ('Four', 0), ('Five', 2), ('Six', 0), ('Seven', 0), ('Eight', 1), ('Nine', 2), ('Ten', 1), ('Eleven', 0), ('Twelve', 0), ('Thirteen', 0), ('Fourteen', 0), ('Fifteen', 0)])

And, just in case: if you don't need zeroes in your output, you can use the Counter directly to get the number of occurrences:

>>> c['Nine']   # Key is in the Counter, returns number of occurrences
2
>>> c['Four']   # Key is not in the Counter, returns 0
0
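To get the per-index breakdown the question asks for, the same Counter idea can be applied one column at a time. What follows is a minimal sketch, assuming every inner list has the same length; `per_index_counts` is a hypothetical helper name, not something provided by `collections`.

```python
from collections import Counter

keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
        'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']
data = [['Three', 'One', 'Ten'],
        ['Three', 'Five', 'Nine'],
        ['Two', 'Five', 'Three'],
        ['Two', 'Three', 'Eight'],
        ['One', 'Three', 'Nine']]


def per_index_counts(rows, labels):
    """Return one {label: count} dict per position (hypothetical helper)."""
    # zip(*rows) transposes the rows, yielding one tuple per index.
    columns = zip(*rows)
    # A Counter returns 0 for missing labels, so every label gets an entry.
    return [{label: counts[label] for label in labels}
            for counts in map(Counter, columns)]


result = per_index_counts(data, keys)
print(result[0]['Three'])  # count of 'Three' at index 0
```

`result[i]` then holds the counts for index `i`, in the order given by `keys`.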

Answer 2

The OP asked a numpy question. collections.Counter and OrderedDict will certainly work, but here is a numpy answer:

In [1]: # from original posting:
In [2]: keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
...:         'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']
In [3]: data = [['Three', 'One', 'Ten'],
...:            ['Three', 'Five', 'Nine'],
...:            ['Two', 'Five', 'Three'],
...:            ['Two', 'Three', 'Eight'],
...:            ['One', 'Three', 'Nine']]
In [4]: # make it numpy
In [5]: import numpy as np
In [6]: keys = np.array(keys)
In [7]: data = np.array(data)
In [8]: # if you only want counts for column 0
In [9]: counts = np.sum(keys == data[:,[0]], axis=0)
In [10]: # view it
In [11]: list(zip(keys, counts))
Out[11]:
[('One', 1),
('Two', 2),
('Three', 2), ...
In [12]: # if you wanted counts for all columns (newaxis here sets-up 3D broadcasting)
In [13]: counts = np.sum(keys[:,np.newaxis,np.newaxis] == data, axis=1)
In [14]: # view it (you could use zip without pandas, this is just for looks)
In [15]: import pandas as pd
In [16]: pd.DataFrame(counts, index=keys)
Out[16]:
          0  1  2
One       1  1  0
Two       2  0  0
Three     2  2  1
Four      0  0  0
Five      0  2  0 ...
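The broadcasting step can also be checked on its own, without pandas. This is a minimal sketch, assuming NumPy is installed and reusing the `keys` and `data` from the question; the names `matches` and `per_column` are illustrative only.

```python
import numpy as np

keys = np.array(['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
                 'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen',
                 'Fifteen'])
data = np.array([['Three', 'One', 'Ten'],
                 ['Three', 'Five', 'Nine'],
                 ['Two', 'Five', 'Three'],
                 ['Two', 'Three', 'Eight'],
                 ['One', 'Three', 'Nine']])

# keys[:, newaxis, newaxis] has shape (15, 1, 1); data has shape (5, 3).
# Broadcasting the comparison gives a boolean array of shape (15, 5, 3).
matches = keys[:, np.newaxis, np.newaxis] == data

# Summing over axis 1 (the rows of data) leaves one count per key and column.
per_column = matches.sum(axis=1)  # shape (15, 3)

# Sanity check: every entry of data is in keys, so each column of counts
# adds up to the number of rows in data.
print(per_column.shape)
print(per_column.sum(axis=0).tolist())
```

Row `i` of `per_column` belongs to `keys[i]`, and column `j` gives the counts at index `j` of the inner lists.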

Answer 3

You are correct that numpy.bincount accepts a 1D array-like object, so a nested list or an array with more than one dimension can't be used directly. You can, however, use numpy array slicing to select the first column of your 2D array and bin-count the occurrences of each integer in that column:

import numpy

keys = numpy.arange(1, 16)  # not actually needed for the count
two_dim_array_for_counting = numpy.array([[3, 1, 10],
                                          [3, 5, 9],
                                          [2, 5, 3],
                                          [2, 3, 8],
                                          [1, 3, 9]])
# count the values in the first column only (all rows)
numpy.bincount(two_dim_array_for_counting[..., 0])
# Out[36]: array([0, 1, 2, 2])
# i.e. 0 occurs 0 times, 1 occurs once, 2 occurs twice, and 3 occurs twice

No value greater than 3 occurs in the first column, so the output array has only 4 elements, counting the occurrences of 0, 1, 2 and 3 in that column.
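To use `bincount` with the string labels from the question, the labels first have to be turned into integer codes. The following is a minimal sketch, assuming every entry of `data` appears in `keys`; the `code_of` mapping is a hypothetical helper, and the `minlength` argument pads the result so that keys which never occur still get a zero.

```python
import numpy as np

keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
        'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']
data = [['Three', 'One', 'Ten'],
        ['Three', 'Five', 'Nine'],
        ['Two', 'Five', 'Three'],
        ['Two', 'Three', 'Eight'],
        ['One', 'Three', 'Nine']]

# Map each label to its position in keys (assumed: all labels are in keys).
code_of = {label: i for i, label in enumerate(keys)}
codes = np.array([[code_of[item] for item in row] for row in data])

# Count column 0 only; minlength makes the result cover every key.
first_column = np.bincount(codes[:, 0], minlength=len(keys))
print(dict(zip(keys, first_column.tolist())))
```

Because the codes start at 0, position `i` of `first_column` lines up with `keys[i]`, which gives an array in the same layout as the example output in the question.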

3 answers

Answer 1
2 accepted 2013-11-23 09:03:04

Answer 2
2 2013-11-23 15:50:21

Answer 3
1 2013-11-23 07:37:32
