
Count number of occurrences of each unique item in list of lists

I have a list of lists like the following:

listoflist = [["A", "B", "A", "C", "D"], ["Z", "A", "B", "C"], ["D", "D", "X", "Y", "Z"]]

I want to find the number of sublists that each unique value in listoflist occurs in. For example, "A" shows up in two sublists, and "D" also shows up in two sublists, even though it occurs twice in listoflist[2].

How can I get a dataframe which has each unique element in one column and its frequency (the number of sublists the element shows up in) in another?

You can use itertools.chain together with collections.Counter:

In [94]: import itertools as it

In [95]: from collections import Counter

In [96]: Counter(it.chain(*map(set, listoflist)))
Out[96]: Counter({'A': 2, 'B': 2, 'C': 2, 'D': 2, 'X': 1, 'Y': 1, 'Z': 2})

As mentioned in the comment by @Jean-François Fabre, you can also use:

In [97]: Counter(it.chain.from_iterable(map(set, listoflist)))
Out[97]: Counter({'A': 2, 'B': 2, 'C': 2, 'D': 2, 'X': 1, 'Y': 1, 'Z': 2})
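The two calls above produce the same counts. The difference is only in how the sets reach chain: `chain(*...)` unpacks the whole map into positional arguments up front, while `chain.from_iterable(...)` pulls one set at a time. A small check, with the variable names `unpacked` and `lazy` chosen here purely for illustration:

```python
import itertools as it
from collections import Counter

listoflist = [["A", "B", "A", "C", "D"], ["Z", "A", "B", "C"], ["D", "D", "X", "Y", "Z"]]

# chain(*iterable): every set is materialised as an argument before chaining starts.
unpacked = Counter(it.chain(*map(set, listoflist)))

# chain.from_iterable(iterable): sets are consumed one by one as the Counter iterates.
lazy = Counter(it.chain.from_iterable(map(set, listoflist)))

print(unpacked == lazy)  # True
```

For a list of three sublists either form is fine; `from_iterable` is the usual choice when the outer iterable is large or is itself a generator.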

Essentially, it seems that you want something like

Counter(x for xs in listoflist for x in set(xs))

Each list is converted into a set first, to exclude duplicates. Then the sequence of sets is flattened and fed into the Counter.
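To see why the set conversion matters, it may help to compare the count with and without it. This is only an illustrative sketch; the names `raw` and `per_sublist` are not from the original answer:

```python
from collections import Counter

listoflist = [["A", "B", "A", "C", "D"], ["Z", "A", "B", "C"], ["D", "D", "X", "Y", "Z"]]

# Without set(): every occurrence is counted, including repeats inside one sublist.
raw = Counter(x for xs in listoflist for x in xs)

# With set(): each sublist contributes at most 1 per value, so this counts sublists.
per_sublist = Counter(x for xs in listoflist for x in set(xs))

print(raw["A"], per_sublist["A"])  # 3 2
print(raw["D"], per_sublist["D"])  # 3 2
```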

Full code:

from collections import Counter

listoflist = [["A", "B", "A", "C", "D"], ["Z", "A", "B", "C"], ["D", "D", "X", "Y", "Z"]]

c = Counter(x for xs in listoflist for x in set(xs))

print(c)

Results in:

# output:
# Counter({'B': 2, 'C': 2, 'Z': 2, 'D': 2, 'A': 2, 'Y': 1, 'X': 1})
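Since the question asks for a dataframe, one possible way to turn the Counter into one is sketched below. The column names `item` and `n_sublists` are assumptions made for this example, and the rows are sorted only so that the output order is deterministic:

```python
from collections import Counter

import pandas as pd

listoflist = [["A", "B", "A", "C", "D"], ["Z", "A", "B", "C"], ["D", "D", "X", "Y", "Z"]]

c = Counter(x for xs in listoflist for x in set(xs))

# c.items() yields (value, count) pairs; each pair becomes one row.
df = pd.DataFrame(sorted(c.items()), columns=["item", "n_sublists"])

print(df)
```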

Another way to do this is to use pandas:

import pandas as pd

df = pd.DataFrame(listoflist)
df.stack().reset_index().groupby(0)['level_0'].nunique().to_dict()

Output:

{'A': 2, 'B': 2, 'C': 2, 'D': 2, 'X': 1, 'Y': 1, 'Z': 2}
