Python: Finding the most frequent occurrences of combinations of any length in a list of lists
Finding the most frequent occurrences of pairs in a list of lists
I have a dataset representing the author lists of many technical reports. Each report can be written by one or more people:
a = [
['John', 'Mark', 'Jennifer'],
['John'],
['Joe', 'Mark'],
['John', 'Anna', 'Jennifer'],
['Jennifer', 'John', 'Mark']
]
I need to find the most frequent pairs of people, i.e. the people who have collaborated most often in the past:
['John', 'Jennifer'] - 3 times
['John', 'Mark'] - 2 times
['Mark', 'Jennifer'] - 2 times
etc...
How can I do this in Python?
Use a collections.Counter dictionary together with itertools.combinations:
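from collections import Counter
from itertools import combinations

d = Counter()
for sub in a:
    if len(sub) < 2:  # a single-author report contributes no pairs
        continue
    # sort so that ('John', 'Mark') and ('Mark', 'John') become the same key
    for comb in combinations(sorted(sub), 2):
        d[comb] += 1
print(d.most_common())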
[(('Jennifer', 'John'), 3), (('John', 'Mark'), 2), (('Jennifer', 'Mark'), 2), (('Anna', 'John'), 1), (('Joe', 'Mark'), 1), (('Anna', 'Jennifer'), 1)]
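Because every key in the counter is a tuple of alphabetically sorted names, looking up one specific pair only works if the two names are sorted the same way first. A minimal sketch, assuming the `a` list from the question; `pair_count` is a hypothetical helper name, not part of `collections`:

```python
from collections import Counter
from itertools import combinations

a = [
    ['John', 'Mark', 'Jennifer'],
    ['John'],
    ['Joe', 'Mark'],
    ['John', 'Anna', 'Jennifer'],
    ['Jennifer', 'John', 'Mark'],
]

# Build the counter without mutating the input lists.
d = Counter()
for sub in a:
    d.update(combinations(sorted(sub), 2))

def pair_count(counter, x, y):
    """Hypothetical helper: look up a pair regardless of argument order."""
    return counter[tuple(sorted((x, y)))]

print(pair_count(d, 'John', 'Jennifer'))  # 3
print(pair_count(d, 'Mark', 'John'))      # 2
print(pair_count(d, 'Joe', 'Anna'))       # 0 (Counter returns 0 for missing keys)
```

`Counter.update` accepts any iterable and adds one to the count of each element, so it can consume the `combinations` iterator directly.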
most_common() returns the pairs ordered from most to least common. If you only want the n most common pairs, pass n:
d.most_common(n)
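Since the title asks about combinations of any length, the same Counter approach can be parameterized by the group size. A sketch under the assumption that each report lists each author at most once; `most_common_groups` and its parameters `k` and `n` are hypothetical names chosen for illustration:

```python
from collections import Counter
from itertools import combinations

def most_common_groups(reports, k, n=None):
    """Hypothetical helper: count every k-person group of co-authors
    and return the n most common (all groups if n is None)."""
    counts = Counter()
    for authors in reports:
        # sorted() gives each group a canonical order so equal groups share a key;
        # reports with fewer than k authors produce no combinations at all.
        counts.update(combinations(sorted(authors), k))
    return counts.most_common(n)

a = [
    ['John', 'Mark', 'Jennifer'],
    ['John'],
    ['Joe', 'Mark'],
    ['John', 'Anna', 'Jennifer'],
    ['Jennifer', 'John', 'Mark'],
]

print(most_common_groups(a, 2, 1))  # [(('Jennifer', 'John'), 3)]
print(most_common_groups(a, 3, 1))  # [(('Jennifer', 'John', 'Mark'), 2)]
```

With k=2 this reduces to the pair counting above; larger k simply asks `combinations` for longer tuples.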
import collections
import itertools

a = [
    ['John', 'Mark', 'Jennifer'],
    ['John'],
    ['Joe', 'Mark'],
    ['John', 'Anna', 'Jennifer'],
    ['Jennifer', 'John', 'Mark']
]

counts = collections.defaultdict(int)
for collab in a:
    collab.sort()
    for pair in itertools.combinations(collab, 2):
        counts[pair] += 1

for pair, freq in counts.items():
    print(pair, freq)
Output:
('John', 'Mark') 2
('Jennifer', 'Mark') 2
('Anna', 'John') 1
('Jennifer', 'John') 3
('Anna', 'Jennifer') 1
('Joe', 'Mark') 1
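The defaultdict version prints pairs in dictionary order rather than ranked by frequency as the question asks. One way to rank them is to sort the items by count; the tie-break (alphabetical by pair) is an assumption added here so the output is reproducible:

```python
import collections
import itertools

a = [
    ['John', 'Mark', 'Jennifer'],
    ['John'],
    ['Joe', 'Mark'],
    ['John', 'Anna', 'Jennifer'],
    ['Jennifer', 'John', 'Mark'],
]

counts = collections.defaultdict(int)
for collab in a:
    # sorted() returns a new list, so the input lists are left unchanged
    for pair in itertools.combinations(sorted(collab), 2):
        counts[pair] += 1

# Sort by descending count, then alphabetically by pair to break ties.
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
for pair, freq in ranked:
    print(pair, freq)
```

Negating the count inside the key tuple sorts counts in descending order while the names still sort ascending.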
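You can use a set comprehension to build the set of all names, then use a list comprehension to count how many sublists contain each pair of names: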
>>> from itertools import combinations as comb
>>> all_nam={j for i in a for j in i}
>>> [[(i,j),sum({i,j}.issubset(t) for t in a)] for i,j in comb(all_nam,2)]
[[('Jennifer', 'John'), 3],
[('Jennifer', 'Joe'), 0],
[('Jennifer', 'Anna'), 1],
[('Jennifer', 'Mark'), 2],
[('John', 'Joe'), 0],
[('John', 'Anna'), 1],
[('John', 'Mark'), 2],
[('Joe', 'Anna'), 0],
[('Joe', 'Mark'), 1],
[('Anna', 'Mark'), 0]]
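The set-based answer tests every possible pair of names, so it also reports pairs that never worked together, and its order follows set iteration order, which for strings can change between runs because of hash randomization. A sketch of one variant (an adaptation, not the answer's original code) that sorts the names, drops zero counts, and ranks the result:

```python
from itertools import combinations

a = [
    ['John', 'Mark', 'Jennifer'],
    ['John'],
    ['Joe', 'Mark'],
    ['John', 'Anna', 'Jennifer'],
    ['Jennifer', 'John', 'Mark'],
]

# Sorting the names makes the pair order reproducible across runs.
all_names = sorted({name for report in a for name in report})

# Convert each report to a set once instead of on every subset test.
report_sets = [set(report) for report in a]

results = []
for i, j in combinations(all_names, 2):
    freq = sum({i, j} <= s for s in report_sets)  # True counts as 1
    if freq:  # skip pairs that never co-authored a report
        results.append(((i, j), freq))

# list.sort is stable, so equal counts keep their alphabetical order.
results.sort(key=lambda item: item[1], reverse=True)
print(results)
```

This checks every candidate pair against every report, so its cost grows with (number of names)² × (number of reports); the Counter approach only touches pairs that actually occur in some report.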