
Finding the most frequent occurrences of pairs in a list of lists

I have a dataset that lists the authors of many technical reports. Each report can be authored by one or more people:

a = [
['John', 'Mark', 'Jennifer'],
['John'],
['Joe', 'Mark'],
['John', 'Anna', 'Jennifer'],
['Jennifer', 'John', 'Mark']
]

I need to find the most frequent pairs, that is, the people who have collaborated most often in the past:

['John', 'Jennifer'] - 3 times
['John', 'Mark'] - 2 times
['Mark', 'Jennifer'] - 2 times
etc...

How can I do this in Python?

Use a collections.Counter dict with itertools.combinations:

from collections import Counter
from itertools import combinations

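d = Counter()
for sub in a:
    # A report with a single author has no pairs to count
    if len(sub) < 2:
        continue
    # Sort so that each pair is always stored in the same order
    sub.sort()
    for comb in combinations(sub, 2):
        d[comb] += 1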

print(d.most_common())
[(('Jennifer', 'John'), 3), (('John', 'Mark'), 2), (('Jennifer', 'Mark'), 2), (('Anna', 'John'), 1), (('Joe', 'Mark'), 1), (('Anna', 'Jennifer'), 1)]

most_common() returns the pairings ordered from most common to least common. If you only want the n most common, pass n: d.most_common(n)
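The same idea can be wrapped in a small helper that takes n directly. This is only a sketch: the name top_pairs and its parameters are made up for illustration, and it uses sorted(set(...)) instead of sub.sort() on the assumption that you would rather not modify the input lists or count a name repeated within one report.

```python
from collections import Counter
from itertools import combinations


def top_pairs(reports, n=None):
    """Return the n most frequent co-author pairs (all pairs if n is None).

    `top_pairs`, `reports` and `n` are illustrative names, not part of any library.
    """
    counts = Counter()
    for authors in reports:
        # sorted(set(...)) leaves the input untouched, drops repeated names,
        # and stores every pair in one canonical (alphabetical) order.
        for pair in combinations(sorted(set(authors)), 2):
            counts[pair] += 1
    return counts.most_common(n)


a = [
    ['John', 'Mark', 'Jennifer'],
    ['John'],
    ['Joe', 'Mark'],
    ['John', 'Anna', 'Jennifer'],
    ['Jennifer', 'John', 'Mark'],
]

print(top_pairs(a, 1))  # [(('Jennifer', 'John'), 3)]
```

Pairs with equal counts may come back in a different relative order depending on the Python version, so do not rely on the order among ties.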

import collections
import itertools

a = [
['John', 'Mark', 'Jennifer'],
['John'],
['Joe', 'Mark'],
['John', 'Anna', 'Jennifer'],
['Jennifer', 'John', 'Mark']
]


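# Missing pairs start at 0, so they can be incremented directly
counts = collections.defaultdict(int)
for collab in a:
    # Sort so that each pair is always stored in the same order
    collab.sort()
    for pair in itertools.combinations(collab, 2):
        counts[pair] += 1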

for pair, freq in counts.items():
    print(pair, freq)

Output:

('John', 'Mark') 2
('Jennifer', 'Mark') 2
('Anna', 'John') 1
('Jennifer', 'John') 3
('Anna', 'Jennifer') 1
('Joe', 'Mark') 1
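The output above is in dictionary order, not frequency order. If you want the most frequent pairs first, one option is to sort the items yourself. The following is a minimal sketch; the name ranked and the alphabetical tie-break are my own choices, not something the question requires.

```python
import collections
import itertools

a = [
    ['John', 'Mark', 'Jennifer'],
    ['John'],
    ['Joe', 'Mark'],
    ['John', 'Anna', 'Jennifer'],
    ['Jennifer', 'John', 'Mark'],
]

counts = collections.defaultdict(int)
for collab in a:
    # sorted() returns a new list, so the original data is left unchanged
    for pair in itertools.combinations(sorted(collab), 2):
        counts[pair] += 1

# Highest frequency first; ties are broken alphabetically by pair
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

for pair, freq in ranked:
    print(pair, freq)
```

With the sample data, ('Jennifer', 'John') comes first with 3, followed by the two pairs that occur twice.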

You can use a set comprehension to create a set of all names, then use a list comprehension to count how many sublists contain both names of each pair:

>>> from itertools import combinations as comb
>>> all_nam={j for i in a for j in i}
>>> [[(i,j),sum({i,j}.issubset(t) for t in a)] for i,j in comb(all_nam,2)]

[[('Jennifer', 'John'), 3], 
 [('Jennifer', 'Joe'), 0], 
 [('Jennifer', 'Anna'), 1], 
 [('Jennifer', 'Mark'), 2], 
 [('John', 'Joe'), 0], 
 [('John', 'Anna'), 1], 
 [('John', 'Mark'), 2], 
 [('Joe', 'Anna'), 0], 
 [('Joe', 'Mark'), 1], 
 [('Anna', 'Mark'), 0]]
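Note that this approach also lists pairs that never worked together, and because sets are unordered, the order of the names inside each tuple can vary between runs. A possible variant, sketched below, sorts the names first and keeps only pairs with at least one shared report; names, report_sets and pair_counts are illustrative names.

```python
from itertools import combinations

a = [
    ['John', 'Mark', 'Jennifer'],
    ['John'],
    ['Joe', 'Mark'],
    ['John', 'Anna', 'Jennifer'],
    ['Jennifer', 'John', 'Mark'],
]

# Sorting the unique names makes the pair order deterministic
names = sorted({name for report in a for name in report})

# Convert each report to a set once, so membership tests are cheap
report_sets = [set(report) for report in a]

pair_counts = {}
for x, y in combinations(names, 2):
    shared = sum(1 for authors in report_sets if x in authors and y in authors)
    if shared:  # skip pairs that never appear together
        pair_counts[(x, y)] = shared

print(pair_counts)
```

This still checks every possible pair of names against every report, so for a large number of distinct authors the Counter approach is likely to be cheaper.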

