
Python: Finding the most frequent occurrences of combinations of any length in a list of lists

How can I find the most frequently occurring combinations in a list of lists? The combinations can be of any length.

Sample data:

l = [['action','mystery','horror','thriller'],
 ['drama','romance'],
 ['comedy','drama','romance'],
 ['scifi','mystery','horror','thriller'],
 ['horror','mystery','thriller']]

Expected output:

'mystery','horror','thriller' - 3 times
'drama','romance' - 2 times

With the help of this post, I was able to find the most frequently occurring pairs (combinations of 2), but how can I extend it to find combinations of any length?

EDIT: As per @CrazyChucky's comment:

Sample input:

l = [['action','mystery','horror','thriller'],
     ['drama','romance'],
     ['comedy','drama','romance'],
     ['scifi','mystery','horror','thriller'],
     ['horror','mystery','thriller'],
     ['mystery','horror']]

Expected output:

'mystery','horror' - 4 times
'mystery','horror','thriller' - 3 times
'drama','romance' - 2 times

You can adapt the code from that question to iterate over every combination of every possible size (2 or more) from each sublist:

from collections import Counter
from itertools import combinations

l = [['action','mystery','horror','thriller'],
 ['drama','romance'],
 ['comedy','drama','romance'],
 ['scifi','mystery','horror','thriller'],
 ['horror','mystery','thriller']]
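d = Counter()
for sub in l:
    if len(sub) < 2:
        continue  # no combination of size 2 or more is possible
    sub.sort()  # sort in place so equal combinations produce identical tuples
    for sz in range(2, len(sub)+1):
        for comb in combinations(sub, sz):
            d[comb] += 1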

print(d.most_common())

Output:

[
 (('horror', 'mystery'), 3),
 (('horror', 'thriller'), 3),
 (('mystery', 'thriller'), 3),
 (('horror', 'mystery', 'thriller'), 3),
 (('drama', 'romance'), 2),
 (('action', 'horror'), 1),
 (('action', 'mystery'), 1),
 (('action', 'thriller'), 1),
 (('action', 'horror', 'mystery'), 1),
 (('action', 'horror', 'thriller'), 1),
 (('action', 'mystery', 'thriller'), 1),
 (('action', 'horror', 'mystery', 'thriller'), 1),
 (('comedy', 'drama'), 1),
 (('comedy', 'romance'), 1),
 (('comedy', 'drama', 'romance'), 1),
 (('horror', 'scifi'), 1),
 (('mystery', 'scifi'), 1),
 (('scifi', 'thriller'), 1),
 (('horror', 'mystery', 'scifi'), 1),
 (('horror', 'scifi', 'thriller'), 1),
 (('mystery', 'scifi', 'thriller'), 1),
 (('horror', 'mystery', 'scifi', 'thriller'), 1)
]

To get just the genre combinations with the highest count, you can iterate over the counter:

top = max(d.values())  # compute the highest count once, not once per item
most_frequent = [g for g, cnt in d.items() if cnt == top]
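The expected output in the question lists only the largest group for each count, not every sub-pair of that group. One possible way to get there, as a sketch rather than a definitive answer, is to keep a combination only if no strictly larger combination has the same count (in frequent-itemset terms, a "closed" itemset). The helper name `is_closed`, the `counts` and `result` variables, and the threshold of at least 2 occurrences are assumptions made for this illustration; the input is the edited sample from the question.

```python
from collections import Counter
from itertools import combinations

l = [['action', 'mystery', 'horror', 'thriller'],
     ['drama', 'romance'],
     ['comedy', 'drama', 'romance'],
     ['scifi', 'mystery', 'horror', 'thriller'],
     ['horror', 'mystery', 'thriller'],
     ['mystery', 'horror']]

# Count every combination of size 2 or more, as in the answer above.
counts = Counter()
for sub in l:
    items = sorted(set(sub))  # sorted copy; the input list is left untouched
    for size in range(2, len(items) + 1):
        counts.update(combinations(items, size))


def is_closed(combo, cnt):
    """Return True if no strictly larger combination has the same count."""
    members = set(combo)
    return not any(
        other_cnt == cnt and len(other) > len(combo) and members.issubset(other)
        for other, other_cnt in counts.items()
    )


# Assumption: only combinations seen at least twice are of interest.
result = [(c, n) for c, n in counts.most_common() if n >= 2 and is_closed(c, n)]
for combo, n in result:
    print(combo, "-", n, "times")
```

On the edited sample this keeps ('horror', 'mystery') with 4, ('horror', 'mystery', 'thriller') with 3 and ('drama', 'romance') with 2, with genres in alphabetical order inside each tuple because of the sort. The subset check compares every pair of combinations, so it is only practical for small inputs.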

I wrote some simple code without importing any packages. It counts how often each individual genre occurs and then groups the genres by that count:

lst = [['action','mystery','horror','thriller'],
 ['drama','romance'],
 ['comedy','drama','romance'],
 ['scifi','mystery','horror','thriller'],
 ['horror','mystery','thriller']]


def print_it_all_by_num(arr: list):
    # Count how many times each individual item occurs across all sublists.
    dic = dict()
    for i in arr:
        for j in i:
            if j in dic:
                dic[j] += 1
            else:
                dic[j] = 1
    # Group the items by their occurrence count.
    dic_out = dict()
    for i in dic:
        if dic[i] in dic_out:
            dic_out[dic[i]].append(i)
        else:
            dic_out[dic[i]] = [i]
    print(dic_out)  # out is {1: ['action', 'comedy', 'scifi'], 3: ['mystery', 'horror', 'thriller'], 2: ['drama', 'romance']}


print_it_all_by_num(lst)
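Note that grouping individual genres by count matches the sample, but it does not check that the genres actually appear together. If real combinations are needed while still avoiding imports, one option is to enumerate the subsets of each sublist with bitmasks. This is a minimal sketch under that assumption; `count_combos` and its `min_size` parameter are hypothetical names introduced here.

```python
def count_combos(lists, min_size=2):
    """Count every combination of at least min_size items, without imports."""
    counts = {}
    for sub in lists:
        items = sorted(set(sub))  # copy, deduplicate and normalise the order
        n = len(items)
        # Each integer mask in [1, 2**n) selects one non-empty subset of items.
        for mask in range(1, 1 << n):
            combo = tuple(items[i] for i in range(n) if mask & (1 << i))
            if len(combo) >= min_size:
                counts[combo] = counts.get(combo, 0) + 1
    return counts


lst = [['action', 'mystery', 'horror', 'thriller'],
       ['drama', 'romance'],
       ['comedy', 'drama', 'romance'],
       ['scifi', 'mystery', 'horror', 'thriller'],
       ['horror', 'mystery', 'thriller']]

counts = count_combos(lst)
# Highest count first; ties are broken alphabetically by combination.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
for combo, cnt in ranked[:5]:
    print(combo, cnt)
```

The number of subsets doubles with every extra item in a sublist, so this approach only suits short sublists such as genre tags.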
