Python: Find the most frequent combinations of any length in a list of lists

Question

How can I find the most frequently occurring combinations in a list of lists? The combinations can be of any length.

Sample data:

l = [['action','mystery','horror','thriller'],
 ['drama','romance'],
 ['comedy','drama','romance'],
 ['scifi','mystery','horror','thriller'],
 ['horror','mystery','thriller']]

Expected output:

'mystery','horror','thriller' - 3 times
'drama','romance' - 2 times

With the help of this post, I was able to find the most frequent pairs (combinations of 2), but how can I extend that to find combinations of any length?

EDIT: As per @CrazyChucky's comment:

Sample input:

l = [['action','mystery','horror','thriller'],
     ['drama','romance'],
     ['comedy','drama','romance'],
     ['scifi','mystery','horror','thriller'],
     ['horror','mystery','thriller'],
     ['mystery','horror']]

Expected output:

'mystery','horror' - 4 times
'mystery','horror','thriller' - 3 times
'drama','romance' - 2 times

Answer 1

You can adapt the code from that question to iterate over every combination of every possible size (2 or more) from each sublist:

from collections import Counter
from itertools import combinations

l = [['action','mystery','horror','thriller'],
 ['drama','romance'],
 ['comedy','drama','romance'],
 ['scifi','mystery','horror','thriller'],
 ['horror','mystery','thriller']]
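d = Counter()
for sub in l:
    if len(sub) < 2:
        continue
    # Sort so the same genres always produce the same tuple key.
    # Note: sort() reorders the sublist in place.
    sub.sort()
    # Count every combination from size 2 up to the full sublist.
    for sz in range(2, len(sub)+1):
        for comb in combinations(sub, sz):
            d[comb] += 1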

print(d.most_common())

Output:

[
 (('horror', 'mystery'), 3),
 (('horror', 'thriller'), 3),
 (('mystery', 'thriller'), 3),
 (('horror', 'mystery', 'thriller'), 3),
 (('drama', 'romance'), 2),
 (('action', 'horror'), 1),
 (('action', 'mystery'), 1),
 (('action', 'thriller'), 1),
 (('action', 'horror', 'mystery'), 1),
 (('action', 'horror', 'thriller'), 1),
 (('action', 'mystery', 'thriller'), 1),
 (('action', 'horror', 'mystery', 'thriller'), 1),
 (('comedy', 'drama'), 1),
 (('comedy', 'romance'), 1),
 (('comedy', 'drama', 'romance'), 1),
 (('horror', 'scifi'), 1),
 (('mystery', 'scifi'), 1),
 (('scifi', 'thriller'), 1),
 (('horror', 'mystery', 'scifi'), 1),
 (('horror', 'scifi', 'thriller'), 1),
 (('mystery', 'scifi', 'thriller'), 1),
 (('horror', 'mystery', 'scifi', 'thriller'), 1)
]
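The loop above calls sub.sort(), which reorders the caller's lists in place. As a minimal sketch (not part of the original answer), the same counting can be wrapped in a function that works on a sorted copy instead. The names count_combinations, min_size, genres and repeated are hypothetical, and the use of set() to ignore duplicate genres within one sublist is an assumption about the data. It is run here on the edited sample input from the question:

```python
from collections import Counter
from itertools import combinations


def count_combinations(lists, min_size=2):
    """Count every combination of at least min_size items across the sublists."""
    counts = Counter()
    for sub in lists:
        # sorted(set(...)) builds a new list, so the input is left unchanged,
        # and sorting makes the same genres always map to the same tuple key.
        # Assumption: a genre repeated inside one sublist should count once.
        items = sorted(set(sub))
        for size in range(min_size, len(items) + 1):
            counts.update(combinations(items, size))
    return counts


genres = [['action', 'mystery', 'horror', 'thriller'],
          ['drama', 'romance'],
          ['comedy', 'drama', 'romance'],
          ['scifi', 'mystery', 'horror', 'thriller'],
          ['horror', 'mystery', 'thriller'],
          ['mystery', 'horror']]

counts = count_combinations(genres)

# Keep only the combinations that occur more than once.
repeated = [(combo, n) for combo, n in counts.most_common() if n > 1]
for combo, n in repeated:
    print(combo, '-', n, 'times')
```

Keep in mind that a sublist of n items produces 2^n - n - 1 combinations of size 2 or more, so this approach is only practical when the sublists are short.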

To get only the combinations with the highest count, you can iterate over the counter:

top = max(d.values())  # highest count, computed once
most_frequent = [g for g, cnt in d.items() if cnt == top]
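The expected output in the question lists fewer combinations than the full counter contains: for example, it shows 'mystery','horror','thriller' with 3 occurrences but not the pairs inside it that also occur 3 times. One possible reading, which is an assumption and not something the asker states, is that a combination should be dropped when a larger combination containing it has the same count. A minimal sketch under that assumption, using the edited sample input (drop_redundant, min_count and genres are hypothetical names):

```python
from collections import Counter
from itertools import combinations

genres = [['action', 'mystery', 'horror', 'thriller'],
          ['drama', 'romance'],
          ['comedy', 'drama', 'romance'],
          ['scifi', 'mystery', 'horror', 'thriller'],
          ['horror', 'mystery', 'thriller'],
          ['mystery', 'horror']]

# Same counting as in the answer, but on a sorted copy of each sublist.
counts = Counter()
for sub in genres:
    items = sorted(sub)
    for size in range(2, len(items) + 1):
        counts.update(combinations(items, size))


def drop_redundant(counts, min_count=2):
    """Keep combinations seen at least min_count times, unless a larger
    combination that contains them occurs exactly as often."""
    frequent = {c: n for c, n in counts.items() if n >= min_count}
    kept = []
    for combo, n in frequent.items():
        covered = any(
            m == n and len(other) > len(combo) and set(combo) < set(other)
            for other, m in frequent.items()
        )
        if not covered:
            kept.append((combo, n))
    # Highest count first.
    return sorted(kept, key=lambda pair: -pair[1])


for combo, n in drop_redundant(counts):
    print(combo, '-', n, 'times')
# ('horror', 'mystery') - 4 times
# ('horror', 'mystery', 'thriller') - 3 times
# ('drama', 'romance') - 2 times
```

The pairwise comparison is quadratic in the number of frequent combinations, which is fine for small inputs like this one but would need a smarter approach for large data.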

Answer 2

Here is a simple version that doesn't import any packages. It counts how often each individual genre occurs and then groups the genres that share the same count:

lst = [['action','mystery','horror','thriller'],
 ['drama','romance'],
 ['comedy','drama','romance'],
 ['scifi','mystery','horror','thriller'],
 ['horror','mystery','thriller']]


def print_it_all_by_num(arr: list):
    dic = dict()
    for i in arr:
        for j in i:
            if j in dic:
                dic[j] += 1
            else:
                dic[j] = 1
    dic_out = dict()
    for i in dic:
        if dic[i] in dic_out:
            dic_out[dic[i]].append(i)
        else:
            dic_out[dic[i]] = [i]
    print(dic_out)  # out is {1: ['action', 'comedy', 'scifi'], 3: ['mystery', 'horror', 'thriller'], 2: ['drama', 'romance']}


print_it_all_by_num(lst)
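For comparison, the same grouping can be sketched with the standard library's Counter and defaultdict, returning the result instead of printing it. This is an alternative to the import-free version above, not the answer author's code, and group_items_by_count and genres are hypothetical names:

```python
from collections import Counter, defaultdict


def group_items_by_count(lists):
    """Map each occurrence count to the items that occur that many times."""
    # Count individual items across all sublists.
    item_counts = Counter(item for sub in lists for item in sub)
    # Group the items that share the same count.
    grouped = defaultdict(list)
    for item, n in item_counts.items():
        grouped[n].append(item)
    return dict(grouped)


genres = [['action', 'mystery', 'horror', 'thriller'],
          ['drama', 'romance'],
          ['comedy', 'drama', 'romance'],
          ['scifi', 'mystery', 'horror', 'thriller'],
          ['horror', 'mystery', 'thriller']]

print(group_items_by_count(genres))
```

Because this groups single genres by frequency, it only resembles the expected output when the genres of a combination happen to share a count. With the edited sample input (which adds ['mystery','horror']), 'mystery' and 'horror' land under 4 and 'thriller' alone under 3, so the three-genre combination is no longer reported.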

Answer 1: score 2, accepted, 2020-12-22 05:46:05
Answer 2: score 0, 2020-12-22 05:52:57
