
Python: How to find the most frequent combination of elements?

Machines report fault codes, which are collected in a pandas DataFrame. id identifies the machine and code is the fault code:

import pandas as pd

df = pd.DataFrame({
    "id": [1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,4],
    "code": [1,2,5,8,9,2,3,5,6,1,2,3,4,5,6,7],
})


How to read this: machine 1 generated five codes: 1, 2, 5, 8 and 9.

I want to find out which code combinations occur most frequently across all machines. For the example, the result would be something like [2] (3x), [2,5] (3x), [3,5] (2x) and so on.
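To make the counting rule concrete: a combination counts once for every machine whose codes include all of its elements. Here is a minimal sketch in plain Python, with the example data written out by hand as a dict. The names machines and support are made up for illustration.

```python
# Example data from above, written as {machine id: set of fault codes}.
machines = {
    1: {1, 2, 5, 8, 9},
    2: {2, 3, 5, 6},
    3: {1, 2, 3, 4, 5, 6},
    4: {7},
}

def support(combo):
    """Number of machines whose codes include every code in `combo`."""
    wanted = set(combo)
    # `wanted <= codes` is a subset test; True counts as 1 in the sum.
    return sum(wanted <= codes for codes in machines.values())

print(support([2]))     # 3
print(support([2, 5]))  # 3
print(support([3, 5]))  # 2
```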

How can I achieve this? As there is a lot of data, I'm looking for an efficient solution.

Here are two other ways to represent the data (in case that makes the calculation easier):

pd.crosstab(df.id, df.code)


df.groupby("id")["code"].apply(list)


Use a custom function all_subsets to build every non-empty combination of codes per machine, then flatten the result with Series.explode and count the combinations with Series.value_counts:

from itertools import chain, combinations

# Adapted from https://stackoverflow.com/a/5898031:
# returns a list and skips the empty tuple by starting the range at 1.
def all_subsets(ss):
    return list(chain.from_iterable(combinations(ss, r) for r in range(1, len(ss) + 1)))

# Build all combinations per machine, put each on its own row, then count.
s = df.groupby('id')['code'].apply(all_subsets).explode().value_counts()
print(s)
(2,)            3
(2, 5)          3
(5,)            3
(1, 2)          2
(3, 6)          2
               ..
(1, 5, 8)       1
(9,)            1
(1, 3, 4, 6)    1
(5, 8, 9)       1
(4, 6)          1
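One caveat on efficiency: a machine with n distinct codes has 2^n - 1 non-empty subsets, so enumerating all of them gets expensive quickly for machines with many codes. If only small combinations are of interest, the combination size can be capped. The following is a sketch under that assumption, written in plain Python rather than pandas. The function name count_combinations and the max_size parameter are hypothetical and not part of the answer above.

```python
from collections import Counter
from itertools import combinations

def count_combinations(groups, max_size=2):
    """Count in how many groups each combination of up to `max_size` codes occurs."""
    counts = Counter()
    for codes in groups.values():
        # Deduplicate and sort so each combination has one canonical tuple.
        unique = sorted(set(codes))
        for size in range(1, max_size + 1):
            counts.update(combinations(unique, size))
    return counts

# Same example data as in the question, as {machine id: list of codes}.
groups = {
    1: [1, 2, 5, 8, 9],
    2: [2, 3, 5, 6],
    3: [1, 2, 3, 4, 5, 6],
    4: [7],
}

counts = count_combinations(groups, max_size=2)
print(counts[(2,)], counts[(2, 5)], counts[(3, 5)])  # 3 3 2
```

With the DataFrame from the question, groups could be built with df.groupby("id")["code"].apply(list).to_dict(), and counts.most_common(10) would list the ten most frequent combinations.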
