Python: Generating candidate itemsets for Relative Support Apriori Algorithm

Please note: the title of this question might be ambiguous, so I ask other users to edit it. I was not able to come up with a title that fits this problem.

The problem discussed here is part of an algorithm called RSAA (Relative Support Apriori Algorithm). Here is the link to the research paper: http://dl.acm.org/citation.cfm?id=937663

Problem: I am implementing Apriori-like algorithms in Python, and I need to generate patterns (candidate itemsets) like the ones below at each step of the algorithm.

  • At each step, the length of the sublists in the main list should increase by 1.
  • The output of one step is the input for the next step.
  • Sublists in the main list can occur in any order, and numbers inside sublists can occur in any order.

Here's the example:

Input:

input = [[5, 3], [5, 4], [5, 6], [7, 6]]

Output should be:

output = [[5,3,4], [5,3,6], [4,5,6], [5,6,7]]

Each sublist of the output list (^) must have exactly 3 items (example: [5,3,4]).

The approach to solving this problem should be generic, because in the next step:

Input:

input = [[5,3,4], [5,3,6], [4,5,6], [5,6,7]]

Output:

output = [[5,3,4,6], [4,5,6,7]]

Each sublist of the output list (^) must have exactly 4 items.

([5,3,4,6] is formed by joining [5,3,4] and [5,3,6]. We can't join [5,3,4] and [5,6,7], because doing so would create [5,3,4,6,7], which has length 5.)

I think your requirement is covered by Apriori. I wrote a blog post about the algorithm, but unfortunately it is in Chinese. Here is the link: http://www.zealseeker.com/archives/apriori-algorithm-python/
Here are the snippets (also hosted in Chinese).

has_infrequent_subset and apriori_gen may be the two functions you want.
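The linked snippets are not reproduced here, so the following is only a rough sketch of what functions with these two names conventionally do in textbook Apriori, not the code from the blog post. It assumes every itemset in the input has the same length; the signatures and return format are my own choices.

```python
from itertools import combinations


def has_infrequent_subset(candidate, prev_itemsets):
    """Return True if any (k-1)-subset of the candidate is not in prev_itemsets."""
    k = len(candidate)
    return any(frozenset(sub) not in prev_itemsets
               for sub in combinations(candidate, k - 1))


def apriori_gen(prev_itemsets):
    """Join itemsets that differ in exactly one item, then prune the results."""
    prev = {frozenset(s) for s in prev_itemsets}
    candidates = set()
    for a, b in combinations(prev, 2):
        union = a | b
        # Same length and a union one item longer means all but one item is shared.
        if len(a) == len(b) and len(union) == len(a) + 1:
            if not has_infrequent_subset(union, prev):
                candidates.add(union)
    # Sorted lists of sorted items, only to make the output stable and readable.
    return sorted(sorted(c) for c in candidates)


triangle = apriori_gen([[1, 2], [1, 3], [2, 3]])
question_step = apriori_gen([[5, 3], [5, 4], [5, 6], [7, 6]])
```

Note that the pruning step is stricter than the examples in the question: for the question's first input it rejects every candidate (for instance [5,3,4] is dropped because [3,4] is not in the input), so the join without pruning is closer to the output asked for.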

If the code is useful for you, comment on my answer and I'll be glad to keep helping.


Update

It is easy to get the intersection and union of two sets in Python.

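def join(a, b):
    # Join two k-itemsets only if they share exactly k-1 items.
    a, b = set(a), set(b)
    c = a & b  # get the intersection
    if len(c) == len(a) - 1:
        return a | b  # their union
    return None  # not joinable

join([5, 6], [6, 7])  # a = {5, 6}, b = {6, 7}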
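The set check above can be applied to every pair of sublists to get a generic step whose output feeds the next step. This is a minimal sketch under that assumption; generate_candidates is a hypothetical name, and sorting the result is only there to make the output deterministic.

```python
from itertools import combinations


def generate_candidates(itemsets):
    """Join every pair of k-itemsets that share exactly k-1 items."""
    sets = [frozenset(s) for s in itemsets]
    candidates = set()  # a set of frozensets removes duplicate unions
    for a, b in combinations(sets, 2):
        if len(a) == len(b) and len(a & b) == len(a) - 1:
            candidates.add(a | b)
    # Order does not matter for the problem; sort for readable output.
    return sorted(sorted(c) for c in candidates)


step1 = generate_candidates([[5, 3], [5, 4], [5, 6], [7, 6]])
step2 = generate_candidates(step1)  # output of one step is input of the next
```

For the first input this gives the four 3-item lists from the question. For the second step it gives [3,4,5,6] and [4,5,6,7] as in the question, plus [3,5,6,7], because [5,3,6] and [5,6,7] also share two items; if that candidate is unwanted, an extra rule beyond this check is needed.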
