
Python: Generating candidate itemsets for Relative Support Apriori Algorithm

Please note: Title of this question might be ambiguous so I request other users to please edit it. I was not able to come up with a suitable title which fits this problem.

The problem discussed here is part of an algorithm called RSAA (Relative Support Apriori Algorithm). Here's the research paper link: http://dl.acm.org/citation.cfm?id=937663

Problem: I am implementing Apriori-like algorithms in Python, and I am facing an issue where I have to generate patterns (candidate itemsets) like these at each step of the algorithm.

  • At each step the length of the sublists in the main list should be incremented by 1.
  • Output of one step is going to be the input for the next step.
  • Sublists in the main list can occur in any order, and numbers inside sublists can occur in any order.

Here's the example:

Input:

input = [[5, 3], [5, 4], [5, 6], [7, 6]]

Output should be:

output = [[5,3,4], [5,3,6], [4,5,6], [5,6,7]]

Each sublist of the output list (^) must have exactly 3 items (example: [5,3,4]).

The approach to solving this problem should be generic, because in the next step:

Input:

input = [[5,3,4], [5,3,6], [4,5,6], [5,6,7]]

Output:

output = [[5,3,4,6], [4,5,6,7]]

Each sublist of the output list (^) must have exactly 4 items.

( [5,3,4,6] is formed by joining [5,3,4] and [5,3,6]. We can't join [5,3,4] and [5,6,7] because doing so would create [5,3,4,6,7] which will be of length = 5 )

I think your requirement is covered by Apriori. I wrote a blog post about the algorithm, but unfortunately it is in Chinese. Here is the link: http://www.zealseeker.com/archives/apriori-algorithm-python/
Here are the snippets (also in Chinese).

has_infrequent_subset and apriori_gen may be the two functions you want.
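
The blog's code is not reproduced here, so the following is only a minimal sketch of what a function named has_infrequent_subset usually does in textbook Apriori: it reports whether any (k-1)-item subset of a k-item candidate is missing from the previous level. The signature and parameter names are assumptions, not the blog author's.

```python
from itertools import combinations

def has_infrequent_subset(candidate, previous_itemsets):
    """Return True if some (k-1)-subset of candidate is absent from previous_itemsets.

    Hypothetical signature: candidate is any iterable of items,
    previous_itemsets is an iterable of (k-1)-itemsets.
    """
    previous = {frozenset(s) for s in previous_itemsets}
    k = len(candidate)
    return any(frozenset(sub) not in previous
               for sub in combinations(candidate, k - 1))
```

Applied to the question's examples, this check is stricter than the join rule described there: it would reject [5,3,4], for instance, because [3,4] is not in the input list. It is useful only if the classic Apriori pruning step is actually wanted.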

If the code is useful to you, comment on my answer and I'll be glad to keep helping.


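Update

It is easy to get the intersection and union of two sets in Python. Two itemsets of the same length can be joined when they share all but one item:

def join(a, b):
    # Return the union of a and b if they share all but one item, else None.
    a, b = set(a), set(b)
    c = a & b  # the intersection
    if len(c) == len(a) - 1:
        return a | b  # their union
    return None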
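
To make the set-based idea above generic for any step, one possible sketch applies the same test to every pair of itemsets and drops duplicate results. The function name next_candidates is made up for illustration, and the sketch assumes the only rule is "two k-itemsets may be joined when they share exactly k-1 items".

```python
from itertools import combinations

def next_candidates(itemsets):
    """Join every pair of k-itemsets that share exactly k-1 items."""
    sets = [frozenset(s) for s in itemsets]
    seen = set()      # unions already produced, to avoid duplicates
    result = []
    for a, b in combinations(sets, 2):
        if len(a) == len(b) and len(a & b) == len(a) - 1:
            union = a | b
            if union not in seen:
                seen.add(union)
                result.append(sorted(union))
    return result

print(next_candidates([[5, 3], [5, 4], [5, 6], [7, 6]]))
# [[3, 4, 5], [3, 5, 6], [4, 5, 6], [5, 6, 7]]
```

For the first example this reproduces the expected output (up to ordering inside the sublists). For the second example it returns [[3, 4, 5, 6], [3, 5, 6, 7], [4, 5, 6, 7]]: the extra candidate [3, 5, 6, 7] comes from joining [5,3,6] and [5,6,7], which the length rule alone permits. If that candidate must be excluded, some additional rule not stated in the question is needed.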
