I'm using the latest trunk version of Mahout's PFP-Growth implementation on top of a Hadoop cluster to find frequent patterns in the MovieLens dataset. In a previous step I converted the dataset to a list of transactions, since the PFP-Growth algorithm needs that input format.
However, the output I get is unexpected.
For example, for item 1017 the only frequent pattern is:
1017 ([100, 1017],50)
I would also expect a pattern like ([1017],X) with X >= 50 in that line.
I also tested a small example input:
1,2,3
1,2,3
1,3
and the output I get is:
1 ([1, 3],3), ([1],3), ([1, 3, 2],2)
2 ([1, 3, 2],2)
3 ([1, 3],3), ([1, 3, 2],2)
Patterns such as ([1, 2],2) are missing.
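For reference, here is a brute-force count over the three example transactions. It is only a minimal sketch to show which itemsets I would expect, not Mahout code; the function name `frequent_itemsets` and the minimum support of 2 are my own assumptions for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset (sorted tuple): support} for every itemset
    that occurs in at least min_support transactions.

    Brute force over all item combinations, so only suitable for tiny inputs.
    """
    items = sorted({item for t in transactions for item in t})
    tx_sets = [set(t) for t in transactions]
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Support = number of transactions containing every item of the candidate.
            support = sum(1 for t in tx_sets if t.issuperset(candidate))
            if support >= min_support:
                result[candidate] = support
    return result


# The example input from above; min_support=2 is an assumed threshold.
transactions = [[1, 2, 3], [1, 2, 3], [1, 3]]
patterns = frequent_itemsets(transactions, min_support=2)

for itemset, support in sorted(patterns.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(list(itemset), support)
```

With this threshold all seven non-empty itemsets are frequent, including [1, 2], [2, 3], [2] and [3], none of which appear in the output above.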
What is wrong?
The reason is that the FP-Growth implementation does not output a subset of a frequent pattern unless the subset's support is greater than that of the pattern. It's described here: http://www.searchworkings.org/forum/-/message_boards/view_message/396093
I need to rewrite the code for my use.
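An alternative to changing the algorithm may be a post-processing step. The sketch below is not Mahout code: it assumes the reported output contains every closed frequent pattern (something I have not verified for Mahout in general, and which would not hold if the pattern list is truncated), and the function name `expand_patterns` is hypothetical. Under that assumption, the support of any subset equals the highest support among the reported patterns that contain it.

```python
from itertools import combinations


def expand_patterns(reported):
    """Derive supports for every non-empty subset of the reported patterns.

    reported: iterable of (itemset, support) pairs.
    A subset's support is taken as the maximum support of any reported
    pattern containing it. This is exact only if `reported` includes all
    closed frequent patterns (an assumption, see the text above).
    """
    expanded = {}
    for itemset, support in reported:
        items = sorted(itemset)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                expanded[subset] = max(expanded.get(subset, 0), support)
    return expanded


# Patterns reported for item 1 in the question's small example.
reported = [([1, 3], 3), ([1], 3), ([1, 3, 2], 2)]
expanded = expand_patterns(reported)

for itemset, support in sorted(expanded.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(list(itemset), support)
```

For the small example this recovers the pattern ([1, 2],2) that the question reports as missing. The number of subsets grows exponentially with pattern length, so this is only practical for short patterns.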
I read the paper and the code, and it seems the PFP algorithm is not correct at all. I wonder why nobody has noticed it.
It is obvious if you already know FP-Growth and take a couple of hours to read the paper and the code.