
Wrong output from Mahout's PFPGrowth algorithm?

I'm using the latest trunk version of Mahout's PFPGrowth implementation on a Hadoop cluster to find frequent patterns in the MovieLens dataset. In a previous step I converted the dataset into a list of transactions, since the PFPGrowth algorithm needs that input format.
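The conversion step itself is not shown. A minimal sketch of one possible approach follows. It assumes MovieLens-1M style rating lines (`UserID::MovieID::Rating::Timestamp`) and treats each user's rated movies as one transaction. The function name `ratings_to_transactions` is hypothetical, and the comma-separated output is assumed to match the item separator configured for the Mahout job.

```python
from collections import defaultdict


def ratings_to_transactions(lines, sep="::"):
    """Turn MovieLens-style rating lines into one transaction per user.

    Assumes each line looks like UserID::MovieID::Rating::Timestamp
    (the MovieLens-1M layout). Returns comma-separated movie IDs per user,
    ordered by numeric user ID, preserving each user's rating order.
    """
    movies_by_user = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, movie = line.split(sep)[:2]
        movies_by_user[user].append(movie)
    # dict.fromkeys drops duplicate movie IDs while keeping first-seen order
    return [
        ",".join(dict.fromkeys(movies_by_user[user]))
        for user in sorted(movies_by_user, key=int)
    ]


sample = [
    "1::1193::5::978300760",
    "1::661::3::978302109",
    "2::1357::5::978298709",
]
for transaction in ratings_to_transactions(sample):
    print(transaction)
```

Every rating is kept here; keeping only highly rated movies (for example, ratings of 4 or more) would be a separate design choice.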

However, the output I get is unexpected.

For example, for item 1017 the only frequent pattern is:

1017 ([100, 1017],50)

I would also expect a pattern like ([1017], X) with X >= 50 in that line.
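The expectation follows from the anti-monotonicity of support: every transaction that contains both 100 and 1017 also contains 1017, so the single-item pattern can never have a lower support. With supp(X) denoting the number of transactions that contain itemset X (notation introduced here for illustration):

```latex
\{1017\} \subseteq \{100, 1017\}
\;\Longrightarrow\;
\operatorname{supp}(\{1017\}) \;\ge\; \operatorname{supp}(\{100, 1017\}) = 50
```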

I also tested an example input:

1,2,3

1,2,3

1,3

and the output I get is:

1 ([1, 3],3), ([1],3), ([1, 3, 2],2)

2 ([1, 3, 2],2)

3 ([1, 3],3), ([1, 3, 2],2)

Some patterns are missing, for example ([1, 2],2).
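To check which patterns should exist, a brute-force enumeration over the three example transactions is enough. The helper below is hypothetical, not Mahout code, and it assumes a minimum support of 2, since the threshold used for the run is not stated:

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (number of transactions containing it) is at least min_support.

    Brute force: exponential in the number of distinct items, so only
    suitable for tiny examples like this one.
    """
    sets = [set(t) for t in transactions]
    items = sorted(set().union(*sets))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            support = sum(1 for t in sets if set(candidate) <= t)
            if support >= min_support:
                result[frozenset(candidate)] = support
    return result


transactions = [[1, 2, 3], [1, 2, 3], [1, 3]]
patterns = frequent_itemsets(transactions, min_support=2)
# Print smaller itemsets first, then in item order
for itemset, support in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The result contains [1, 2] with support 2, along with [2], [3] and [2, 3], none of which appear as separate entries in the Mahout output.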

What is wrong?

The reason is that the FP-Growth algorithm does not output a subset of a frequent pattern unless the subset's support is greater than the pattern's support. This is described here: http://www.searchworkings.org/forum/-/message_boards/view_message/396093
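The filtering rule can be sketched as follows: an itemset is dropped whenever a proper superset has the same support. This is an illustrative reimplementation of the rule, not Mahout's code. The input dictionary lists every frequent itemset of the example input, assuming a minimum support of 2:

```python
def closed_only(itemsets):
    """Keep an itemset only if no proper superset has the same support.

    itemsets maps frozenset -> support.
    """
    return {
        pattern: support
        for pattern, support in itemsets.items()
        if not any(
            pattern < other and support == other_support  # '<' is proper subset
            for other, other_support in itemsets.items()
        )
    }


# All frequent itemsets of the example input (1,2,3 / 1,2,3 / 1,3) at min support 2.
all_frequent = {
    frozenset({1}): 3,
    frozenset({2}): 2,
    frozenset({3}): 3,
    frozenset({1, 2}): 2,
    frozenset({1, 3}): 3,
    frozenset({2, 3}): 2,
    frozenset({1, 2, 3}): 2,
}
closed = closed_only(all_frequent)
for pattern, support in closed.items():
    print(sorted(pattern), support)
```

Only [1, 3] (support 3) and [1, 2, 3] (support 2) survive, which is why ([1, 2],2) never appears. The Mahout output also lists ([1],3) under key 1, so its actual filtering is not strictly identical to this rule.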

So I need to rewrite the code for my use case.
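One possible alternative to rewriting the algorithm is to post-process its output: the support of any frequent itemset equals the largest support among the closed itemsets that contain it. The hypothetical helper `expand_closed` below applies that rule to the patterns from the example output. It is only exact if the output contains every closed pattern; otherwise some subsets may be missing or have underestimated supports.

```python
from itertools import combinations


def expand_closed(closed):
    """Recover every subset of the given patterns with its support.

    Each subset gets the largest support among the given patterns that
    contain it. Exact only if `closed` holds all closed frequent itemsets.
    """
    result = {}
    for pattern, support in closed.items():
        items = sorted(pattern)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                result[key] = max(result.get(key, 0), support)
    return result


# Patterns taken from the Mahout output for the example input (all keys merged).
mahout_patterns = {
    frozenset({1, 3}): 3,
    frozenset({1}): 3,
    frozenset({1, 2, 3}): 2,
}
expanded = expand_closed(mahout_patterns)
for itemset, support in sorted(expanded.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

For the example this yields seven itemsets, including [1, 2] with support 2 and [3] with support 3.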

I have read the paper and the code, and it seems to me that the PFP algorithm is not correct at all. I wonder why nobody has noticed this.

It is obvious if you already know FP-Growth and spend a couple of hours reading the paper and the code.
