
Wrong output of mahout PFPGrowth algorithm?

Original 2012-05-09 16:47:28 8 2 apache/ hadoop/ data-mining/ mahout

I'm using the latest trunk version of Mahout's PFP Growth implementation on top of a Hadoop cluster to determine frequent patterns in the MovieLens dataset. In a previous step I converted the dataset to a list of transactions, as the PFP Growth algorithm needs that input format.

However, the output I get is unexpected.

For example, for item 1017 the only frequent pattern is

1017 ([100,1017], 50)

I would also expect a pattern like ([1017], X) with X >= 50 in that line.

I also tested an example input:

1,2,3

1,2,3

1,3

and the output I get is

1 ([1, 3],3), ([1],3), ([1, 3, 2],2)

2 ([1, 3, 2],2)

3 ([1, 3],3), ([1, 3, 2],2)

Patterns such as ([1,2],2) are missing.

What is wrong?
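To make the expectation concrete, here is a minimal brute-force sketch in Python (not Mahout code) that lists every itemset meeting a minimum support on the example input. The threshold `min_support = 2` and the function name `frequent_itemsets` are assumptions for illustration; the question does not state the threshold that was used.

```python
from itertools import combinations

# Example transactions from the question.
transactions = [{1, 2, 3}, {1, 2, 3}, {1, 3}]
min_support = 2  # assumed threshold, not given in the question


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Support = number of transactions containing the whole candidate.
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                result[candidate] = support
    return result


all_patterns = frequent_itemsets(transactions, min_support)
for itemset, support in sorted(all_patterns.items(),
                               key=lambda kv: (len(kv[0]), kv[0])):
    print(list(itemset), support)
```

With this threshold the exhaustive list contains seven itemsets, including [1, 2] with support 2, which is the kind of pattern the Mahout output above does not show.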

2 answers

The reason is that the FP algorithm does not output a subset of a frequent pattern unless the subset's support is greater than that of the pattern. It's described here: http://www.searchworkings.org/forum/-/message_boards/view_message/396093
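The rule described in this answer can be sketched as a filter over the exhaustive list of frequent itemsets. This is an illustration in plain Python under stated assumptions, not Mahout's implementation: `min_support = 2` and the helper names `support` and `drop_redundant_subsets` are hypothetical.

```python
from itertools import combinations

# Example transactions from the question.
transactions = [{1, 2, 3}, {1, 2, 3}, {1, 3}]
min_support = 2  # assumed threshold, not given in the question


def support(itemset):
    """Count the transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


# Enumerate all frequent itemsets by brute force.
items = sorted(set().union(*transactions))
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        s = support(frozenset(combo))
        if s >= min_support:
            frequent[frozenset(combo)] = s


def drop_redundant_subsets(frequent):
    """Keep a pattern only if no frequent proper superset has support
    at least as large, i.e. the subset's support must be strictly greater."""
    return {
        p: s for p, s in frequent.items()
        if not any(p < q and frequent[q] >= s for q in frequent)
    }


kept = drop_redundant_subsets(frequent)
for pattern, s in sorted(kept.items(), key=lambda kv: -kv[1]):
    print(sorted(pattern), s)
```

Under these assumptions only [1, 3] (support 3) and [1, 2, 3] (support 2) survive, which is consistent with ([1,2],2) being absent. The posted output also lists ([1],3), so Mahout's actual rule is evidently not exactly this filter; treat the sketch as an illustration of the answer's explanation only.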

I need to rewrite the code for my use.

I read the paper and the code, and it seems the PFP algorithm is not correct at all. I wonder why nobody has realized it.

It is obvious if you already know FP-Growth and take a couple of hours to read the paper and the code.


solution1
1 ACCEPTED 2012-05-11 05:47:42

solution2
0 2017-09-20 15:17:53
