How to calculate confidence from support in Java

Right now I am working on a program that takes a list of users who have rated movies and calculates the support for all movies. I give my program a maximum number of movies I want to calculate, a support minimum, and a confidence minimum.

Currently my program calculates the support for all single movies and prints those that meet the support minimum to a file with the support value.

It then continues from the single movies that met the minimum support, calculates the movie pairs that also meet the support minimum, and prints these statistics to a new file.

This continues until there are no more movie pairs/sets that meet the minimum support or the maximum number of movies is reached.

The maximum number of movies is simply an integer. For example, if I have it set to three, it will only calculate the support for single movies, movie pairs, and movie sets of 3, and print all the singles, pairs, and sets with their respective support to their respective files.

An example of one of my output files looks like this...

    99 195 347,0.21314952279957583
    99 343 347,0.24284199363732767
    99 343 361,0.23329798515376457
    99 347 361,0.23223753976670203
    343 347 361,0.20254506892895016

These are the sets of three movies: space-delimited movie IDs, followed by a "," and then the support value. The single-movie and movie-pair files look exactly the same, but have only 1 (or 2) movie IDs before the comma.
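For reference, a file in this format could be read back into a lookup table roughly as follows. This is only a sketch: the class name `SupportFileParser` and the method `parse` are made up for illustration, and it assumes every non-blank line has the "IDs,support" shape shown above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the original program.
final class SupportFileParser {

    // Turns lines such as "99 343 347,0.24284199363732767"
    // into a map from itemset (set of movie IDs) to support.
    static Map<Set<Integer>, Double> parse(List<String> lines) {
        Map<Set<Integer>, Double> supports = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            int comma = trimmed.lastIndexOf(',');
            if (trimmed.isEmpty() || comma < 0) {
                continue; // skip blank or malformed lines
            }
            Set<Integer> itemset = new TreeSet<>();
            for (String id : trimmed.substring(0, comma).trim().split("\\s+")) {
                itemset.add(Integer.parseInt(id));
            }
            double support = Double.parseDouble(trimmed.substring(comma + 1).trim());
            supports.put(itemset, support);
        }
        return supports;
    }
}
```

Feeding it the lines of all the output files (for example via `java.nio.file.Files.readAllLines`) would give a single map covering singles, pairs, and larger sets.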

Note: I have a mapping for movie ID (number) to movie name for printing later.

My question: from what I have, is there a way to calculate the confidence of all possible rules and print/save the ones that meet the minimum confidence %?

Well, what have you tried?

There is Apriori pseudocode all over the internet, and hundreds of implementations, too. The part most people fail to implement efficiently is the rules that keep the number of candidates to a minimum: you don't want to try all combinations of size 3 or larger. That takes far too long, and most of those combinations cannot be frequent anyway.

Key to Apriori is the generation and pruning of candidates for the next round.
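To illustrate what generation and pruning mean here, a minimal sketch follows. The class and method names are invented for this example, and it assumes each itemset is stored as an ascending-sorted `List<Integer>` and that all itemsets passed in have the same size. Two frequent itemsets are joined only if they share everything but their last item, and a candidate is dropped as soon as one of its subsets is not frequent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of Apriori candidate generation (join + prune).
final class AprioriCandidates {

    // `frequent` holds the frequent itemsets of size k-1, each sorted ascending.
    // Returns the candidate itemsets of size k that survive pruning.
    static List<List<Integer>> generate(Set<List<Integer>> frequent) {
        List<List<Integer>> sets = new ArrayList<>(frequent);
        List<List<Integer>> candidates = new ArrayList<>();
        for (List<Integer> a : sets) {
            for (List<Integer> b : sets) {
                int last = a.size() - 1;
                // Join step: same prefix, and a's last item strictly smaller than b's.
                boolean samePrefix = a.subList(0, last).equals(b.subList(0, last));
                if (!samePrefix || a.get(last) >= b.get(last)) {
                    continue;
                }
                List<Integer> candidate = new ArrayList<>(a);
                candidate.add(b.get(last));
                // Prune step: every (k-1)-subset must itself be frequent.
                if (allSubsetsFrequent(candidate, frequent)) {
                    candidates.add(candidate);
                }
            }
        }
        return candidates;
    }

    static boolean allSubsetsFrequent(List<Integer> candidate, Set<List<Integer>> frequent) {
        for (int skip = 0; skip < candidate.size(); skip++) {
            List<Integer> subset = new ArrayList<>(candidate);
            subset.remove(skip); // removes the element at index `skip`
            if (!frequent.contains(subset)) {
                return false;
            }
        }
        return true;
    }
}
```

Only the candidates returned here need their support counted against the data, which is what keeps each round manageable.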

The confidence definition on the other hand is pretty straightforward.

Generate a rule, then compute its confidence by dividing the support of the full itemset by the support of the rule's antecedent (the left-hand side) alone. Apparently you already have the supports, so computing the confidence should be two lookups in your DB of support values.
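As a rough sketch of those two lookups: for a rule A -> B, confidence = support(A ∪ B) / support(A). The code below assumes the supports of all frequent itemsets (singles, pairs, and larger sets) have been collected into one `Map<Set<Integer>, Double>`; the names `RuleGenerator`, `Rule`, and `rules` are invented for illustration. It enumerates every non-empty proper subset of each itemset as an antecedent and keeps the rules that reach the minimum confidence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: derive association rules from a table of supports.
final class RuleGenerator {

    static final class Rule {
        final Set<Integer> antecedent;
        final Set<Integer> consequent;
        final double confidence;

        Rule(Set<Integer> antecedent, Set<Integer> consequent, double confidence) {
            this.antecedent = antecedent;
            this.consequent = consequent;
            this.confidence = confidence;
        }

        @Override
        public String toString() {
            return antecedent + " -> " + consequent + "," + confidence;
        }
    }

    static List<Rule> rules(Map<Set<Integer>, Double> supports, double minConfidence) {
        List<Rule> result = new ArrayList<>();
        for (Map.Entry<Set<Integer>, Double> entry : supports.entrySet()) {
            List<Integer> items = new ArrayList<>(entry.getKey());
            int n = items.size();
            if (n < 2) {
                continue; // a rule needs at least one item on each side
            }
            // Masks 1 .. 2^n - 2 select each non-empty proper subset as antecedent.
            for (int mask = 1; mask < (1 << n) - 1; mask++) {
                Set<Integer> antecedent = new TreeSet<>();
                Set<Integer> consequent = new TreeSet<>();
                for (int bit = 0; bit < n; bit++) {
                    if ((mask & (1 << bit)) != 0) {
                        antecedent.add(items.get(bit));
                    } else {
                        consequent.add(items.get(bit));
                    }
                }
                Double antecedentSupport = supports.get(antecedent); // lookup 1
                if (antecedentSupport == null) {
                    continue; // antecedent missing from the table; skip defensively
                }
                double confidence = entry.getValue() / antecedentSupport; // lookup 2 is entry.getValue()
                if (confidence >= minConfidence) {
                    result.add(new Rule(antecedent, consequent, confidence));
                }
            }
        }
        return result;
    }
}
```

Because every subset of a frequent itemset is itself frequent, the antecedent lookup should normally succeed as long as all the per-size files were loaded into the same map. Each returned `Rule` can then be printed with the movie-name mapping applied to the IDs.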
