
How to rate or rank votes

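I'm really sorry if my question is badly put, but I would like some ideas... I'm looking for ideas for a rating and ranking algorithm that takes into account the time at which the votes were submitted.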

Nice Question!

Okay, let's bring it on!

First of all, one thing you cannot ignore when calculating good ratings is the Bayesian average.

You can read up on it, but very simplified, it takes care of the following:

  • Entries with few votes are not rated at the true mean of their votes; their rating also contains a component of the mean rating across your whole dataset. For example, on IMDB the default rating is somewhere around 6.4. So a film with only 2 votes of 10 stars each may still be rated somewhere between 6 and 7. The more votes an entry has, the more weight they carry together, and the rating is "pulled away" from the default. IMDB also requires a minimum number of votes before a movie shows up in listings.
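As a minimal sketch of that idea: the score blends an entry's own votes with the dataset-wide mean. The function name `bayesianAverage` and the parameters `$priorMean` and `$priorWeight` are my own illustrative choices, and the value 25 is made up; this is not IMDB's actual formula.

```php
<?php
/**
 * Bayesian average: blend an entry's own votes with the global mean.
 * $priorMean   = mean rating across the whole dataset (assumed to be known)
 * $priorWeight = number of "virtual" votes at the global mean that every
 *                entry starts with (assumed > 0)
 */
function bayesianAverage(array $votes, float $priorMean, float $priorWeight): float
{
    $n = count($votes);
    return ($priorWeight * $priorMean + array_sum($votes)) / ($priorWeight + $n);
}

// Two 10-star votes, global mean 6.4, 25 virtual votes
echo round(bayesianAverage([10, 10], 6.4, 25), 2), "\n"; // 6.67
```

With no votes at all, the result is exactly the global mean. As real votes accumulate, they gradually outweigh the virtual ones.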

Another thing that I find confusing: why is the time of the vote important? Didn't you mean the time of the entry that was voted on? So in our movie example, would newly released movies be more important?

But anyway! In both cases, good results are often achieved by applying non-linear functions such as logarithms or square roots.

In our movie example, a movie's relevance could be multiplied by

1 + 1/SQRT(1 + CURRENT_YEAR - RELEASE_YEAR )

So 1 is a base rating that every movie gets. A movie from the current year gets a boost of 100% (200% relevance), as the expression above evaluates to 2. A movie from last year gets about 171%, a 2-year-old movie about 158%, and so on.

But the difference between a movie from 1954 and one from 1963 is far smaller.
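To see how quickly the boost flattens out, here is a small sketch of the formula above. The helper name `ageBoost` is hypothetical, and clamping future release years to age 0 is my own assumption.

```php
<?php
// Age boost from the formula above: 1 + 1/sqrt(1 + age in years).
function ageBoost(int $releaseYear, int $currentYear): float
{
    // Assumption: a release year in the future is treated like the current year.
    $age = max(0, $currentYear - $releaseYear);
    return 1 + 1 / sqrt(1 + $age);
}

// Print the relevance multiplier as a percentage for a few release years.
foreach ([2024, 2023, 2022, 1963, 1954] as $year) {
    printf("%d: %.0f%%\n", $year, 100 * ageBoost($year, 2024));
}
```

With 2024 as the current year, the two old movies end up at roughly 113% and 112%, so nine years of age difference barely matter at that distance.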

So remember:

  • For everything you use in your calculations, ask yourself: Is it really linear? Could it distort your ratings? Are the relations throughout the dataset sane?

If you want recent votes to count more, you can do that in exactly the same way, but by weighting your votes. That also makes sense if you want recently voted entries to be "warmed up", for example because they are currently hot and being discussed in your community.
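A minimal sketch of such vote weighting could look like this. The vote structure, the function name `timeWeightedRating`, and the choice of `1/sqrt(1 + age in days)` as the weight are all assumptions made for illustration.

```php
<?php
/**
 * Weighted mean of votes in which newer votes count more.
 * Each vote is ['rating' => number, 'time' => Unix timestamp].
 * Assumed weight per vote: 1 / sqrt(1 + age in days).
 */
function timeWeightedRating(array $votes, int $now): float
{
    $weightedSum = 0.0;
    $weightTotal = 0.0;
    foreach ($votes as $vote) {
        $ageDays = max(0, $now - $vote['time']) / 86400;
        $weight = 1 / sqrt(1 + $ageDays);
        $weightedSum += $weight * $vote['rating'];
        $weightTotal += $weight;
    }
    // No votes: return 0.0 rather than dividing by zero.
    return $weightTotal > 0 ? $weightedSum / $weightTotal : 0.0;
}

$now = 1700000000; // fixed "now" so the example is reproducible
$votes = [
    ['rating' => 2, 'time' => $now - 99 * 86400], // 99 days old, weight 0.1
    ['rating' => 8, 'time' => $now],              // fresh vote, weight 1.0
];
echo round(timeWeightedRating($votes, $now), 2), "\n"; // 7.45
```

The plain mean of these two votes would be 5. With the weighting, the fresh vote dominates.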

That being said, it remains just hard work, with a lot of playing around.

Let me give you one last idea.

At the company I work at, we calculate a relevance for movies.

We have a config array where we store the "weighting" of several factors in the final relevance.

It looks like this:

        $weights = array(
                "year" => 2,             // release year
                "rating" => 13,          // rating 0-100
                "cover" => 4,            // cover available?
                "shortdescription" => 4, // short description available?
                "trailer" => 3,          // trailer available?
                "imdbpr" => 13,          // Google PageRank of the IMDB page
        );

Then we calculate a value between 0 and 1 for every metric. There are different methods. Our rating, for instance, is itself an aggregate of the ratings from several platforms that we crawl, each with its own weighting. But let me show you the example of our release year:

        // age of the movie in years (positive for past releases)
        $yearDiff = (int) date('Y') - (int) $data["year"];
        //year
        if (!$data["year"]) {
                $values['year'] = 0;   // release year unknown
        } else if ($yearDiff <= 0) {
                $values['year'] = 1;   // current (or upcoming) year
        } else if ($yearDiff < 3) {
                $values['year'] = 0.8;
        } else if ($yearDiff < 10) {
                $values['year'] = 0.6;
        } else {
                $values['year'] = 1 / sqrt($yearDiff);
        }

So you see, we hardcoded some "age intervals" and relied on the sqrt function only for older movies. In fact the difference there is minimal, so the SQRT example here is a rather poor one. But mathematical functions are very often useful!

You can, for example, also use periodic functions like sine curves to calculate seasonal relevance! For example, if you map the time of year to a range from 0 to 1, you can use a sine function to weight up summer hits, winter hits, or autumn hits according to the current time of year!
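A sketch of such a seasonal weight, under my own assumptions: the function name `seasonalWeight` is hypothetical, and I use a cosine (which is just a shifted sine) so that the weight peaks exactly at the chosen position in the year.

```php
<?php
/**
 * Seasonal weight in the range 0..1.
 * Both positions are fractions of the year (0.0 = January 1st, 0.5 = mid-year).
 * The weight is 1 at $peakPosition and falls to 0 half a year away from it.
 */
function seasonalWeight(float $yearPosition, float $peakPosition): float
{
    return 0.5 + 0.5 * cos(2 * M_PI * ($yearPosition - $peakPosition));
}

// A "summer hit" peaking around 0.55 (roughly late July):
printf("%.2f\n", seasonalWeight(0.55, 0.55)); // 1.00 at the peak
printf("%.2f\n", seasonalWeight(0.05, 0.55)); // 0.00 half a year later
```

The current position in the year can be approximated with `date('z') / 365`.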

One last example, for the IMDB PageRank. It is completely hardcoded, as there are only 10 different values possible and they are not distributed in a statistically homogeneous way (a PageRank of 1 or 2 is even worse than none):

        if ($imdbpr === null || $imdbpr < 0) {
                // no pagerank available. probably new
                // (checked first, because in PHP null >= 0 evaluates to true)
                $values['imdbpr'] = 0.4;
        } else if ($imdbpr >= 7) {
                $values['imdbpr'] = 1;
        } else if ($imdbpr >= 6) {
                $values['imdbpr'] = 0.9;
        } else if ($imdbpr >= 5) {
                $values['imdbpr'] = 0.8;
        } else if ($imdbpr >= 4) {
                $values['imdbpr'] = 0.6;
        } else if ($imdbpr >= 3) {
                $values['imdbpr'] = 0.5;
        } else if ($imdbpr >= 2) {
                $values['imdbpr'] = 0.3;
        } else if ($imdbpr >= 1) {
                $values['imdbpr'] = 0.1;
        } else {
                $values['imdbpr'] = 0.0; // pagerank 0
        }

Then we sum it up like this:

        $malus = 0; // accumulates the final relevance (0 to 1)
        $weightSum = array_sum($weights);
        foreach ($values as $field => $value) {
                $malus += ($value * $weights[$field]) / $weightSum;
        }
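To make the arithmetic concrete, here is the same summation wrapped in a function and run on made-up sample values. The function name `combineRelevance` and the sample values are hypothetical; the weights are the ones from the config array above.

```php
<?php
/**
 * Combine per-metric values (each in 0..1) into one relevance score in 0..1.
 * A metric missing from $values contributes 0, because the divisor is the
 * sum of all configured weights.
 */
function combineRelevance(array $values, array $weights): float
{
    $weightSum = array_sum($weights);
    $relevance = 0.0;
    foreach ($values as $field => $value) {
        $relevance += $value * $weights[$field];
    }
    return $weightSum > 0 ? $relevance / $weightSum : 0.0;
}

$weights = ["year" => 2, "rating" => 13, "cover" => 4,
            "shortdescription" => 4, "trailer" => 3, "imdbpr" => 13];
// Made-up sample values for a single movie
$values = ["year" => 0.8, "rating" => 0.75, "cover" => 1,
           "shortdescription" => 1, "trailer" => 0, "imdbpr" => 0.5];

echo round(combineRelevance($values, $weights), 2), "\n"; // 0.66
```

Because every value lies between 0 and 1 and the weights are normalised by their sum, the result also stays between 0 and 1, which makes scores comparable across movies.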

This may not be an exact answer to your question, but rather a broader one. Still, I hope I pointed you in the right direction and gave you some starting points for your own thinking!

Have fun and good luck with your application!

Reddit's code is open source. There is a pretty good discussion of their ranking algorithm here, with code: http://amix.dk/blog/post/19588
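For orientation, here is a PHP sketch of the "hot" formula as it is commonly described for Reddit's published code. The function name `hotScore` is my own, and the constants are given as I recall them, so please verify the details against the linked post. Note that it uses the submission time of the post, not the time of each individual vote.

```php
<?php
/**
 * Sketch of a Reddit-style "hot" score.
 * 1134028003 is the epoch offset and 45000 the time divisor used in the
 * published code (as far as I recall; treat both as assumptions).
 */
function hotScore(int $ups, int $downs, int $postedAt): float
{
    $score = $ups - $downs;
    $order = log10(max(abs($score), 1)); // each 10x in votes adds 1 point
    $sign = $score <=> 0;                // 1, 0 or -1
    $seconds = $postedAt - 1134028003;
    return round($sign * $order + $seconds / 45000, 7);
}

$t = 1700000000;
// A post with 10 net votes that is 12.5 hours newer ties with a post with 100:
var_dump(abs(hotScore(10, 0, $t + 45000) - hotScore(100, 0, $t)) < 1e-6); // bool(true)
```

The logarithm means the first 10 votes count as much as the next 90, while the time term steadily lifts newer posts above older ones.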


 