Order by two columns - using the highest rating average and the most ratings

Question

I would like to rank items by both the highest average rating (rating_avg) and the number of ratings (rating_count). My current script sorts by the highest average rating (DESC) regardless of how many ratings there are, which is useless for my visitors.

For example, it shows:

Item 1 - 5.0 (1 Ratings)
Item 2 - 5.0 (2 Ratings)

when it should show the top 10 highest-rated items by rating average and number of ratings, such as:

Item 1 - 4.5 (356 Ratings)
Item 2 - 4.3 (200 Ratings)
Item 3 - 4.0 (400 Ratings)

This is what I have right now:

$result = mysql_query("SELECT id, filename, filenamedisplay, console_dir, downloads, rating_avg, rating_count FROM files WHERE console_dir = '".$nodash."' ORDER BY rating_avg DESC LIMIT 10");

Thanks in advance, I appreciate any help!

Answer 1

This is a subtle problem, and really a statistics question. What I often do is downgrade the rating by one standard error of the proportion. These aren't exactly proportions, but I think the same idea can be applied.

You can calculate this using the "square root of p*q divided by n" method, where p is the proportion, q = 1 - p, and n is the sample size. If you aren't familiar with this, google "standard error of a proportion" (or I might suggest the third chapter of "Data Analysis Using SQL and Excel", which explains it in more detail):

SELECT id, filename, filenamedisplay, console_dir, downloads, rating_avg, rating_count
FROM files CROSS JOIN
     (SELECT COUNT(*) AS cnt FROM files WHERE console_dir = '".$nodash."') AS const
WHERE console_dir = '".$nodash."'
/* rating_avg/5 rescales the average to a proportion p; the score is p - sqrt(p*(1-p)/n) */
ORDER BY rating_avg/5 - SQRT((rating_avg/5) * (1 - rating_avg/5) / const.cnt) DESC
LIMIT 10;

In any case, see if the formula works for you.
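To get a feel for the numbers, here is a minimal Python sketch of the "p minus one standard error" score. The function name proportion_lower_bound and the max_rating parameter are made up for illustration, and n is simply whatever sample size you pass in. The example calls pass an item's own rating count, which is an assumption on my part (the query above passes const.cnt).

```python
from math import sqrt

def proportion_lower_bound(rating_avg, n, max_rating=5.0):
    """Return p - sqrt(p * q / n), with p = rating_avg / max_rating and q = 1 - p."""
    p = rating_avg / max_rating          # rescale the average to a proportion
    return p - sqrt(p * (1.0 - p) / n)   # subtract one standard error of the proportion

# Hypothetical inputs taken from the numbers in the question.
print(round(proportion_lower_bound(4.5, 356), 4))
print(proportion_lower_bound(5.0, 1))
```

Note that a perfect 5.0 average gives p = 1 and q = 0, so this formula subtracts nothing from it, whatever n is.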

EDIT:

Okay, let's change this to the standard error of the mean. I should have done this the first time through, but I was treating rating_avg as a proportion. The formula is the standard deviation divided by the square root of the sample size. We can get the population standard deviation in the const subquery:

     (SELECT COUNT(*) AS cnt, STDDEV_POP(rating_avg) AS std FROM files WHERE console_dir = '".$nodash."') AS const

This results in:

ORDER BY rating_avg - const.std / SQRT(rating_count) DESC

This might work, but I would rather use the standard deviation within each group than the overall population standard deviation. Still, it derates the rating by an amount that shrinks as the sample size grows, which should improve your results.

By the way, the idea of subtracting one standard error is rather arbitrary. I've just found that it produces reasonable results. You might prefer to subtract, say, 1.96 times the standard error to get the lower bound of a 95% confidence interval.
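Here is a rough Python sketch of the standard-error-of-the-mean ranking, so you can try different multipliers before touching the query. It rests on two assumptions: the sample size for each item is its own rating_count, and the standard deviation is the population standard deviation of rating_avg over all rows (as in the const subquery). The function name rank_by_lower_bound, the z parameter, and the row labels A to E are made up for illustration.

```python
from math import sqrt
from statistics import pstdev

def rank_by_lower_bound(items, z=1.0):
    """Sort (name, rating_avg, rating_count) rows by rating_avg - z * std / sqrt(rating_count)."""
    # Population standard deviation of the averages across all rows.
    std = pstdev([avg for _, avg, _ in items])

    def score(row):
        _, avg, count = row
        # Assumes count >= 1; filter out unrated items beforehand.
        return avg - z * std / sqrt(count)

    return sorted(items, key=score, reverse=True)

# Hypothetical rows built from the numbers in the question.
rows = [
    ("A", 5.0, 1),
    ("B", 5.0, 2),
    ("C", 4.5, 356),
    ("D", 4.3, 200),
    ("E", 4.0, 400),
]

print([name for name, _, _ in rank_by_lower_bound(rows, z=1.0)])
print([name for name, _, _ in rank_by_lower_bound(rows, z=1.96)])
```

On these five rows, z = 1.0 still leaves the two 5.0 items on top, while z = 1.96 moves the 4.5 item with 356 ratings into first place, so the multiplier is worth tuning against your own data.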
