
Optimal SQL Query when ORDERing BY SUMs

Just as a note, this question is an extension of one of my previous questions. The parameters have changed, so I need a new answer.

I have a MySQL table that has four fields: post_id (unique int), user_id (int), category (varchar), and score (int).

My goal is to end up with two values. The first is what percent of a user's posts are in category "x". The second is the sum of all the scores in that category. To do this, I've assumed that I need to get three values from MySQL:

  • SUM( score ) GROUP BY category
  • COUNT( post_id ) GROUP BY category
  • COUNT( post_id )
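For reference, here is a rough sketch of how those three values could come out of a single pass over one user's rows, using conditional aggregation instead of three separate queries. It runs against an in-memory SQLite database through Python's sqlite3 module purely so it is self-contained; the table name `posts` and the helper `category_stats` are made up for illustration, and the SQL itself should carry over to MySQL largely unchanged.

```python
import sqlite3

# Hypothetical table name `posts`; the columns follow the description above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (post_id INTEGER PRIMARY KEY, user_id INTEGER, "
    "category TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO posts (user_id, category, score) VALUES (?, ?, ?)",
    [(1, "x", 10), (1, "x", 5), (1, "y", 7), (1, "z", 1), (2, "x", 3)],
)


def category_stats(conn, user_id, category):
    """Return (percent of the user's posts in `category`, sum of scores there)."""
    cat_score, cat_posts, all_posts = conn.execute(
        """
        SELECT SUM(CASE WHEN category = ? THEN score ELSE 0 END) AS cat_score,
               SUM(CASE WHEN category = ? THEN 1 ELSE 0 END)     AS cat_posts,
               COUNT(post_id)                                     AS all_posts
        FROM posts
        WHERE user_id = ?
        """,
        (category, category, user_id),
    ).fetchone()
    if not all_posts:
        # A user with no posts has no meaningful percentage.
        return 0.0, 0
    return 100.0 * cat_posts / all_posts, cat_score


percent, total = category_stats(conn, 1, "x")
print(percent, total)
```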

So that's a simple enough query to write. Here's the difficult part: I need to get the top 50 users, ordered by some calculation like (percent + sum). I guess I could write a query that does all the above math in a subquery/JOIN, and then just put an ORDER BY and LIMIT in the main query, but this seems inefficient. I'm planning for 2 million users, and each user could have 5,000 posts. If I wrote my query like that, I think it would take forever to run through each of those records.
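To make the subquery approach concrete, this is roughly what I have in mind. It is only a sketch: the table name `posts`, the helper `top_users`, and the exact (percent + sum) formula are placeholders, and it runs on SQLite via Python's sqlite3 just to be self-contained. Note that the inner query still aggregates every row in the table before the outer ORDER BY and LIMIT apply, which is exactly the cost I'm worried about.

```python
import sqlite3

# Hypothetical table name `posts`; the columns follow the description above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (post_id INTEGER PRIMARY KEY, user_id INTEGER, "
    "category TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO posts (user_id, category, score) VALUES (?, ?, ?)",
    [(1, "x", 10), (1, "x", 5), (1, "y", 7), (1, "z", 1), (2, "x", 3), (3, "y", 9)],
)


def top_users(conn, category, limit=50):
    """Rank users by (percent of posts in `category`) + (sum of scores there)."""
    return conn.execute(
        """
        SELECT user_id, percent, cat_score, percent + cat_score AS ranking
        FROM (
            -- One row per user: this subquery scans the whole posts table.
            SELECT user_id,
                   100.0 * SUM(CASE WHEN category = ? THEN 1 ELSE 0 END)
                         / COUNT(*) AS percent,
                   SUM(CASE WHEN category = ? THEN score ELSE 0 END) AS cat_score
            FROM posts
            GROUP BY user_id
        ) AS per_user
        ORDER BY ranking DESC
        LIMIT ?
        """,
        (category, category, limit),
    ).fetchall()


rows = top_users(conn, "x", limit=2)
for user_id, percent, cat_score, ranking in rows:
    print(user_id, percent, cat_score, ranking)
```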

What is the most efficient way to run a query like this? I've read about MySQL views, which seem like a nice idea, but I've also read that they have huge performance problems. Is it worth it?

Or is it impossible? Should I settle for running a cron job a couple of times a day and just storing faux-realtime numbers?

Do you already have a large user database and a lot of posts?

If you don't, you could create a meta-table that keeps track of these sums and counts. These would be easy to update in real time when a user adds a post or a score. You wouldn't have to scan the DB every time you needed to recount posts and scores for the statistics, because you'd already have them in a table. It would be easy to do the calculations on this table instead.
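A minimal sketch of that idea, assuming a meta-table called `user_category_stats` with one row per (user, category) pair. All table and function names here are invented for illustration, and the example uses SQLite through Python's sqlite3 only so that it runs as-is. In MySQL the update-then-insert step could probably be collapsed into a single INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (
        post_id INTEGER PRIMARY KEY, user_id INTEGER,
        category TEXT, score INTEGER
    );
    -- Hypothetical meta-table: running totals per user and category.
    CREATE TABLE user_category_stats (
        user_id INTEGER, category TEXT,
        post_count INTEGER NOT NULL, score_sum INTEGER NOT NULL,
        PRIMARY KEY (user_id, category)
    );
    """
)


def add_post(conn, user_id, category, score):
    """Insert a post and bump the matching totals in the same transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO posts (user_id, category, score) VALUES (?, ?, ?)",
            (user_id, category, score),
        )
        updated = conn.execute(
            "UPDATE user_category_stats "
            "SET post_count = post_count + 1, score_sum = score_sum + ? "
            "WHERE user_id = ? AND category = ?",
            (score, user_id, category),
        ).rowcount
        if updated == 0:
            # First post by this user in this category.
            conn.execute(
                "INSERT INTO user_category_stats VALUES (?, ?, 1, ?)",
                (user_id, category, score),
            )


def top_from_stats(conn, category, limit=50):
    """Rank users by percent + sum using only the small meta-table."""
    return conn.execute(
        """
        SELECT s.user_id,
               100.0 * s.post_count / t.total_posts + s.score_sum AS ranking
        FROM user_category_stats AS s
        JOIN (SELECT user_id, SUM(post_count) AS total_posts
              FROM user_category_stats
              GROUP BY user_id) AS t ON t.user_id = s.user_id
        WHERE s.category = ?
        ORDER BY ranking DESC
        LIMIT ?
        """,
        (category, limit),
    ).fetchall()


for post in [(1, "x", 10), (1, "x", 5), (1, "y", 7), (1, "z", 1), (2, "x", 3)]:
    add_post(conn, *post)

top = top_from_stats(conn, "x", limit=2)
print(top)
```

The ranking query now reads at most one row per user and category rather than one per post. If scores can change after a post is created, the same kind of adjustment would have to be applied to `score_sum` on every update.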

There's a little extra work in the beginning when you create the functions that add everything to the meta-table, but it would probably pay off in the long run.
