
Optimal SQL Query when ORDERing BY SUMs

As a note, this question extends one of my previous questions. The parameters have changed, so I need a new answer.

I have a MySQL table with four fields: post_id (unique int), user_id (int), category (varchar), and score (int).

My goal is to end up with two values per user: the percentage of the user's posts that are in category "x", and the sum of all of the user's scores in category "x". To do this, I've assumed that I need to get three values from MySQL:

  • SUM( score ) GROUP BY category
  • COUNT( post_id ) GROUP BY category
  • COUNT( post_id )
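The three values listed above can be collected in a single pass with conditional aggregation if you group by user_id rather than by category. The sketch below makes a few assumptions. The table name posts is hypothetical, since the question doesn't name the table. Python's built-in sqlite3 module stands in for MySQL so the example runs without a server. The SQL is plain enough that MySQL should accept it, but placeholder syntax (:cat here) differs between drivers. The percentage is derived in Python from the two counts.

```python
import sqlite3

# Hypothetical table name "posts"; sqlite3 stands in for MySQL here.
SCHEMA = """
CREATE TABLE posts (
    post_id  INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL,
    category TEXT    NOT NULL,
    score    INTEGER NOT NULL
)
"""

# One pass over posts, grouped by user: score sum and post count in the
# target category, plus the user's total post count.
PER_USER_STATS = """
SELECT user_id,
       SUM(CASE WHEN category = :cat THEN score ELSE 0 END) AS cat_score,
       SUM(CASE WHEN category = :cat THEN 1 ELSE 0 END)     AS cat_posts,
       COUNT(post_id)                                       AS total_posts
FROM posts
GROUP BY user_id
ORDER BY user_id
"""


def per_user_stats(conn, category):
    """Return {user_id: (cat_score, cat_posts, total_posts, percent)}."""
    result = {}
    for user_id, cat_score, cat_posts, total in conn.execute(
        PER_USER_STATS, {"cat": category}
    ):
        result[user_id] = (cat_score, cat_posts, total, 100.0 * cat_posts / total)
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO posts (post_id, user_id, category, score) VALUES (?, ?, ?, ?)",
        [(1, 1, "x", 10), (2, 1, "y", 5), (3, 1, "x", 4), (4, 2, "y", 7)],
    )
    for user_id, stats in per_user_stats(conn, "x").items():
        print(user_id, stats)
```

Every user with at least one post appears in the result, including users with no posts in the category. Their category sum and count come back as 0 rather than NULL.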

That's a simple enough query to write. Here's the difficult part: I need to get the top 50 users, ordered by some calculation like (percent + sum). I guess I could write a query that does all of the above math in a subquery or JOIN, and then put an ORDER BY and LIMIT in the outer query, but that seems inefficient. I'm planning for 2 million users, and each user could have 5,000 posts, which is up to 10 billion rows. If I wrote the query that way, I think it would take forever to run through all of those records.
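For reference, the "subquery, then ORDER BY and LIMIT" approach described above might look like the following sketch. It uses the same assumptions: a hypothetical posts table, with sqlite3 standing in for MySQL. The ranking expression, percent + sum, is only the placeholder formula from the question. The derived table still aggregates every row of posts on each run, which is exactly the cost the question is worried about at this scale.

```python
import sqlite3

# Hypothetical table name "posts"; sqlite3 stands in for MySQL here.
SCHEMA = """
CREATE TABLE posts (
    post_id  INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL,
    category TEXT    NOT NULL,
    score    INTEGER NOT NULL
)
"""

# Inner query: per-user aggregates (one full pass over posts).
# Outer query: the question's placeholder ranking (percent + sum), top n.
TOP_USERS = """
SELECT user_id,
       100.0 * cat_posts / total_posts + cat_score AS rank_value
FROM (
    SELECT user_id,
           SUM(CASE WHEN category = :cat THEN score ELSE 0 END) AS cat_score,
           SUM(CASE WHEN category = :cat THEN 1 ELSE 0 END)     AS cat_posts,
           COUNT(post_id)                                       AS total_posts
    FROM posts
    GROUP BY user_id
) AS per_user
ORDER BY rank_value DESC, user_id
LIMIT :n
"""


def top_users(conn, category, n=50):
    """Return [(user_id, rank_value), ...] for the n highest-ranked users."""
    return conn.execute(TOP_USERS, {"cat": category, "n": n}).fetchall()
```

Multiplying by 100.0 forces floating-point division in SQLite. In MySQL, / already returns a decimal result, so there it is just the percentage scale.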

What is the most efficient way to run a query like this? I've read about MySQL views, which seem like a nice idea, but I've also read that they have serious performance problems. Are they worth it?

Or is it impossible? Should I settle for running a CRON job a couple of times a day and just storing faux-realtime numbers?

Do you already have a large user database and a lot of posts?

If you don't, you could create a meta-table that keeps track of these sums and counts. They would be easy to update in real time whenever a user adds a post or a score. You wouldn't have to scan the DB every time you needed to recount posts and scores for the statistics, because the totals would already be stored in a table, and you could run the calculations on that table instead.

There's a little extra work up front in writing the functions that keep the meta-table updated, but it would probably pay off in the long run.
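A minimal sketch of the meta-table idea follows, under the same assumptions as before: sqlite3 stands in for MySQL, and the table names posts, user_stats and user_category_stats are all hypothetical. add_post and change_score are illustrative helpers, not an existing API. Each one updates the raw table and the running totals in one transaction, so the ranking query reads one summary row per user instead of every post. INSERT OR IGNORE is SQLite syntax. In MySQL, the usual equivalent would be INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3

# Hypothetical schema: "posts" is the raw table from the question;
# "user_stats" and "user_category_stats" are the meta-tables.
SCHEMA = """
CREATE TABLE posts (
    post_id  INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL,
    category TEXT    NOT NULL,
    score    INTEGER NOT NULL
);
CREATE TABLE user_stats (
    user_id     INTEGER PRIMARY KEY,
    total_posts INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE user_category_stats (
    user_id   INTEGER NOT NULL,
    category  TEXT    NOT NULL,
    cat_posts INTEGER NOT NULL DEFAULT 0,
    cat_score INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, category)
);
"""


def add_post(conn, post_id, user_id, category, score):
    """Hypothetical helper: insert a post and bump the running totals."""
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO posts VALUES (?, ?, ?, ?)",
                     (post_id, user_id, category, score))
        # Create the summary rows on first use, then increment them.
        conn.execute("INSERT OR IGNORE INTO user_stats (user_id) VALUES (?)",
                     (user_id,))
        conn.execute("UPDATE user_stats SET total_posts = total_posts + 1 "
                     "WHERE user_id = ?", (user_id,))
        conn.execute("INSERT OR IGNORE INTO user_category_stats (user_id, category) "
                     "VALUES (?, ?)", (user_id, category))
        conn.execute("UPDATE user_category_stats "
                     "SET cat_posts = cat_posts + 1, cat_score = cat_score + ? "
                     "WHERE user_id = ? AND category = ?",
                     (score, user_id, category))


def change_score(conn, post_id, delta):
    """Hypothetical helper: adjust a post's score and its category total."""
    with conn:
        row = conn.execute("SELECT user_id, category FROM posts WHERE post_id = ?",
                           (post_id,)).fetchone()
        if row is None:
            raise KeyError(post_id)
        conn.execute("UPDATE posts SET score = score + ? WHERE post_id = ?",
                     (delta, post_id))
        conn.execute("UPDATE user_category_stats SET cat_score = cat_score + ? "
                     "WHERE user_id = ? AND category = ?", (delta, *row))


# Ranking reads one summary row per user; (percent + sum) is the
# question's placeholder formula.
TOP_USERS = """
SELECT c.user_id,
       100.0 * c.cat_posts / u.total_posts + c.cat_score AS rank_value
FROM user_category_stats AS c
JOIN user_stats AS u ON u.user_id = c.user_id
WHERE c.category = :cat
ORDER BY rank_value DESC, c.user_id
LIMIT :n
"""


def top_users(conn, category, n=50):
    """Return [(user_id, rank_value), ...] for the n highest-ranked users."""
    return conn.execute(TOP_USERS, {"cat": category, "n": n}).fetchall()
```

Because the query starts from user_category_stats, users with no posts in the category are left out. Their value would be 0 anyway, so this only matters if fewer than 50 users have posted in that category.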
