How to improve the performance of a MySQL query with large data?

Question

I am using MySQL tables that have the following data:

users(ID, name, email, create_added) (about 10000 rows)
points(user_id, point) (about 15000 rows)

And my query:

SELECT u.*, SUM(p.point) point 
FROM users u 
LEFT JOIN points p ON p.user_id = u.ID 
WHERE u.id > 0 
GROUP BY u.id 
ORDER BY point DESC 
LIMIT 0, 10

I only need the top 10 users with the highest point totals, but the query dies. How can I improve its performance?

Answer 1

Like @Grim said, you can use INNER JOIN instead of LEFT JOIN. However, if you really want to optimize, I would suggest adding an extra column to the users table that holds a precalculated point total. This would beat any query optimization you can do with your current database design.
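
A minimal sketch of the precalculated-total idea, under some assumptions: total_points and idx_users_total_points are hypothetical names that are not in the original schema, and INT is a guess at the type of points.point.

```sql
-- Assumption: total_points is a hypothetical column; INT is a guessed type.
ALTER TABLE users
    ADD COLUMN total_points INT NOT NULL DEFAULT 0,
    ADD INDEX idx_users_total_points (total_points);

-- One-off backfill from the existing points table.
-- Users with no rows in points keep a total of 0.
UPDATE users u
  LEFT JOIN ( SELECT user_id, SUM(point) AS total
                FROM points
               GROUP BY user_id
            ) s
    ON s.user_id = u.ID
   SET u.total_points = COALESCE(s.total, 0);

-- The top-10 query then needs no join and no aggregation.
SELECT u.*
  FROM users u
 ORDER BY u.total_points DESC
 LIMIT 10;
```

The trade-off is that the stored total has to be kept in sync whenever rows in points are added, changed, or removed, either in application code or with triggers.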

Answer 2

Swapping the LEFT JOIN for an INNER JOIN would help a lot (note that this drops users who have no rows in points). Make sure points.point and points.user_id are indexed. I assume you can get rid of the WHERE clause, as u.id will always be greater than 0 (although MySQL probably does this for you at the query optimisation stage).
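
For illustration, the rewritten query could look roughly like this. The alias total_points is my own rename (to keep it distinct from the points.point column), and the sketch assumes users.ID is the primary key, so that selecting u.* while grouping by u.ID is accepted under ONLY_FULL_GROUP_BY.

```sql
-- INNER JOIN instead of LEFT JOIN, and no WHERE clause.
-- Users without any rows in points are not returned.
SELECT u.*, SUM(p.point) AS total_points
  FROM users u
 INNER JOIN points p ON p.user_id = u.ID
 GROUP BY u.ID
 ORDER BY total_points DESC
 LIMIT 10;
```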

Answer 3

It doesn't really matter that you are getting only 10 rows. MySQL has to sum up the points for every user before it can sort them (a "Using filesort" operation). The LIMIT is applied last.

A covering index ON points(user_id,point) is going to be the best bet for optimum performance. (I'm really just guessing, without any EXPLAIN output or table definitions.)

The id column in users is likely the primary key, or at least has a unique index. So it's likely you already have an index with id as the leading column (or the clustered primary key index, if it's InnoDB).
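
A sketch of how the index could be created and the existing indexes checked. The index name idx_points_user_point is just a placeholder of mine.

```sql
-- Covering index: user_id leads (for the join and GROUP BY),
-- point is included so the SUM can be read from the index alone.
CREATE INDEX idx_points_user_point ON points (user_id, point);

-- List the indexes that already exist, to confirm the assumption
-- that users.id is the primary key or has a unique index.
SHOW INDEX FROM users;
SHOW INDEX FROM points;
```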

I'd be tempted to test a query like this:

 SELECT u.*
      , s.total_points
   FROM ( SELECT p.user_id
               , SUM(p.point) AS total_points
            FROM points p
           WHERE p.user_id > 0
           GROUP BY p.user_id
           ORDER BY total_points DESC
           LIMIT 10
        ) s
   JOIN users u
     ON u.id = s.user_id
  ORDER BY s.total_points DESC

That does have the overhead of creating a derived table, but with a suitable index on points (user_id as the leading column, and including the point column), it's likely that MySQL can use the index to optimize the GROUP BY and avoid one "Using filesort" operation (the one for the GROUP BY).

There will likely still be a "Using filesort" operation on that result set, to get the rows ordered by total_points. The first 10 rows are then taken from that.

With those 10 rows, we can join to the user table to get the corresponding rows.

BUT... there is one slight difference with this result: if any of the user_id values in the top 10 aren't in the users table, then this query will return fewer than 10 rows. (I'd expect there to be a foreign key defined, so that wouldn't happen, but I'm really just guessing without table definitions.)
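
To check whether that case can actually occur, something along these lines could be run. The constraint name fk_points_user is hypothetical, and the ALTER TABLE assumes both tables are InnoDB with compatible column types. It will fail if orphaned rows already exist.

```sql
-- Find user_id values in points that have no matching row in users.
SELECT DISTINCT p.user_id
  FROM points p
  LEFT JOIN users u ON u.ID = p.user_id
 WHERE u.ID IS NULL;

-- If none are found, a foreign key keeps it that way.
ALTER TABLE points
    ADD CONSTRAINT fk_points_user
    FOREIGN KEY (user_id) REFERENCES users (ID);
```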

An EXPLAIN would show the access plan being used by MySQL.
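
For example, prefixing the query from the question with EXPLAIN:

```sql
-- Shows the access plan without running the query itself.
EXPLAIN
SELECT u.*, SUM(p.point) AS point
  FROM users u
  LEFT JOIN points p ON p.user_id = u.ID
 WHERE u.ID > 0
 GROUP BY u.ID
 ORDER BY point DESC
 LIMIT 0, 10;
```

The columns worth looking at are key (which index is used, if any), rows (the estimated number of rows examined) and Extra, where "Using temporary", "Using filesort" and "Using index" show up.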

Answer 4

Ever thought about partitioning? I'm currently working with a large database and have successfully improved SQL query performance with it.

For example,

PARTITION BY RANGE (`ID`) (
    PARTITION p1 VALUES LESS THAN (100) ENGINE = InnoDB,
    PARTITION p2 VALUES LESS THAN (200) ENGINE = InnoDB,
    PARTITION p3 VALUES LESS THAN (300) ENGINE = InnoDB,
    ... and so on..
)

It allows us to get better speed while scanning a MySQL table. When a query filters on ID, MySQL will scan only the matching partitions, e.g. only partition p1, which contains user IDs 1 to 99, even if there are a million rows in the table.

Check out this http://dev.mysql.com/doc/refman/5.5/en/partitioning.html
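
A fuller sketch of what a partitioned table could look like. The table name users_partitioned, the column types and the catch-all partition pmax are all assumptions of mine, not the asker's real definitions.

```sql
-- Assumption: column types are guesses; only the column names
-- come from the question.
CREATE TABLE users_partitioned (
    ID           INT UNSIGNED NOT NULL,
    name         VARCHAR(255) NOT NULL,
    email        VARCHAR(255) NOT NULL,
    create_added DATETIME     NOT NULL,
    PRIMARY KEY (ID)  -- the partitioning column must be part of every unique key
) ENGINE = InnoDB
PARTITION BY RANGE (ID) (
    PARTITION p1   VALUES LESS THAN (100),
    PARTITION p2   VALUES LESS THAN (200),
    PARTITION p3   VALUES LESS THAN (300),
    PARTITION pmax VALUES LESS THAN MAXVALUE  -- catch-all for higher IDs
);

-- A filter on the partitioning column lets MySQL skip the other partitions.
SELECT ID, name, email
  FROM users_partitioned
 WHERE ID BETWEEN 1 AND 99;
```

Keep in mind that pruning only helps when the WHERE clause restricts ID. A query that aggregates over all users, like the one in the question, still has to touch every partition.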

solution1
2 2013-08-23 03:00:28

solution2
1 2013-08-23 02:57:12

solution3
1 2013-08-23 03:33:22

solution4
0 2013-10-08 07:14:21
