How to improve the performance of a MySQL query with large data?

I am using MySQL tables that have the following data:

users(ID, name, email, create_added) (about 10000 rows)
points(user_id, point) (about 15000 rows)

And my query:

SELECT u.*, SUM(p.point) point 
FROM users u 
LEFT JOIN points p ON p.user_id = u.ID 
WHERE u.id > 0 
GROUP BY u.id 
ORDER BY point DESC 
LIMIT 0, 10

I only fetch the top 10 users with the highest point totals, but the query still dies. How can I improve the performance of my query?

As @Grim said, you can use INNER JOIN instead of LEFT JOIN. However, if you are really after optimization, I would suggest adding an extra field to the users table that holds a precalculated point total. That should beat any query-level optimization under your current database design.
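A minimal sketch of that precalculated field, under assumptions the question does not confirm: the column name total_points and the index name idx_users_total_points are hypothetical, and point is assumed to be an integer column.

```sql
-- Assumption: total_points and idx_users_total_points are illustrative names,
-- and points.point is an integer.
ALTER TABLE users
  ADD COLUMN total_points INT NOT NULL DEFAULT 0,
  ADD INDEX idx_users_total_points (total_points);

-- One-off backfill from the existing points rows.
-- Users without any points keep a total of 0.
UPDATE users u
  LEFT JOIN ( SELECT user_id, SUM(point) AS total
                FROM points
               GROUP BY user_id
            ) s
    ON s.user_id = u.ID
   SET u.total_points = COALESCE(s.total, 0);

-- The top-10 query no longer needs a join or an aggregate.
SELECT u.*
  FROM users u
 ORDER BY u.total_points DESC
 LIMIT 10;
```

The trade-off is that total_points has to be kept in sync whenever a row in points is added, changed, or removed, whether in application code, with triggers, or with a periodic re-run of the backfill.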

Swapping the LEFT JOIN for an INNER JOIN would help a lot (note that an INNER JOIN drops users who have no rows in points). Make sure points.point and points.user_id are indexed. I assume you can get rid of the WHERE clause, as u.id will always be greater than 0 (although MySQL probably does this for you at the query optimisation stage).
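With both of those changes applied, the original query might look like the sketch below. The alias total_points is my own renaming, chosen to avoid a clash with the point column. It also assumes users.ID is the primary key, so that selecting u.* while grouping by u.ID is accepted when ONLY_FULL_GROUP_BY is enabled.

```sql
-- Assumption: users.ID is the primary key of users.
SELECT u.*, SUM(p.point) AS total_points
  FROM users u
  JOIN points p ON p.user_id = u.ID   -- INNER JOIN: users without points are excluded
 GROUP BY u.ID
 ORDER BY total_points DESC
 LIMIT 10;
```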

It doesn't really matter that you are only getting 10 rows. MySQL has to sum up the points for every user before it can sort them (a "Using filesort" operation). The LIMIT is applied last.

A covering index ON points(user_id,point) is going to be the best bet for optimum performance. (I'm really just guessing, without any EXPLAIN output or table definitions.)
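For reference, such an index could be created as follows. The index name idx_points_user_point is only an illustration.

```sql
-- Assumption: the index name is illustrative; any unused name works.
CREATE INDEX idx_points_user_point ON points (user_id, point);

-- List the indexes on points to confirm it exists.
SHOW INDEX FROM points;
```

Because the index contains both user_id and point, a query that only reads those two columns from points can be answered from the index alone, without touching the table rows.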

The id column in users is likely the primary key, or at least a unique index. So it's likely you already have an index with id as the leading column (or the primary key clustered index, if it's InnoDB).

I'd be tempted to test a query like this:

 SELECT u.*
      , s.total_points
   FROM ( SELECT p.user_id
               , SUM(p.point) AS total_points
            FROM points p
           WHERE p.user_id > 0
           GROUP BY p.user_id
           ORDER BY total_points DESC
           LIMIT 10
        ) s
   JOIN users u
     ON u.id = s.user_id
  ORDER BY s.total_points DESC 

That does have the overhead of creating a derived table, but with a suitable index on points, with user_id as the leading column and including the point column, it's likely that MySQL can optimize the GROUP BY using the index and avoid one "Using filesort" operation (for the GROUP BY).

There will likely still be a "Using filesort" operation on that resultset, to get the rows ordered by total_points. MySQL then takes the first 10 rows from that.

With those 10 rows, we can join to the users table to get the corresponding rows.

BUT... there is one slight difference with this result: if any of the user_id values in the top 10 aren't in the users table, then this query will return fewer than 10 rows. (I'd expect there to be a foreign key defined, so that wouldn't happen, but I'm really just guessing without table definitions.)
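If no such foreign key exists yet, a sketch of checking for orphaned rows and then adding one is shown here. It assumes both tables use InnoDB, that points.user_id and users.ID have matching column types, and that users.ID is indexed (for example as the primary key). The constraint name fk_points_user is hypothetical.

```sql
-- Find points rows whose user_id has no matching users row.
-- These must be fixed or removed before the constraint can be added.
SELECT p.user_id
  FROM points p
  LEFT JOIN users u ON u.ID = p.user_id
 WHERE u.ID IS NULL;

-- Assumption: fk_points_user is an illustrative constraint name.
ALTER TABLE points
  ADD CONSTRAINT fk_points_user
  FOREIGN KEY (user_id) REFERENCES users (ID);
```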

An EXPLAIN would show the access plan being used by MySQL.
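For example, prefixing the aggregate part of the query with EXPLAIN shows how MySQL plans to run it. This is only a sketch against the table and column names given in the question.

```sql
EXPLAIN
SELECT p.user_id, SUM(p.point) AS total_points
  FROM points p
 GROUP BY p.user_id
 ORDER BY total_points DESC
 LIMIT 10;
```

In the output, the key column names the index MySQL chose, and the Extra column is where notes such as "Using index", "Using temporary", and "Using filesort" appear.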

Ever thought about partitioning? I'm currently working with a large database and have successfully improved SQL query performance with it.

For example,

PARTITION BY RANGE (`ID`) (
    PARTITION p1 VALUES LESS THAN (100) ENGINE = InnoDB,
    PARTITION p2 VALUES LESS THAN (200) ENGINE = InnoDB,
    PARTITION p3 VALUES LESS THAN (300) ENGINE = InnoDB,
    ... and so on..
)

It allows us to get better speed while scanning a MySQL table. For a query that filters on ID values 1 to 99, MySQL will scan only partition p1, even if there are a million rows in the table.
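A self-contained sketch of that idea, applied to the users table from the question, might look like this. It assumes ID is the primary key and that users has no other unique key, since every unique key on a partitioned table must include the partitioning column. The catch-all partition pmax is my own addition for illustration.

```sql
-- Assumption: ID is the only unique key on users; pmax is an illustrative catch-all.
ALTER TABLE users
  PARTITION BY RANGE (ID) (
    PARTITION p1   VALUES LESS THAN (100),
    PARTITION p2   VALUES LESS THAN (200),
    PARTITION p3   VALUES LESS THAN (300),
    PARTITION pmax VALUES LESS THAN MAXVALUE
  );

-- Check which partitions a query would read.
-- On MySQL 5.7 and later the output has a partitions column;
-- on 5.5 and 5.6 use EXPLAIN PARTITIONS SELECT ... instead.
EXPLAIN SELECT * FROM users WHERE ID BETWEEN 1 AND 99;
```

Pruning only helps queries whose WHERE clause restricts ID. The top-10 aggregate in the question reads every user, so it would still touch every partition. Also note that partitioned InnoDB tables do not support foreign keys.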

Check out http://dev.mysql.com/doc/refman/5.5/en/partitioning.html
