How to improve the performance of a MySQL query with large data?
I am using MySQL tables that have the following data:
users(ID, name, email, create_added) (about 10000 rows)
points(user_id, point) (about 15000 rows)
And my query:
SELECT u.*, SUM(p.point) point
FROM users u
LEFT JOIN points p ON p.user_id = u.ID
WHERE u.id > 0
GROUP BY u.id
ORDER BY point DESC
LIMIT 0, 10
I only fetch the top 10 users with the most points, but the query still dies. How can I improve its performance?
As @Grim said, you can use INNER JOIN instead of LEFT JOIN. However, if you are really after optimization, I would suggest adding an extra column to the users table that holds each user's precalculated point total. With your current database design, that would beat any query optimization.
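A minimal sketch of the precalculated-column idea above, assuming `users.ID` is the primary key and `points.point` is a non-NULL integer. The column name `total_points`, the index name and the trigger names are hypothetical. Triggers are one way to keep the total in sync; updating it from application code in the same transaction that writes to `points` would work too.

```sql
-- Hypothetical column holding each user's precalculated point total.
ALTER TABLE users ADD COLUMN total_points INT NOT NULL DEFAULT 0;

-- One-off backfill from the rows already in points.
UPDATE users u
JOIN (SELECT user_id, SUM(point) AS s
      FROM points
      GROUP BY user_id) p ON p.user_id = u.ID
SET u.total_points = p.s;

-- Index so the top-10 query can read the highest totals directly.
CREATE INDEX idx_users_total_points ON users (total_points);

-- Keep the total in sync when points rows are inserted or deleted.
-- (An UPDATE on points would need a third trigger along the same lines.)
CREATE TRIGGER points_after_insert AFTER INSERT ON points
FOR EACH ROW
  UPDATE users SET total_points = total_points + NEW.point
  WHERE ID = NEW.user_id;

CREATE TRIGGER points_after_delete AFTER DELETE ON points
FOR EACH ROW
  UPDATE users SET total_points = total_points - OLD.point
  WHERE ID = OLD.user_id;

-- The top-10 query then needs no join and no GROUP BY.
SELECT *
FROM users
ORDER BY total_points DESC
LIMIT 10;
```

The trade-off is a little extra work on every write to `points` in exchange for a read that only walks the first 10 entries of an index.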
Swapping the LEFT JOIN for an INNER JOIN would help a lot. (Note that an INNER JOIN drops users who have no rows in points, which is harmless for a top-10 ranking as long as at least 10 users have points.) Make sure points.point and points.user_id are indexed. I assume you can get rid of the WHERE clause, as u.id will always be more than 0 (although MySQL probably does this for you at the query optimisation stage).
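Putting those suggestions together, the query might look like the sketch below. It assumes `users.ID` is the primary key, so selecting `u.*` while grouping by `u.ID` is accepted even with `ONLY_FULL_GROUP_BY` enabled (MySQL 5.7.5 and later detect that dependency). The index name is hypothetical, and the sum is aliased `total_points` so it cannot be confused with the `points.point` column in `ORDER BY`.

```sql
-- Hypothetical index name; indexes the join column on points.
CREATE INDEX idx_points_user_id ON points (user_id);

-- INNER JOIN instead of LEFT JOIN, and no WHERE u.id > 0.
SELECT u.*, SUM(p.point) AS total_points
FROM users u
INNER JOIN points p ON p.user_id = u.ID
GROUP BY u.ID
ORDER BY total_points DESC
LIMIT 10;
```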
It doesn't really matter that you are getting only 10 rows. MySQL has to sum up the points for every user before it can sort them (the "Using filesort" operation). The LIMIT is applied last.
A covering index ON points(user_id,point) is going to be the best bet for optimum performance. (I'm really just guessing, without any EXPLAIN output or table definitions.)
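A sketch of that covering index (the index name is hypothetical). With `user_id` leading, MySQL can read each user's rows in order, and because `point` is in the index too, `SUM(point)` can be computed without touching the table rows. Such an index also makes a separate single-column index on `points(user_id)` redundant.

```sql
-- Hypothetical index name: user_id first for the join/GROUP BY,
-- point second so the SUM can be answered from the index alone.
CREATE INDEX idx_points_user_point ON points (user_id, point);

-- With the index in place, the Extra column of this EXPLAIN
-- should typically include "Using index".
EXPLAIN
SELECT user_id, SUM(point)
FROM points
GROUP BY user_id;
```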
The id column in users is likely the primary key, or at least a unique index. So it's likely you already have an index with id as the leading column (or the clustered primary key index, if it's InnoDB).
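To confirm rather than guess which indexes already exist, MySQL can list them directly. A primary key shows up with `Key_name` equal to `PRIMARY`, and `SHOW CREATE TABLE` also reveals the storage engine.

```sql
-- Lists every index on the table, one row per indexed column.
SHOW INDEX FROM users;
SHOW INDEX FROM points;

-- Full table definition, including ENGINE=InnoDB or MyISAM.
SHOW CREATE TABLE users;
```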
I'd be tempted to test a query like this:
SELECT u.*
, s.total_points
FROM ( SELECT p.user_id
, SUM(p.point) AS total_points
FROM points p
WHERE p.user_id > 0
GROUP BY p.user_id
ORDER BY total_points DESC
LIMIT 10
) s
JOIN users u
ON u.id = s.user_id
ORDER BY s.total_points DESC
That does have the overhead of creating a derived table, but with a suitable index on points (user_id as the leading column, also including the point column), it's likely that MySQL can optimize the GROUP BY by using the index, avoiding one "Using filesort" operation (the one for the GROUP BY).
There will likely still be a "Using filesort" operation on that result set, to get the rows ordered by total_points. Then MySQL takes the first 10 rows from that.
With those 10 rows, we can join to the users table to get the corresponding rows.
BUT... there is one slight difference with this result: if any of the user_id values in the top 10 aren't in the users table, this query will return fewer than 10 rows. (I'd expect there to be a foreign key defined, so that wouldn't happen, but I'm really just guessing without table definitions.)
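A sketch for checking that assumption, then enforcing it. The constraint name is hypothetical. Adding the foreign key assumes both tables use InnoDB and that `points.user_id` has the same type as `users.ID`. InnoDB creates an index on `points.user_id` automatically if no usable one exists.

```sql
-- user_id values in points that have no matching row in users.
SELECT DISTINCT p.user_id
FROM points p
LEFT JOIN users u ON u.ID = p.user_id
WHERE u.ID IS NULL;

-- If the query above returns nothing, the relationship can be enforced.
ALTER TABLE points
  ADD CONSTRAINT fk_points_user
  FOREIGN KEY (user_id) REFERENCES users (ID);
```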
An EXPLAIN would show the access plan being used by MySQL.
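For example, prefixing the original query with `EXPLAIN` shows the plan without running it. Useful columns are `type`, `key` and `rows`; in `Extra`, "Using temporary; Using filesort" marks the grouping and sorting work discussed above, while "Using index" means a covering index is being used. On MySQL 8.0.18 and later, `EXPLAIN ANALYZE` actually executes the query and reports measured timings.

```sql
EXPLAIN
SELECT u.*, SUM(p.point) point
FROM users u
LEFT JOIN points p ON p.user_id = u.ID
WHERE u.id > 0
GROUP BY u.id
ORDER BY point DESC
LIMIT 0, 10;
```

Running the same prefix on each candidate rewrite makes it easy to compare plans.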
Have you thought about partitioning? I'm currently working with a large database and have successfully improved SQL query performance with it.
For example,
PARTITION BY RANGE (`ID`) (
PARTITION p1 VALUES LESS THAN (100) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (200) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (300) ENGINE = InnoDB,
-- ... and so on ...
)
It gives better speed when MySQL scans the table: for a query that filters on ID (for example WHERE ID < 100), MySQL will scan only partition p1, which holds the IDs below 100, even if there are millions of rows in the table. This is called partition pruning.
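A complete, hypothetical version of that partitioning scheme applied to `users`, with a catch-all `pmax` partition so rows with larger IDs still have somewhere to go. It assumes `ID` is the primary key and that `users` has no other unique keys, because MySQL requires every unique key to include all columns used in the partitioning expression. Also note that partitioned InnoDB tables cannot take part in foreign keys, and that pruning only helps queries with a condition on `ID`; a query that sums points for every user still reads every partition.

```sql
-- Hypothetical example: range-partition users by ID.
ALTER TABLE users
PARTITION BY RANGE (ID) (
  PARTITION p1   VALUES LESS THAN (100),
  PARTITION p2   VALUES LESS THAN (200),
  PARTITION p3   VALUES LESS THAN (300),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);

-- The partitions column of EXPLAIN should list only p1 here.
-- (MySQL 5.7+ shows it by default; 5.5/5.6 need EXPLAIN PARTITIONS.)
EXPLAIN
SELECT * FROM users WHERE ID < 100;
```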
Check out http://dev.mysql.com/doc/refman/5.5/en/partitioning.html
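Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.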