MySQL InnoDB is very slow on SELECT query
I have a MySQL table with the following structure:
mysql> show create table logs\G
Create Table: CREATE TABLE `logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`request` text,
`response` longtext,
`msisdn` varchar(255) DEFAULT NULL,
`username` varchar(255) DEFAULT NULL,
`shortcode` varchar(255) DEFAULT NULL,
`response_code` varchar(255) DEFAULT NULL,
`response_description` text,
`transaction_name` varchar(250) DEFAULT NULL,
`system_owner` varchar(250) DEFAULT NULL,
`request_date_time` datetime DEFAULT NULL,
`response_date_time` datetime DEFAULT NULL,
`comments` text,
`user_type` varchar(255) DEFAULT NULL,
`channel` varchar(20) DEFAULT 'WEB',
/**
other columns here....
other 18 columns here, with Type varchar and Text
**/
PRIMARY KEY (`id`),
KEY `transaction_name` (`transaction_name`) USING BTREE,
KEY `msisdn` (`msisdn`) USING BTREE,
KEY `username` (`username`) USING BTREE,
KEY `request_date_time` (`request_date_time`) USING BTREE,
KEY `system_owner` (`system_owner`) USING BTREE,
KEY `shortcode` (`shortcode`) USING BTREE,
KEY `response_code` (`response_code`) USING BTREE,
KEY `channel` (`channel`) USING BTREE,
KEY `request_date_time_2` (`request_date_time`),
KEY `response_date_time` (`response_date_time`)
) ENGINE=InnoDB AUTO_INCREMENT=59582405 DEFAULT CHARSET=utf8
and it has more than 30000000 records in it.
mysql> select count(*) from logs;
+----------+
| count(*) |
+----------+
| 38962312 |
+----------+
1 row in set (1 min 17.77 sec)
Now the problem is that it is very slow; a SELECT takes ages to fetch records from the table.
The following query, which uses a subquery, takes almost 30 minutes to fetch the records for one day:
SELECT
COUNT(sub.id) AS count,
DATE(sub.REQUEST_DATE_TIME) AS transaction_date,
sub.SYSTEM_OWNER,
sub.transaction_name,
sub.response,
MIN(sub.response_time),
MAX(sub.response_time),
AVG(sub.response_time),
sub.channel
FROM
(SELECT
id,
REQUEST_DATE_TIME,
RESPONSE_DATE_TIME,
TIMESTAMPDIFF(SECOND, REQUEST_DATE_TIME, RESPONSE_DATE_TIME) AS response_time,
SYSTEM_OWNER,
transaction_name,
(CASE
WHEN response_code IN ('0' , '00', 'EIL000') THEN 'Success'
ELSE 'Failure'
END) AS response,
channel
FROM
logs
WHERE
response_code != ''
AND DATE(REQUEST_DATE_TIME) BETWEEN '2016-10-26 00:00:00' AND '2016-10-27 00:00:00'
AND SYSTEM_OWNER != '') sub
GROUP BY DATE(sub.REQUEST_DATE_TIME) , sub.channel , sub.SYSTEM_OWNER , sub.transaction_name , sub.response
ORDER BY DATE(sub.REQUEST_DATE_TIME) DESC , sub.SYSTEM_OWNER , sub.transaction_name , sub.response DESC;
I have also added indexes to my table, but it is still very slow.
Any help on how I can make it faster?
EDIT: Ran the above query using EXPLAIN:
+----+-------------+------------+------+----------------------------+------+---------+------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+----------------------------+------+---------+------+----------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 16053297 | Using temporary; Using filesort |
| 2 | DERIVED | logs | ALL | system_owner,response_code | NULL | NULL | NULL | 32106592 | Using where |
+----+-------------+------------+------+----------------------------+------+---------+------+----------+---------------------------------+
As it stands, the query must scan the entire table.
But first, let's air a possible bug:
AND DATE(REQUEST_DATE_TIME) BETWEEN '2016-10-26 00:00:00'
                                AND '2016-10-27 00:00:00'
gives you the logs for two days -- all of the 26th and all of the 27th. Or is that what you really wanted? (BETWEEN is inclusive.)
But the performance problem is that the index will not be used, because request_date_time is hiding inside a function (DATE()).
Jump forward to a better way to phrase it:
AND REQUEST_DATE_TIME >= '2016-10-26'
AND REQUEST_DATE_TIME <  '2016-10-26' + INTERVAL 1 DAY
A DATETIME can be compared against a date string like this. Change the 1 to however many days you wish -- without having to deal with leap days, month boundaries, etc. This form lets the optimizer use the index on request_date_time, thereby cutting back severely on the amount of data to be scanned.
As for other tempting areas:
!= does not optimize well, so no 'composite' index is likely to be beneficial.
Once the WHERE has been satisfied, no index is useful for the GROUP BY or ORDER BY.
My comment about DATE() in the WHERE does not apply to the GROUP BY; no change needed there.
Why have the subquery? I think it can be done in a single layer. This will eliminate a rather large temp table. (Yeah, it means 3 uses of TIMESTAMPDIFF(), but that is probably a lot cheaper than the temp table.)
How much RAM? What is the value of innodb_buffer_pool_size?
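To check the current setting:

```sql
-- A common rule of thumb is ~70% of RAM on a dedicated database server.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
```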
If my comments are not enough, and if you frequently run a query like this (over a day or over a date range), then we can talk about building and maintaining a Summary table, which might give you a 10x speedup.
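Such a summary table might be shaped like this (a sketch; the table name, granularity, and refresh mechanism are assumptions, not part of the question):

```sql
-- Hypothetical daily rollup, refreshed once per day (e.g. by a scheduled job)
-- with INSERT ... SELECT over the previous day's rows in `logs`.
CREATE TABLE logs_daily_summary (
    transaction_date  DATE NOT NULL,
    system_owner      VARCHAR(250) NOT NULL,
    transaction_name  VARCHAR(250) NOT NULL,
    response          ENUM('Success', 'Failure') NOT NULL,
    channel           VARCHAR(20) NOT NULL,
    cnt               INT UNSIGNED NOT NULL,
    min_response_time INT NOT NULL,
    max_response_time INT NOT NULL,
    sum_response_time BIGINT NOT NULL,  -- store SUM, derive AVG as SUM/cnt
    PRIMARY KEY (transaction_date, system_owner, transaction_name, response, channel)
) ENGINE=InnoDB;
```

Storing the SUM rather than the AVG lets averages be combined correctly when reporting over multi-day ranges.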