MySQL query runs very slow on large table

I am trying to run the following query on a very large table with over 90 million rows (and growing):

SELECT COUNT(DISTINCT device_uid) AS cnt,  DATE_FORMAT(time_start, '%Y-%m-%d') AS period 
FROM game_session 
WHERE account_id = -2 AND DATE_FORMAT(time_start, '%Y-%m-%d') BETWEEN CURDATE() - INTERVAL 90 DAY AND CURDATE()
GROUP BY period 
ORDER BY period DESC

I have the following table structure:

CREATE TABLE `game_session` (
  `session_id` bigint(20) NOT NULL,
  `account_id` bigint(20) NOT NULL,
  `authentification_type` char(2) NOT NULL,
  `source_ip` char(40) NOT NULL,
  `device` char(50) DEFAULT NULL COMMENT 'Added 0.9',
  `device_uid` char(50) NOT NULL,
  `os` char(50) DEFAULT NULL COMMENT 'Added 0.9',
  `carrier` char(50) DEFAULT NULL COMMENT 'Added 0.9',
  `protocol_version` char(20) DEFAULT NULL COMMENT 'Added 0.9',
  `lang_key` char(2) NOT NULL DEFAULT 'en',
  `instance_id` char(100) NOT NULL,
  `time_start` datetime NOT NULL,
  `time_end` datetime DEFAULT NULL,
  PRIMARY KEY (`session_id`),
  KEY `game_account_session_fk` (`account_id`),
  KEY `lang_key_fk` (`lang_key`),
  KEY `lookup_active_session_idx` (`account_id`,`time_start`),
  KEY `lookup_finished_session_idx` (`account_id`,`time_end`),
  KEY `start_time_idx` (`time_start`),
  KEY `lookup_guest_session_idx` (`device_uid`,`time_start`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

How can I optimize this?

Thanks for your answers.

DATE_FORMAT(time_start, '%Y-%m-%d') sounds expensive.
Every calculation applied to a column reduces the use of indexes. You are probably getting a full index scan plus a DATE_FORMAT calculation for every value, instead of an index lookup / range scan.

Try storing the computed value in a column (or create a computed index, if MySQL supports it). Or, even better, rewrite your conditions so they compare directly against the value stored in the column.
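A minimal sketch of the difference follows. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so treat it as an assumption-laden illustration: SQLite's planner and its EXPLAIN QUERY PLAN output differ from MySQL's EXPLAIN, but the principle is the same.

The sketch creates a cut-down game_session table with only the (account_id, time_start) index. It then compares two query plans:

- one for a predicate that wraps time_start in a function;
- one for a predicate that compares the raw column against a range.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the EXPLAIN QUERY PLAN detail strings for `sql`, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is the human-readable description.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE game_session (
        session_id INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL,
        device_uid TEXT NOT NULL,
        time_start TEXT NOT NULL  -- stored as 'YYYY-MM-DD HH:MM:SS'
    )
""")
conn.execute(
    "CREATE INDEX lookup_active_session_idx ON game_session (account_id, time_start)"
)

# Function applied to the column: only the account_id part of the index can be used.
wrapped = plan(
    conn,
    "SELECT device_uid FROM game_session "
    "WHERE account_id = -2 AND date(time_start) BETWEEN ? AND ?",
    ("2024-02-10", "2024-05-10"),
)

# Raw column compared against a half-open range: both index columns can be used.
sargable = plan(
    conn,
    "SELECT device_uid FROM game_session "
    "WHERE account_id = -2 AND time_start >= ? AND time_start < ?",
    ("2024-02-10", "2024-05-11"),
)

print(wrapped)
print(sargable)
```

On a typical SQLite build, the first plan constrains only account_id. The second plan also lists a time_start range on the same index. In MySQL, the same comparison would be done with EXPLAIN and its key/key_len columns.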

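Well, 90 million rows is a lot, but I suspect it won't use start_time_idx because of this operation, which could be avoided. You could apply the operation to the value you compare against instead, which would then only be computed once per query (if MySQL is smart enough). Have you checked EXPLAIN?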

You may want to group and sort by time_start instead of the period value you create when the query is run. Sorting by period requires all of those values to be generated before any sorting can be done.

Try replacing your WHERE clause with the following:

WHERE account_id = -2 AND time_start BETWEEN CURDATE() - INTERVAL 90 DAY AND CURDATE()

MySQL will still catch the dates in between. The only rows you need to worry about are today's. CURDATE() is compared as today at 00:00:00, so anything later than midnight today falls outside the range.

You can fix that by replacing the second CURDATE() with CURDATE() + INTERVAL 1 DAY.
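The boundary problem can be sketched in plain Python by emulating how MySQL compares a DATETIME column with a DATE value. The sketch assumes MySQL promotes the DATE to that day at 00:00:00. It uses a fixed "today" as a stand-in for CURDATE() so the result is reproducible.

```python
from datetime import date, datetime, time, timedelta


def midnight(d):
    """Promote a DATE to a DATETIME at 00:00:00, as MySQL does when comparing it with a DATETIME."""
    return datetime.combine(d, time.min)


today = date(2024, 5, 10)  # stand-in for CURDATE(), fixed for reproducibility
rows = [
    datetime(2024, 2, 10, 0, 0, 0),    # exactly 90 days ago, at midnight
    datetime(2024, 5, 9, 23, 59, 59),  # yesterday, just before midnight
    datetime(2024, 5, 10, 0, 0, 0),    # today, exactly at midnight
    datetime(2024, 5, 10, 15, 30, 0),  # today, in the afternoon
]

lo = midnight(today - timedelta(days=90))
tomorrow = midnight(today + timedelta(days=1))

# BETWEEN CURDATE() - INTERVAL 90 DAY AND CURDATE(): upper bound is today 00:00:00.
between_today = [r for r in rows if lo <= r <= midnight(today)]

# BETWEEN ... AND CURDATE() + INTERVAL 1 DAY: upper bound moves to tomorrow 00:00:00.
between_tomorrow = [r for r in rows if lo <= r <= tomorrow]

# Half-open variant (>= lo AND < tomorrow): same rows, but never tomorrow's midnight.
half_open = [r for r in rows if lo <= r < tomorrow]
```

With the original upper bound, the afternoon session from today is dropped. Both the "+ 1 day" BETWEEN and the half-open form keep it. The half-open form also avoids matching a row stamped exactly at tomorrow's midnight.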

I'd change 我会改变

BETWEEN CURDATE() - INTERVAL 90 DAY AND CURDATE()

to

> (CURDATE() - INTERVAL 90 DAY)

You don't have records from the future, do you?

Change the query to:

SELECT COUNT(DISTINCT device_uid) AS cnt
     , DATE(time_start)           AS period 
FROM game_session 
WHERE account_id = -2 
  AND time_start >= CURDATE() - INTERVAL 90 DAY 
  AND time_start <  CURDATE() + INTERVAL 1 DAY
GROUP BY period
ORDER BY period DESC  -- explicit ORDER BY: GROUP BY ... DESC is not supported in MySQL 8.0.13+

That way, the (account_id, time_start) index can be used for the WHERE part of the query.
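To see what the rewritten query returns, here is a small sketch run against SQLite via Python's sqlite3 module, not MySQL. Several substitutions are assumptions for the sake of a runnable example:

- date(?, '-90 days') and date(?, '+1 day') stand in for CURDATE() - INTERVAL 90 DAY and CURDATE() + INTERVAL 1 DAY.
- A fixed "today" replaces CURDATE().
- The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE game_session (
        session_id INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL,
        device_uid TEXT NOT NULL,
        time_start TEXT NOT NULL  -- 'YYYY-MM-DD HH:MM:SS'
    )
""")
conn.execute(
    "CREATE INDEX lookup_active_session_idx ON game_session (account_id, time_start)"
)
conn.executemany(
    "INSERT INTO game_session (account_id, device_uid, time_start) VALUES (?, ?, ?)",
    [
        (-2, "dev-a", "2024-05-09 08:00:00"),
        (-2, "dev-a", "2024-05-09 21:15:00"),  # same device, same day: counted once
        (-2, "dev-b", "2024-05-09 22:00:00"),
        (-2, "dev-a", "2024-05-10 15:30:00"),  # today, after midnight: still included
        (-2, "dev-c", "2024-01-01 12:00:00"),  # older than 90 days: filtered out
        (7, "dev-d", "2024-05-10 09:00:00"),   # different account: filtered out
    ],
)

today = "2024-05-10"  # stand-in for CURDATE()
result = conn.execute(
    """
    SELECT COUNT(DISTINCT device_uid) AS cnt, date(time_start) AS period
    FROM game_session
    WHERE account_id = -2
      AND time_start >= date(?, '-90 days')
      AND time_start <  date(?, '+1 day')
    GROUP BY period
    ORDER BY period DESC
    """,
    (today, today),
).fetchall()
print(result)
```

The range conditions touch only the raw time_start column. The date() call appears only in the SELECT list and GROUP BY, where it is evaluated on rows the index has already narrowed down.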


If it's still slow (the DATE(time_start) expression does not look very good for performance), add a date_start column and store the date part of time_start in it.

Then add an index on (account_id, date_start, device_uid). This should further improve performance, because all the information needed for the GROUP BY date_start and COUNT(DISTINCT device_uid) parts will be in the index:

SELECT COUNT(DISTINCT device_uid) AS cnt
     , date_start                 AS period 
FROM game_session 
WHERE account_id = -2 
  AND date_start BETWEEN CURDATE() - INTERVAL 90 DAY 
                     AND CURDATE()
GROUP BY date_start
ORDER BY date_start DESC
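A sketch of the date_start approach, again using SQLite through Python's sqlite3 module as a stand-in for MySQL. The index name account_date_device_idx and the helper insert_session are hypothetical. Here date_start is filled at insert time. In MySQL that job could fall to application code, a trigger, or a generated column, depending on version and storage engine.

The sketch checks two things:

- the query plan reports a covering index;
- the per-day counts come out as expected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE game_session (
        session_id INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL,
        device_uid TEXT NOT NULL,
        time_start TEXT NOT NULL,  -- 'YYYY-MM-DD HH:MM:SS'
        date_start TEXT NOT NULL   -- date part of time_start, 'YYYY-MM-DD'
    )
""")
# Hypothetical index name; it holds every column the query below needs.
conn.execute(
    "CREATE INDEX account_date_device_idx "
    "ON game_session (account_id, date_start, device_uid)"
)


def insert_session(account_id, device_uid, time_start):
    """Hypothetical helper: keep date_start in sync with time_start on every write."""
    conn.execute(
        "INSERT INTO game_session (account_id, device_uid, time_start, date_start) "
        "VALUES (?, ?, ?, date(?))",
        (account_id, device_uid, time_start, time_start),
    )


insert_session(-2, "dev-a", "2024-05-09 08:00:00")
insert_session(-2, "dev-b", "2024-05-09 22:00:00")
insert_session(-2, "dev-a", "2024-05-10 15:30:00")

query = """
    SELECT COUNT(DISTINCT device_uid) AS cnt, date_start AS period
    FROM game_session
    WHERE account_id = -2
      AND date_start BETWEEN date(?, '-90 days') AND ?
    GROUP BY date_start
    ORDER BY date_start DESC
"""
params = ("2024-05-10", "2024-05-10")  # stand-in for CURDATE()

plan = " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, params))
rows = conn.execute(query, params).fetchall()
print(plan)
print(rows)
```

Because date_start is a plain DATE-like value, BETWEEN with CURDATE() as the upper bound is safe here. The time-of-day boundary issue from the earlier answers no longer applies.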
