
Optimizing a MySQL query with a LEFT JOIN

We want to map the rows of calibration_data to the rows of calibration with the following query. But the query takes far too long in my opinion (more than 24 hours).

Is any optimization possible? For testing we have added more indexes than we need right now, but it had no impact on the duration.

[Edit]

The hardware shouldn't be the biggest bottleneck:

  • 128 GB RAM
  • 1TB SSD RAID 5
  • 32 cores

EXPLAIN result

+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                                          |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+------------------------------------------------+
|  1 | SIMPLE      | cal   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    2009 |   100.00 | Using temporary; Using filesort                |
|  1 | SIMPLE      | m     | NULL       | ALL  | visit         | NULL | NULL    | NULL | 3082466 |   100.00 | Range checked for each record (index map: 0x1) |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+------------------------------------------------+

Query which takes too long:

INSERT INTO knn_data
SELECT cal.X           AS X,
       cal.Y           AS Y,
       cal.BeginTime   AS BeginTime,
       cal.EndTime     AS EndTime,
       avg(m.dbm_ant)  AS avg_dbm_ant,
       m.ant_id        AS ant_id,
       avg(m.location) AS avg_location,
       count(*)        AS count,
       m.visit
FROM   calibration cal
       LEFT JOIN calibration_data m
         ON m.visit BETWEEN cal.BeginTime AND cal.EndTime
GROUP  BY cal.X,
          cal.Y,
          cal.BeginTime,
          cal.BeaconId,
          m.ant_id,
          m.macHash,
          m.visit;

Table knn_data:

CREATE TABLE `knn_data` (
  `X` int(11) NOT NULL,
  `Y` int(11) NOT NULL,
  `BeginTime` datetime NOT NULL,
  `EndTIme` datetime NOT NULL,
  `avg_dbm_ant` float DEFAULT NULL,
  `ant_id` int(11) NOT NULL,
  `avg_location` float DEFAULT NULL,
  `count` int(11) DEFAULT NULL,
  `visit` datetime NOT NULL,
  PRIMARY KEY (`ant_id`,`visit`,`X`,`Y`,`BeginTime`,`EndTIme`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Table calibration:

BeaconId, X, Y, BeginTime, EndTime
41791, 1698, 3944, 2016-11-12 22:44:00, 2016-11-12 22:49:00


CREATE TABLE `calibration` (
  `BeaconId` int(11) DEFAULT NULL,
  `X` int(11) DEFAULT NULL,
  `Y` int(11) DEFAULT NULL,
  `BeginTime` datetime DEFAULT NULL,
  `EndTime` datetime DEFAULT NULL,
  KEY `x,y` (`X`,`Y`),
  KEY `x` (`X`),
  KEY `y` (`Y`),
  KEY `BID` (`BeaconId`),
  KEY `beginTime` (`BeginTime`),
  KEY `x,y,beg,bid` (`X`,`Y`,`BeginTime`,`BeaconId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Table calibration_data:

macHash, visit, dbm_ant, ant_id, mac, isRand, posX, posY, sources, ip, dayOfMonth, location, am, ar
'f5:dc:7d:73:2d:e9', '2016-11-12 22:44:00', '-87', '381', 'f5:dc:7d:73:2d:e9', NULL, NULL, NULL, NULL, NULL, '12', '18.077636300207715', 'inradius_41791', NULL


CREATE TABLE `calibration_data` (
  `macHash` varchar(100) COLLATE utf8_bin NOT NULL,
  `visit` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `dbm_ant` int(3) NOT NULL,
  `ant_id` int(11) NOT NULL,
  `mac` char(17) COLLATE utf8_bin DEFAULT NULL,
  `isRand` tinyint(4) DEFAULT NULL,
  `posX` double DEFAULT NULL,
  `posY` double DEFAULT NULL,
  `sources` int(2) DEFAULT NULL,
  `ip` int(10) unsigned DEFAULT NULL,
  `dayOfMonth` int(11) DEFAULT NULL,
  `location` varchar(80) COLLATE utf8_bin DEFAULT NULL,
  `am` varchar(300) COLLATE utf8_bin DEFAULT NULL,
  `ar` varchar(300) COLLATE utf8_bin DEFAULT NULL,
  KEY `visit` (`visit`),
  KEY `macHash` (`macHash`),
  KEY `ant, time` (`dbm_ant`,`visit`),
  KEY `beacon` (`am`),
  KEY `ant_id` (`ant_id`),
  KEY `ant,mH,visit` (`ant_id`,`macHash`,`visit`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

Is this a one-time task? If so, does the run time matter? After getting this data loaded, will you incrementally update the "summary table" each day?

Shrink datatypes: bulky data takes longer to process. Example: a 4-byte INT dayOfMonth could be a 1-byte TINYINT UNSIGNED.
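As a minimal sketch of that change, assuming dayOfMonth only ever holds values from 1 to 31 (the sample row shows 12):

```sql
-- Sketch only: a day of month fits in one unsigned byte (0..255).
-- Changing a column type rebuilds the table, so try it on a copy first.
ALTER TABLE calibration_data
  MODIFY COLUMN `dayOfMonth` TINYINT UNSIGNED DEFAULT NULL;
```

The same pattern would apply to any other column whose value range is known to be small.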

You are moving a TIMESTAMP into a DATETIME. This may or may not work as you expect.

INT UNSIGNED is OK for IPv4, but you can't fit IPv6 in it.

COUNT(*) probably does not need a 4-byte INT; see the smaller variants (TINYINT, SMALLINT, MEDIUMINT).

Use UNSIGNED where appropriate.

A MAC address takes about 19 bytes the way you have it; it could easily be converted to and from a 6-byte BINARY(6). See REPLACE(), UNHEX(), HEX(), etc.
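A rough illustration of the round trip, using the MAC address from the sample row in the question. The aliases mac_bin, mac_text, h and t are made up for this example:

```sql
-- Text form -> 6 raw bytes: strip the colons, then decode the 12 hex digits.
SELECT UNHEX(REPLACE('f5:dc:7d:73:2d:e9', ':', '')) AS mac_bin;

-- 6 raw bytes -> text form: hex-encode, then put a colon between each byte.
-- HEX() returns upper-case digits, so LOWER() restores the original casing.
SELECT LOWER(CONCAT_WS(':',
         SUBSTR(h, 1, 2), SUBSTR(h, 3, 2), SUBSTR(h, 5, 2),
         SUBSTR(h, 7, 2), SUBSTR(h, 9, 2), SUBSTR(h, 11, 2))) AS mac_text
FROM (SELECT HEX(UNHEX('F5DC7D732DE9')) AS h) AS t;
```

In a real table the literal would be replaced by the mac or macHash column, and the converted value stored in a BINARY(6) column.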

What is the setting of innodb_buffer_pool_size? It could be about 100G for the 128 GB of RAM you have.
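One way to check and change the setting, as a sketch. It assumes MySQL 5.7.5 or later, where the buffer pool can be resized at runtime; on older versions the value goes into the server configuration file and needs a restart:

```sql
-- Current buffer pool size, converted from bytes to GiB.
SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gib;

-- Assumption: MySQL 5.7.5+ (online resize) and a dedicated database server.
-- 100 GiB expressed in bytes.
SET GLOBAL innodb_buffer_pool_size = 100 * 1024 * 1024 * 1024;
```

A value set with SET GLOBAL does not survive a restart unless it is also written to the configuration file.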

Do the time ranges overlap? If not, take advantage of that. Also, don't include unnecessary columns in the PRIMARY KEY, such as EndTime.

Have the GROUP BY columns in the same order as the PRIMARY KEY of knn_data; this will avoid a lot of block splits during the INSERT.
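A sketch of what that reordering could look like. It lists the knn_data PRIMARY KEY columns first, in key order, and keeps the remaining grouping columns of the original query after them. Adding cal.EndTime to the GROUP BY and the explicit ORDER BY are assumptions of this sketch, not part of the original query; the ORDER BY is there because MySQL 8.0 no longer sorts GROUP BY results implicitly:

```sql
INSERT INTO knn_data
       (X, Y, BeginTime, EndTIme, avg_dbm_ant, ant_id, avg_location, `count`, visit)
SELECT cal.X,
       cal.Y,
       cal.BeginTime,
       cal.EndTime,
       AVG(m.dbm_ant),
       m.ant_id,
       AVG(m.location),
       COUNT(*),
       m.visit
FROM   calibration cal
       LEFT JOIN calibration_data m
         ON m.visit BETWEEN cal.BeginTime AND cal.EndTime
-- PRIMARY KEY order of knn_data: ant_id, visit, X, Y, BeginTime, EndTIme
GROUP  BY m.ant_id, m.visit, cal.X, cal.Y, cal.BeginTime, cal.EndTime,
          cal.BeaconId, m.macHash
-- Rows then arrive in key order, so inserts append instead of splitting pages.
ORDER  BY m.ant_id, m.visit, cal.X, cal.Y, cal.BeginTime, cal.EndTime;
```

This only changes the order in which rows reach the INSERT; it does not address the slow join itself.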

The big problem is that there is no useful index in calibration_data, so the JOIN has to do a full table scan again and again! An estimated 2K scans of 3M rows, which is roughly 6 billion row visits going by the EXPLAIN estimates! Let me focus on that problem...

There is no good way to do WHERE x BETWEEN start AND end because MySQL does not know whether the datetime ranges overlap. There is no real cure for that in this context, so let me approach it differently...

Are start and end 'regular'? Like every hour? If so, we can do some sort of computation instead of the BETWEEN. Let me know if this is the case; I will continue my thoughts.
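To illustrate the kind of computation meant here, a sketch under a strong assumption that the question does not confirm: every calibration window starts exactly on the hour and lasts one hour. The sample row in the question shows an unaligned 5-minute window, so this would need adapting to the real data:

```sql
-- Assumption (hypothetical): each window is [HH:00:00, HH+1:00:00).
SELECT cal.X,
       cal.Y,
       cal.BeginTime,
       m.ant_id,
       AVG(m.dbm_ant) AS avg_dbm_ant,
       COUNT(*)       AS cnt
FROM   calibration_data m
       JOIN calibration cal
         -- Truncate visit to the start of its hour, so the range test
         -- becomes an equality on cal.BeginTime.
         ON cal.BeginTime =
            CAST(DATE_FORMAT(m.visit, '%Y-%m-%d %H:00:00') AS DATETIME)
GROUP  BY cal.X, cal.Y, cal.BeginTime, m.ant_id;
```

An equality join like this is something the existing beginTime index on calibration can serve, which a BETWEEN on two columns of the other table cannot.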

That's a nasty, classic problem with "range" queries: the optimiser doesn't use your indexes and ends up doing a full table scan. In your EXPLAIN plan you can see this in the type column: type=ALL.

Ideally you should have type=range and something in the key column.
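For comparison, a small check with constant bounds taken from the sample row in the question. With literals instead of per-row values from another table, the optimiser is able to consider a range access on the visit index; whether it actually picks it depends on how many rows fall inside the interval:

```sql
-- Inspect the type and key columns of the output for table m.
EXPLAIN
SELECT m.ant_id, m.visit, m.dbm_ant
FROM   calibration_data m
WHERE  m.visit BETWEEN '2016-11-12 22:44:00' AND '2016-11-12 22:49:00';
```

If this plan uses the index while the join does not, the join condition rather than the index itself is what the optimiser cannot handle.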

Some ideas:


I doubt that changing your join condition from

ON m.visit BETWEEN cal.BeginTime AND cal.EndTime 

to

ON m.visit >= cal.BeginTime AND m.visit <= cal.EndTime

will work, but still give it a try.


Run ANALYZE TABLE on both tables. This will update the statistics on your tables and might help the optimiser make the right decision (i.e. use the indexes).
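In MySQL the statement is spelled ANALYZE TABLE and accepts several tables at once. A minimal example for the two tables in the question:

```sql
-- Refresh the index statistics the optimiser relies on.
ANALYZE TABLE calibration, calibration_data;
```

Re-running EXPLAIN on the slow query afterwards shows whether the plan changed.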


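Changing the query to the following might also help to force the optimiser to use indexes. Note that moving the upper bound into the WHERE clause discards calibration rows without a match, so the LEFT JOIN then behaves like an inner join: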

INSERT INTO knn_data
SELECT cal.X           AS X,
       cal.Y           AS Y,
       cal.BeginTime   AS BeginTime,
       cal.EndTime     AS EndTime,
       avg(m.dbm_ant)  AS avg_dbm_ant,
       m.ant_id        AS ant_id,
       avg(m.location) AS avg_location,
       count(*)        AS count,
       m.visit
FROM   calibration cal
       LEFT JOIN calibration_data m
         ON m.visit >= cal.BeginTime
WHERE  m.visit <= cal.EndTime
GROUP  BY cal.X,
          cal.Y,
          cal.BeginTime,
          cal.BeaconId,
          m.ant_id,
          m.macHash,
          m.visit;

That's all I can think of...
