
MySQL left outer join is slow

Hoping to get some help with this query. I've worked on it for a while now and can't get it any faster:

SELECT date, count(id) as 'visits' FROM dates 
LEFT OUTER JOIN visits 
ON (dates.date = DATE(visits.start) and account_id = 40 ) 
WHERE date >= '2010-12-13' AND date <= '2011-1-13' 
GROUP BY date ORDER BY date ASC

That query takes about 8 seconds to run. I've added indexes on dates.date, visits.start, visits.account_id and a composite index on visits.start + visits.account_id, and I still can't get it to run any faster.

Table structure (only the relevant columns of the visits table are shown):

create table visits (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `account_id` int(11) NOT NULL,
    `start` DATETIME NOT NULL,
    `end` DATETIME NULL,
    PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE `dates` (
  `date` date NOT NULL,
  PRIMARY KEY (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

The dates table contains every day from 2010-01-01 to 2020-01-01 (~3k rows). The visits table contains about 400k rows dating from 2010-06-01 to yesterday. I'm using the dates table so that the join returns 0 visits for days on which there were no visits.

For reference, these are the results I want:

+------------+--------+
| date       | visits |
+------------+--------+
| 2010-12-13 |    301 |
| 2010-12-14 |    356 |
| 2010-12-15 |    423 |
| 2010-12-16 |    332 |
| 2010-12-17 |    346 |
| 2010-12-18 |    226 |
| 2010-12-19 |    213 |
| 2010-12-20 |    311 |
| 2010-12-21 |    273 |
| 2010-12-22 |    286 |
| 2010-12-23 |    241 |
| 2010-12-24 |    149 |
| 2010-12-25 |    102 |
| 2010-12-26 |    174 |
| 2010-12-27 |    258 |
| 2010-12-28 |    348 |
| 2010-12-29 |    392 |
| 2010-12-30 |    395 |
| 2010-12-31 |    278 |
| 2011-01-01 |    241 |
| 2011-01-02 |    295 |
| 2011-01-03 |    369 |
| 2011-01-04 |    438 |
| 2011-01-05 |    393 |
| 2011-01-06 |    368 |
| 2011-01-07 |    435 |
| 2011-01-08 |    313 |
| 2011-01-09 |    250 |
| 2011-01-10 |    345 |
| 2011-01-11 |    387 |
| 2011-01-12 |      0 |
| 2011-01-13 |      0 |
+------------+--------+

Thanks in advance for any help!

Your problem is here:

ON (dates.date = DATE(visits.start) and account_id = 40 ) 

Because you are applying the DATE() function to visits.start, MySQL has to evaluate DATE() for every candidate row. It cannot use an index on start to find the visits that match each date in the join.
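To make that concrete, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. This is an assumption: the plan wording below is SQLite's EXPLAIN QUERY PLAN output, not MySQL's EXPLAIN. The sketch compares the plan for a filter that wraps start in a date function with the same filter written as a range on the bare column. The table layout and the index name idx_visits_start are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption: only the general
# "function applied to an indexed column" behaviour is being illustrated).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (id INTEGER PRIMARY KEY, account_id INTEGER, start TEXT)"
)
conn.execute("CREATE INDEX idx_visits_start ON visits (start)")  # hypothetical name


def plan(sql: str) -> str:
    """Return the 'detail' text of the first EXPLAIN QUERY PLAN row."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][-1]


# Date function applied to the column: every row has to be examined (a SCAN).
wrapped = plan("SELECT count(*) FROM visits WHERE date(start) = '2011-01-01'")

# The same day expressed as a half-open range on the bare column: an index SEARCH.
ranged = plan(
    "SELECT count(*) FROM visits "
    "WHERE start >= '2011-01-01' AND start < '2011-01-02'"
)

print(wrapped)
print(ranged)
```

In SQLite the first plan starts with SCAN and the second with SEARCH. In MySQL, EXPLAIN typically shows the analogous difference as a full table or full index scan (type ALL or index) versus type range.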

Probably the best solution would be to add start_date and end_date columns to the dates table and index those columns. For the row with date 2011-01-01, start_date would be 2011-01-01 00:00:00 and end_date would be 2011-01-01 23:59:59.

Then you can join directly to the dates table like so:

SELECT date, count(id) as 'visits' FROM dates 
LEFT OUTER JOIN visits 
ON (visits.start BETWEEN dates.start_date AND dates.end_date and account_id = 40 ) 
WHERE date >= '2010-12-13' AND date <= '2011-1-13' 
GROUP BY date ORDER BY date ASC
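Here is a runnable sketch of this suggestion. It again uses sqlite3 as a stand-in for MySQL and a few made-up visits, not the asker's data. It precomputes start_date and end_date for each calendar row, then checks that the BETWEEN join returns the same counts as the original DATE() join, including a zero for an empty day. The index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dates (
        date       TEXT PRIMARY KEY,
        start_date TEXT NOT NULL,   -- date || ' 00:00:00'
        end_date   TEXT NOT NULL    -- date || ' 23:59:59'
    );
    CREATE TABLE visits (
        id INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL,
        start TEXT NOT NULL          -- 'YYYY-MM-DD HH:MM:SS'
    );
    CREATE INDEX idx_visits_account_start ON visits (account_id, start);
""")

# Precompute the per-day bounds once, when the calendar row is created.
for d in ("2011-01-10", "2011-01-11", "2011-01-12"):
    conn.execute(
        "INSERT INTO dates VALUES (?, ?, ?)", (d, d + " 00:00:00", d + " 23:59:59")
    )

# Hypothetical sample visits; account 41 must not be counted.
conn.executemany(
    "INSERT INTO visits (account_id, start) VALUES (?, ?)",
    [(40, "2011-01-10 08:15:00"), (40, "2011-01-10 23:59:59"),
     (40, "2011-01-11 00:00:00"), (41, "2011-01-11 12:00:00")],
)

# Original shape: function applied to visits.start in the join condition.
original = conn.execute("""
    SELECT date, count(id) FROM dates
    LEFT OUTER JOIN visits
      ON (dates.date = date(visits.start) AND account_id = 40)
    WHERE date BETWEEN '2011-01-10' AND '2011-01-12'
    GROUP BY date ORDER BY date
""").fetchall()

# Rewritten shape: bare visits.start compared with precomputed bounds.
rewritten = conn.execute("""
    SELECT date, count(id) FROM dates
    LEFT OUTER JOIN visits
      ON (visits.start BETWEEN dates.start_date AND dates.end_date
          AND account_id = 40)
    WHERE date BETWEEN '2011-01-10' AND '2011-01-12'
    GROUP BY date ORDER BY date
""").fetchall()
```

The inclusive 23:59:59 upper bound assumes start never carries fractional seconds. If it can (MySQL DATETIME(fsp), available since 5.6.4), a half-open bound such as `start >= day AND start < next day` is safer.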

Another option would be to store the date and time parts separately in the visits table and join using just the date part.
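Here is a minimal sketch of storing the date part separately. It assumes a hypothetical visit_date column that is backfilled once from start and indexed together with account_id, so the join becomes a plain equality on indexed columns. sqlite3 stands in for MySQL, and the index name and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dates (date TEXT PRIMARY KEY);
    CREATE TABLE visits (
        id INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL,
        start TEXT NOT NULL,
        visit_date TEXT              -- hypothetical: date part of start
    );
""")
conn.executemany(
    "INSERT INTO dates VALUES (?)", [("2011-01-10",), ("2011-01-11",), ("2011-01-12",)]
)
conn.executemany(
    "INSERT INTO visits (account_id, start) VALUES (?, ?)",
    [(40, "2011-01-10 08:15:00"), (40, "2011-01-11 17:30:00"),
     (40, "2011-01-11 18:45:00"), (41, "2011-01-12 09:00:00")],
)

# One-off backfill of the new column, then index it alongside account_id.
conn.execute("UPDATE visits SET visit_date = date(start)")
conn.execute("CREATE INDEX idx_visits_account_date ON visits (account_id, visit_date)")

# Join on plain equality: no function is applied to visits columns per row.
rows = conn.execute("""
    SELECT dates.date, count(visits.id) FROM dates
    LEFT OUTER JOIN visits
      ON (visits.visit_date = dates.date AND visits.account_id = 40)
    WHERE dates.date BETWEEN '2011-01-10' AND '2011-01-12'
    GROUP BY dates.date ORDER BY dates.date
""").fetchall()
```

The trade-off is a redundant column that must be kept in sync with start whenever rows are written.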

I think it is mainly slow because of the DATE() function. You could add a date column to visits that stores just the date part of start. Then write a trigger that fills it in automatically whenever a visit is inserted or its start datetime is updated. That lets MySQL make better use of the indexes used in the join.
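Here is a sketch of the trigger idea, assuming a hypothetical visit_date column. SQLite, used here as a runnable stand-in, cannot assign to NEW in a trigger. It therefore uses AFTER triggers that update the row. In MySQL the usual form would be a BEFORE INSERT trigger and a BEFORE UPDATE trigger, each doing `SET NEW.visit_date = DATE(NEW.start)`. The trigger and index names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (
        id INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL,
        start TEXT NOT NULL,
        visit_date TEXT              -- hypothetical column kept in sync below
    );
    CREATE INDEX idx_visits_account_date ON visits (account_id, visit_date);

    -- Fill visit_date for every newly inserted row.
    CREATE TRIGGER visits_set_date_ins AFTER INSERT ON visits
    BEGIN
        UPDATE visits SET visit_date = date(NEW.start) WHERE id = NEW.id;
    END;

    -- Recompute it only when the start column itself is updated.
    CREATE TRIGGER visits_set_date_upd AFTER UPDATE OF start ON visits
    BEGIN
        UPDATE visits SET visit_date = date(NEW.start) WHERE id = NEW.id;
    END;
""")

conn.execute("INSERT INTO visits (account_id, start) VALUES (40, '2011-01-10 23:30:00')")
after_insert = conn.execute("SELECT visit_date FROM visits WHERE id = 1").fetchone()[0]

# Moving the visit past midnight should move its visit_date too.
conn.execute("UPDATE visits SET start = '2011-01-11 00:10:00' WHERE id = 1")
after_update = conn.execute("SELECT visit_date FROM visits WHERE id = 1").fetchone()[0]
```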

How about something like this: an outer join on the result of eumiro's select?

SELECT dates.date, COALESCE(v.visits, 0) as 'visits' FROM dates 
LEFT OUTER JOIN (SELECT DATE(start) as dt, count(id) as 'visits'
FROM visits 
WHERE account_id = 40
AND start >= '2010-12-13' AND start < '2011-01-14' 
GROUP BY DATE(start))
v
ON (dates.date = v.dt ) 
WHERE dates.date >= '2010-12-13' AND dates.date <= '2011-01-13' 
ORDER BY dates.date ASC
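Here is a runnable sketch of this pre-aggregation approach. The derived table filters on the bare start column, so an index on (account_id, start) can narrow it, and COALESCE turns the NULL from days with no match into 0. sqlite3 stands in for MySQL, and the index name and rows are hypothetical. The string comparisons on start rely on SQLite storing datetimes as ISO-8601 text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dates (date TEXT PRIMARY KEY);
    CREATE TABLE visits (id INTEGER PRIMARY KEY, account_id INTEGER NOT NULL, start TEXT NOT NULL);
    CREATE INDEX idx_visits_account_start ON visits (account_id, start);
""")
conn.executemany(
    "INSERT INTO dates VALUES (?)", [("2011-01-11",), ("2011-01-12",), ("2011-01-13",)]
)
conn.executemany(
    "INSERT INTO visits (account_id, start) VALUES (?, ?)",
    [(40, "2011-01-11 10:00:00"), (40, "2011-01-11 11:00:00"),
     (40, "2011-01-13 23:59:59"), (40, "2011-01-14 00:00:00")],  # last one is out of range
)

rows = conn.execute("""
    SELECT dates.date, COALESCE(v.visits, 0) AS visits
    FROM dates
    LEFT OUTER JOIN (
        -- Aggregate the matching visits once, filtering on the bare column.
        SELECT date(start) AS dt, count(id) AS visits
        FROM visits
        WHERE account_id = 40
          AND start >= '2011-01-11' AND start < '2011-01-14'
        GROUP BY date(start)
    ) v ON dates.date = v.dt
    WHERE dates.date BETWEEN '2011-01-11' AND '2011-01-13'
    ORDER BY dates.date
""").fetchall()
```

The DATE() call remains, but it now runs only on the rows already narrowed by the range filter, and the outer join matches one aggregated row per day.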

Edit: edited the SQL. Edit 2: another option is an inline (correlated) subquery, something like this:

SELECT date, (SELECT count(*)
FROM visits
WHERE dates.date = DATE(visits.start) AND account_id = 40
) AS 'visits' FROM dates
WHERE date >= '2010-12-13' AND date <= '2011-1-13'
ORDER BY date ASC
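The correlated subquery above still wraps visits.start in DATE(). Here is a sketch of the same shape with a range predicate instead, so each per-day lookup can use an (account_id, start) index. sqlite3 stands in for MySQL, and SQLite's `date(dates.date, '+1 day')` plays the role of MySQL's `dates.date + INTERVAL 1 DAY`. Names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dates (date TEXT PRIMARY KEY);
    CREATE TABLE visits (id INTEGER PRIMARY KEY, account_id INTEGER NOT NULL, start TEXT NOT NULL);
    CREATE INDEX idx_visits_account_start ON visits (account_id, start);
""")
conn.executemany(
    "INSERT INTO dates VALUES (?)", [("2011-01-11",), ("2011-01-12",), ("2011-01-13",)]
)
conn.executemany(
    "INSERT INTO visits (account_id, start) VALUES (?, ?)",
    [(40, "2011-01-11 10:00:00"), (40, "2011-01-11 11:00:00"),
     (40, "2011-01-13 23:59:59"), (41, "2011-01-12 09:00:00")],
)

rows = conn.execute("""
    SELECT dates.date,
           (SELECT count(*) FROM visits
             WHERE visits.account_id = 40
               -- half-open range [day, next day) on the bare start column
               AND visits.start >= dates.date
               AND visits.start < date(dates.date, '+1 day')) AS visits
    FROM dates
    WHERE dates.date BETWEEN '2011-01-11' AND '2011-01-13'
    ORDER BY dates.date
""").fetchall()
```

Because count(*) over an empty set is 0, days without visits come out as 0 without needing COALESCE.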

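Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.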

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM