
Painfully slow MySQL query when joining two tables on primary/foreign keys

We use the Ahoy Ruby library to track user visits and events. To provide feedback to users, we periodically run counts on certain events and visits.

The two tables are relatively large, but not huge: visits has 6 million+ rows and ahoy_events has 23 million+ rows.

Below is a sample query, which takes 80s to run:

SELECT COUNT(*) 
FROM `ahoy_events` 
INNER JOIN `visits`  ON `visits`.`id` = `ahoy_events`.`visit_id` 
WHERE `ahoy_events`.`event_target_id` = 8471 
  AND `ahoy_events`.`event_target_type` = 'Project' 
  AND visits.entity_id = 668 
  AND (`visits`.`user_type` IS NULL OR `visits`.`user_type` = 'User')

And here is the EXPLAIN output for that query:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: visits
   partitions: NULL
         type: ref
possible_keys: PRIMARY,index_visits_on_entity_id,index_visits_on_entity_id_and_user_type,index_visits_on_entity_id_and_started_at,index_visits_on_entity_id_and_user_id_and_user_type,index_visits_on_entity_id_user_id_user_type_started_at
          key: index_visits_on_entity_id_user_id_user_type_started_at
      key_len: 5
          ref: const
         rows: 1567140
     filtered: 19.00
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: ahoy_events
   partitions: NULL
         type: ref
possible_keys: index_ahoy_events_on_visit_id,index_ahoy_events_on_event_target_id_and_event_target_type
          key: index_ahoy_events_on_visit_id
      key_len: 17
          ref: givecorpssite.visits.id
         rows: 2
     filtered: 11.47
        Extra: Using where

When I run just a count on the individual tables, each runs in 200ms to 600ms, i.e.:

SELECT count(*) FROM `ahoy_events` WHERE `ahoy_events`.`event_target_id` = 8471 AND `ahoy_events`.`event_target_type` = 'Project'

and

SELECT count(*) FROM `visits` where visits.entity_id = 668 AND (`visits`.`user_type` IS NULL OR `visits`.`user_type` = 'Donor')

But joining them on the primary/foreign key causes the query to take 80s+.

BTW, the keys (visit_id, and the id column on visits) are UUIDs stored in BINARY(16) columns.

Am I wrong to believe this query should not be so slow?

Since it is unclear whether the OR condition is causing the problem, and you are not actually after row-level data in your results, you could instead try conditional aggregation like this:

SELECT COUNT(IF(`visits`.`user_type` IS NULL OR `visits`.`user_type` = 'User',1,NULL)) 
FROM `ahoy_events` 
INNER JOIN `visits`  ON `visits`.`id` = `ahoy_events`.`visit_id` 
WHERE `ahoy_events`.`event_target_id` = 8471 
  AND `ahoy_events`.`event_target_type` = 'Project' 
  AND visits.entity_id = 668 
;

COUNT ignores NULL values; alternatively, SUM(IF(visits.user_type IS NULL OR visits.user_type = 'User',1,0)) is a little clearer and gets the same result (though it could in theory be slightly more costly performance-wise).
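For reference, the SUM variant written out in full might look like the sketch below. It assumes the same tables and filter values as the question; the column alias matching_events is a name chosen here purely for illustration.

```sql
-- Conditional aggregation with SUM instead of COUNT.
-- Assumes the tables and literal values from the question;
-- the alias matching_events is illustrative only.
SELECT COALESCE(
         SUM(IF(`visits`.`user_type` IS NULL OR `visits`.`user_type` = 'User', 1, 0)),
         0
       ) AS matching_events
FROM `ahoy_events`
INNER JOIN `visits` ON `visits`.`id` = `ahoy_events`.`visit_id`
WHERE `ahoy_events`.`event_target_id` = 8471
  AND `ahoy_events`.`event_target_type` = 'Project'
  AND `visits`.`entity_id` = 668;
```

The COALESCE wrapper is there because SUM returns NULL when no rows survive the WHERE clause, whereas COUNT returns 0 in that case.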

In this query, you'll be processing more rows because the user_type condition no longer filters them out, but scanning the larger result set can end up "cheaper" than searching the table for a smaller set of results.

Covering indexes:

visits:  INDEX(entity_id, user_type, id)  -- in this order
ahoy_events:  INDEX(event_target_id, event_target_type, visit_id)

Because these indexes are covering, there may be less I/O. (I/O is usually the slowest part of a query.)
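As a rough sketch, the two indexes could be created as follows. The index names are hypothetical (not taken from the question's schema), and the statements assume the table and column names shown in the question. Building an index on tables of this size can take a while, so it may be worth trying on a copy first.

```sql
-- Index names below are made up for illustration;
-- the column order follows the suggestion above.
ALTER TABLE `visits`
  ADD INDEX `idx_visits_entity_user_type_id` (`entity_id`, `user_type`, `id`);

ALTER TABLE `ahoy_events`
  ADD INDEX `idx_events_target_type_visit` (`event_target_id`, `event_target_type`, `visit_id`);
```

If the optimizer picks these indexes, EXPLAIN should report "Using index" in the Extra column for both tables, meaning the query is answered from the indexes without reading the table rows.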

There is some chance that the following will run faster:

SELECT  
    (
        SELECT  COUNT(*)
            FROM  `ahoy_events` AS e
            INNER JOIN  `visits` AS v  ON v.`id` = e.`visit_id`
            WHERE  e.`event_target_id` = 8471
              AND  e.`event_target_type` = 'Project'
              AND  v.entity_id = 668
              AND  v.`user_type` IS NULL 
    ) + 
    (
        SELECT  COUNT(*)
            FROM  `ahoy_events` AS e
            INNER JOIN  `visits` AS v  ON v.`id` = e.`visit_id`
            WHERE  e.`event_target_id` = 8471
              AND  e.`event_target_type` = 'Project'
              AND  v.entity_id = 668
              AND  v.`user_type` = 'User' 
    );

It needs the same indexes I suggested above.

The rationale here is to avoid the OR. (Indexes usually cannot be used with OR.)

If you want to discuss further, please provide SHOW CREATE TABLE and EXPLAIN SELECT ...
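Concretely, that request amounts to running something like the following and posting the output. This is only a sketch that reuses the table names and the original query from the question.

```sql
-- Table definitions, including all existing indexes.
SHOW CREATE TABLE `visits`;
SHOW CREATE TABLE `ahoy_events`;

-- Execution plan for the slow query from the question.
EXPLAIN
SELECT COUNT(*)
FROM `ahoy_events`
INNER JOIN `visits` ON `visits`.`id` = `ahoy_events`.`visit_id`
WHERE `ahoy_events`.`event_target_id` = 8471
  AND `ahoy_events`.`event_target_type` = 'Project'
  AND `visits`.`entity_id` = 668
  AND (`visits`.`user_type` IS NULL OR `visits`.`user_type` = 'User');
```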
