Painfully slow MySQL query when joining two tables on primary/foreign keys

We use the Ahoy Ruby library for tracking user visits and events. In order to provide feedback to users, we periodically run counts on certain events and visits.

The two tables are relatively large, but not huge: visits has 6MM+ rows and ahoy_events has 23MM+ rows.

Below is a sample query, which takes 80s to run:

SELECT COUNT(*) 
FROM `ahoy_events` 
INNER JOIN `visits`  ON `visits`.`id` = `ahoy_events`.`visit_id` 
WHERE `ahoy_events`.`event_target_id` = 8471 
  AND `ahoy_events`.`event_target_type` = 'Project' 
  AND visits.entity_id = 668 
  AND (`visits`.`user_type` IS NULL OR `visits`.`user_type` = 'User')

And here is the explain for that query:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: visits
   partitions: NULL
         type: ref
possible_keys: PRIMARY,index_visits_on_entity_id,index_visits_on_entity_id_and_user_type,index_visits_on_entity_id_and_started_at,index_visits_on_entity_id_and_user_id_and_user_type,index_visits_on_entity_id_user_id_user_type_started_at
          key: index_visits_on_entity_id_user_id_user_type_started_at
      key_len: 5
          ref: const
         rows: 1567140
     filtered: 19.00
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: ahoy_events
   partitions: NULL
         type: ref
possible_keys: index_ahoy_events_on_visit_id,index_ahoy_events_on_event_target_id_and_event_target_type
          key: index_ahoy_events_on_visit_id
      key_len: 17
          ref: givecorpssite.visits.id
         rows: 2
     filtered: 11.47
        Extra: Using where

When I run just a count on the individual tables, each runs in 200ms to 600ms, i.e.:

SELECT count(*) FROM `ahoy_events` WHERE `ahoy_events`.`event_target_id` = 8471 AND `ahoy_events`.`event_target_type` = 'Project'

and

SELECT count(*) FROM `visits` where visits.entity_id = 668 AND (`visits`.`user_type` IS NULL OR `visits`.`user_type` = 'Donor')

But joining them on the primary/foreign key causes the query to take 80s+.

BTW, the keys (visit_id, and the id column on visits) are UUIDs and are BINARY(16) columns.

Am I wrong to believe this query should not be so slow?

Since it is unclear whether the OR condition is what is causing the issue, and you only need a count rather than row-level data in your results, you could instead try conditional aggregation of this sort:

SELECT COUNT(IF(`visits`.`user_type` IS NULL OR `visits`.`user_type` = 'User',1,NULL))
FROM `ahoy_events` 
INNER JOIN `visits`  ON `visits`.`id` = `ahoy_events`.`visit_id` 
WHERE `ahoy_events`.`event_target_id` = 8471 
  AND `ahoy_events`.`event_target_type` = 'Project' 
  AND visits.entity_id = 668 
;

COUNT ignores null values; alternatively, SUM(IF(visits.user_type IS NULL OR visits.user_type = 'User',1,0)) is a little clearer and gets the same result (though it could in theory be a little more costly performance-wise).
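
Written out in full, the SUM variant might look like the sketch below. One difference worth knowing: when no rows match the WHERE clause, COUNT returns 0 but SUM returns NULL, so the COALESCE wrapper (an addition here, not part of the suggestion above) keeps the result at 0 in that case. The alias matching_events is likewise just an illustrative name.

```sql
-- Conditional aggregation with SUM instead of COUNT.
-- COALESCE is an added safeguard: SUM over zero rows yields NULL, not 0.
SELECT COALESCE(
         SUM(IF(`visits`.`user_type` IS NULL OR `visits`.`user_type` = 'User', 1, 0)),
         0
       ) AS matching_events
FROM `ahoy_events`
INNER JOIN `visits` ON `visits`.`id` = `ahoy_events`.`visit_id`
WHERE `ahoy_events`.`event_target_id` = 8471
  AND `ahoy_events`.`event_target_type` = 'Project'
  AND `visits`.`entity_id` = 668;
```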

This query processes more rows, since the user_type condition no longer narrows them in the WHERE clause, but scanning the larger result set can end up "cheaper" than scanning the table for a smaller one.

Covering indexes:

visits:  INDEX(entity_id, user_type, id)  -- in this order
ahoy_events:  INDEX(event_target_id, event_target_type, visit_id)

Because these indexes are covering, the query can be answered from the indexes alone, so there may be less I/O. (I/O is usually the slowest part of a query.)
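
As a sketch, the two indexes could be created as follows. The index names are made up for illustration; use whatever fits your naming scheme. On tables of this size the build may take a while, so it is probably worth trying on a copy first.

```sql
-- Hypothetical index names; the column order follows the suggestion above.
ALTER TABLE `visits`
  ADD INDEX `idx_visits_entity_user_type_id` (`entity_id`, `user_type`, `id`);

ALTER TABLE `ahoy_events`
  ADD INDEX `idx_events_target_visit` (`event_target_id`, `event_target_type`, `visit_id`);
```

Re-running EXPLAIN afterwards should show whether the optimizer picks them. "Using index" in the Extra column for both tables would indicate that both are being read as covering indexes.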

There is some chance that the following will run faster:

SELECT
    (
        SELECT  COUNT(*)
            FROM  `ahoy_events` AS e
            INNER JOIN  `visits` AS v  ON v.`id` = e.`visit_id`
            WHERE  e.`event_target_id` = 8471
              AND  e.`event_target_type` = 'Project'
              AND  v.`entity_id` = 668
              AND  v.`user_type` IS NULL
    ) +
    (
        SELECT  COUNT(*)
            FROM  `ahoy_events` AS e
            INNER JOIN  `visits` AS v  ON v.`id` = e.`visit_id`
            WHERE  e.`event_target_id` = 8471
              AND  e.`event_target_type` = 'Project'
              AND  v.`entity_id` = 668
              AND  v.`user_type` = 'User'
    );

It needs the same indexes I suggested above.

The rationale here is to avoid the OR. (Indexes usually cannot be used with OR.)

If you want to discuss further, please provide SHOW CREATE TABLE and EXPLAIN SELECT ...
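
For reference, that output could be gathered with statements like these. The EXPLAIN target is assumed here to be the original query from the question; substitute whichever variant you are actually testing.

```sql
-- Table definitions, including all existing indexes.
SHOW CREATE TABLE `visits`;
SHOW CREATE TABLE `ahoy_events`;

-- Execution plan; assumed to be for the question's original query.
EXPLAIN
SELECT COUNT(*)
FROM `ahoy_events`
INNER JOIN `visits` ON `visits`.`id` = `ahoy_events`.`visit_id`
WHERE `ahoy_events`.`event_target_id` = 8471
  AND `ahoy_events`.`event_target_type` = 'Project'
  AND `visits`.`entity_id` = 668
  AND (`visits`.`user_type` IS NULL OR `visits`.`user_type` = 'User');
```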
