MySQL DISTINCT poor performance

I have a SELECT DISTINCT statement with multiple LEFT JOINs that performs poorly when my WHERE clause is large. Below is my statement:

SELECT  DISTINCT u.*, ri.id as reg_id, d.id as dist_id
    FROM  users u
    LEFT JOIN  earned_points ep ON u.id = ep.user_id
    LEFT JOIN  distributors d ON d.id = ep.distributor_id
      OR  d.id = u.distributor_id
      OR  d.id = u.additional_distributor_id
    LEFT JOIN  registration_items_users riu ON u.id = riu.user_id
      AND  riu.distributor_id = d.id
      AND  riu.registration_item_id = 21
    LEFT JOIN  registration_items ri ON riu.registration_item_id = ri.id
    WHERE  d.id IN (201,281,321,631,901,971,1211,1601,1611,1621,
               1631,1641,1651,1661,1671,1681,1691,1701,1711,1721,1731,
               1741,1751,1761,1771,1781,2281,2291,2401,2781,2801,2931 );

The EXPLAIN output for this query was attached as an image (select_explain).

This query takes about 4 seconds to complete. If I reduce the WHERE clause to a single id, it drops to about 170 ms.

I would appreciate any suggestions on how to make this query faster.

Thank you

I was able to come up with a solution based on Rick James's suggestion (the accepted answer). Using UNION and getting rid of the LEFT JOINs and the DISTINCT did the trick. The new query takes about 200 ms, compared to 4 seconds for the version above.

(SELECT  u.*,
        (SELECT riu.registration_item_id
            FROM registration_items_users riu
            WHERE riu.user_id = u.id
              AND riu.distributor_id = d.id
              AND riu.registration_item_id = 21) as reg_id,
        d.id as dist_id
    FROM  users u
    JOIN  earned_points ep ON u.id = ep.user_id
    JOIN  distributors d ON d.id = ep.distributor_id
    WHERE  d.id IN (201,281,321,631,901,971,1211,1601,1611,1621,
                1631,1641,1651,1661,1671,1681,1691,1701,1711,1721,1731,
                1741,1751,1761,1771,1781,2281,2291,2401,2781,2801,2931))
UNION
(SELECT  u.*,
        (SELECT riu.registration_item_id
            FROM registration_items_users riu
            WHERE riu.user_id = u.id
              AND riu.distributor_id = d.id
              AND riu.registration_item_id = 21) as reg_id,
        d.id as dist_id
    FROM  users u
    JOIN  distributors d ON d.id = u.distributor_id
    WHERE  d.id IN (201,281,321,631,901,971,1211,1601,1611,1621,
                1631,1641,1651,1661,1671,1681,1691,1701,1711,1721,1731,
                1741,1751,1761,1771,1781,2281,2291,2401,2781,2801,2931))
UNION
(SELECT  u.*,
        (SELECT riu.registration_item_id
            FROM registration_items_users riu
            WHERE riu.user_id = u.id
              AND riu.distributor_id = d.id
              AND riu.registration_item_id = 21) as reg_id,
        d.id as dist_id
    FROM  users u
    JOIN  distributors d ON d.id = u.additional_distributor_id
    WHERE  d.id IN (201,281,321,631,901,971,1211,1601,1611,1621,
                1631,1641,1651,1661,1671,1681,1691,1701,1711,1721,1731,
                1741,1751,1761,1771,1781,2281,2291,2401,2781,2801,2931));

In the EXPLAIN, look at the u line: it is doing a table scan of about 6974 rows.

Get rid of LEFT unless the "right" table is optional, that is, unless you actually need the rows that have no match in it.

Turn the OR into a UNION; that is where the indexes are failing you. (UNION ALL is faster than UNION DISTINCT because it skips the duplicate-removal step; pick whichever one makes sense.)
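One way to check that splitting the OR into one SELECT per branch returns the same (user, distributor) pairs is to run both forms against a toy data set. This is a minimal sketch and not part of the original answer. It assumes a cut-down schema that holds only the columns the joins touch, made-up rows, and a shortened IN list. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it checks result equivalence only and says nothing about MySQL's execution plans or timings.

```python
import sqlite3

# Toy schema: only the columns the join conditions use (made-up data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, distributor_id INTEGER,
                    additional_distributor_id INTEGER);
CREATE TABLE earned_points (user_id INTEGER, distributor_id INTEGER);
CREATE TABLE distributors (id INTEGER PRIMARY KEY);
INSERT INTO distributors VALUES (201), (281), (999);
INSERT INTO users VALUES (1, 201, NULL), (2, 999, 281), (3, 999, NULL);
INSERT INTO earned_points VALUES (1, 201), (3, 281), (3, 999);
""")

IDS = "(201, 281)"  # stands in for the long IN (...) list

# Original shape: one join whose ON clause ORs three different columns.
or_join = f"""
SELECT DISTINCT u.id, d.id
  FROM users u
  LEFT JOIN earned_points ep ON u.id = ep.user_id
  LEFT JOIN distributors d ON d.id = ep.distributor_id
                           OR d.id = u.distributor_id
                           OR d.id = u.additional_distributor_id
 WHERE d.id IN {IDS}
"""

# Rewritten shape: one SELECT per OR branch, each a plain equality join.
union_of_branches = f"""
SELECT u.id, d.id FROM users u
  JOIN earned_points ep ON u.id = ep.user_id
  JOIN distributors d ON d.id = ep.distributor_id
 WHERE d.id IN {IDS}
UNION
SELECT u.id, d.id FROM users u
  JOIN distributors d ON d.id = u.distributor_id
 WHERE d.id IN {IDS}
UNION
SELECT u.id, d.id FROM users u
  JOIN distributors d ON d.id = u.additional_distributor_id
 WHERE d.id IN {IDS}
"""

or_rows = set(conn.execute(or_join).fetchall())
union_rows = set(conn.execute(union_of_branches).fetchall())
print(sorted(or_rows))        # [(1, 201), (2, 281), (3, 281)]
print(or_rows == union_rows)  # True

# UNION ALL skips de-duplication, so a pair reachable by two branches repeats.
union_all_rows = conn.execute(
    union_of_branches.replace("UNION", "UNION ALL")).fetchall()
print(len(union_all_rows))    # 4
```

User 1 reaches distributor 201 both through earned_points and through users.distributor_id, so UNION ALL returns that pair twice. UNION (DISTINCT) collapses it into one row, at the cost of the de-duplication step.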

Assuming the LEFTs can be removed, and assuming the DISTINCT can be moved from the SELECT to the UNION:

SELECT  u.*, ri.id as reg_id, d.id as dist_id
    FROM  users u
    JOIN  earned_points ep ON u.id = ep.user_id  -- ep needed only for this
    JOIN  distributors d ON d.id = ep.distributor_id  -- This one line differs
    JOIN  registration_items_users riu ON u.id = riu.user_id
      AND  riu.distributor_id = d.id
      AND  riu.registration_item_id = 21
    JOIN  registration_items ri ON riu.registration_item_id = ri.id
    WHERE  d.id IN (201,281,321,631,901,971,1211,1601,1611,1621,
                1631,1641,1651,1661,1671,1681,1691,1701,1711,1721,1731,
                1741,1751,1761,1771,1781,2281,2291,2401,2781,2801,2931 
                   )
    UNION  DISTINCT 
SELECT  u.*, ri.id as reg_id, d.id as dist_id
    FROM  users u
    JOIN  distributors d ON d.id = u.distributor_id
    JOIN  registration_items_users riu ON u.id = riu.user_id
      AND  riu.distributor_id = d.id
      AND  riu.registration_item_id = 21
    JOIN  registration_items ri ON riu.registration_item_id = ri.id
    WHERE  d.id IN (201,281,321,631,901,971,1211,1601,1611,1621,
                1631,1641,1651,1661,1671,1681,1691,1701,1711,1721,1731,
                1741,1751,1761,1771,1781,2281,2291,2401,2781,2801,2931 
                   )
    UNION  DISTINCT 
SELECT  u.*, ri.id as reg_id, d.id as dist_id
    FROM  users u
    JOIN  distributors d ON d.id = u.additional_distributor_id
    JOIN  registration_items_users riu ON u.id = riu.user_id
      AND  riu.distributor_id = d.id
      AND  riu.registration_item_id = 21
    JOIN  registration_items ri ON riu.registration_item_id = ri.id
    WHERE  d.id IN (201,281,321,631,901,971,1211,1601,1611,1621,
                1631,1641,1651,1661,1671,1681,1691,1701,1711,1721,1731,
                1741,1751,1761,1771,1781,2281,2291,2401,2781,2801,2931 
                   ) ;

It is generally a bad idea to splay an array across columns, and that seems to be what is going on with the distributor columns in users (distributor_id, additional_distributor_id). This mess may be a result of that.
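One way to un-splay the distributor columns is a link table with one row per (user, slot) instead of one column per slot. The sketch below is a hypothetical design and does not come from the original thread. The table name user_distributors, its kind column, and the index name are all made up. sqlite3 again stands in for MySQL, and the rows are toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, distributor_id INTEGER,
                    additional_distributor_id INTEGER);
INSERT INTO users VALUES (1, 201, NULL), (2, 999, 281), (3, 999, NULL);

-- Hypothetical link table: one row per (user, slot) instead of one
-- column per slot. 'kind' keeps the primary/additional distinction.
CREATE TABLE user_distributors (
    user_id        INTEGER NOT NULL,
    kind           VARCHAR(16) NOT NULL,
    distributor_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, kind)
);
-- Lets a "distributor_id IN (...)" filter find users directly.
CREATE INDEX idx_ud_distributor ON user_distributors (distributor_id, user_id);

-- One-off backfill from the two splayed columns.
INSERT INTO user_distributors (user_id, kind, distributor_id)
    SELECT id, 'primary', distributor_id FROM users
     WHERE distributor_id IS NOT NULL
    UNION ALL
    SELECT id, 'additional', additional_distributor_id FROM users
     WHERE additional_distributor_id IS NOT NULL;
""")

# The two OR branches on users collapse into a single equality join;
# DISTINCT covers a user whose primary and additional distributor match.
rows = conn.execute("""
SELECT DISTINCT u.id, ud.distributor_id
  FROM user_distributors ud
  JOIN users u ON u.id = ud.user_id
 WHERE ud.distributor_id IN (201, 281)
 ORDER BY u.id, ud.distributor_id
""").fetchall()
print(rows)  # [(1, 201), (2, 281)]
```

The earned_points path is a different relationship (points a user earned with a distributor), so it would presumably remain its own UNION branch. Only the two users columns collapse into one join.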

Edit

Even better would be to pull the ri and riu parts out of the SELECTs and turn them into a subquery. Here's the gist; I don't have the energy to write it all out:

SELECT x.*,
        ( SELECT ... ri and riu stuff ... ) AS reg_id
    FROM (
        --  from above, less the ri and riu stuff:
        SELECT ...
        UNION DISTINCT
        SELECT ...
        UNION DISTINCT
        SELECT ...
         ) AS x;
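Filling in that gist against a toy schema gives something like the following. It is a sketch under stated assumptions and not Rick James's exact query. The schema and rows are made up, and sqlite3 stands in for MySQL (SQLite only accepts plain UNION, which means the same as MySQL's UNION DISTINCT). The LIMIT 1 in the scalar subquery is also an assumption: it guards against duplicate registration_items_users rows, which would otherwise make MySQL fail with "Subquery returns more than 1 row".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, distributor_id INTEGER,
                    additional_distributor_id INTEGER);
CREATE TABLE earned_points (user_id INTEGER, distributor_id INTEGER);
CREATE TABLE distributors (id INTEGER PRIMARY KEY);
CREATE TABLE registration_items (id INTEGER PRIMARY KEY);
CREATE TABLE registration_items_users (user_id INTEGER, distributor_id INTEGER,
                                       registration_item_id INTEGER);
INSERT INTO distributors VALUES (201), (281), (999);
INSERT INTO users VALUES (1, 201, NULL), (2, 999, 281), (3, 999, NULL);
INSERT INTO earned_points VALUES (1, 201), (3, 281), (3, 999);
INSERT INTO registration_items VALUES (21);
INSERT INTO registration_items_users VALUES (1, 201, 21);
""")

IDS = "(201, 281)"  # stands in for the long IN (...) list

# Inner part: the three UNION branches of the rewrite, minus the ri/riu joins.
branches = f"""
SELECT u.*, d.id AS dist_id FROM users u
  JOIN earned_points ep ON u.id = ep.user_id
  JOIN distributors d ON d.id = ep.distributor_id
 WHERE d.id IN {IDS}
UNION
SELECT u.*, d.id AS dist_id FROM users u
  JOIN distributors d ON d.id = u.distributor_id
 WHERE d.id IN {IDS}
UNION
SELECT u.*, d.id AS dist_id FROM users u
  JOIN distributors d ON d.id = u.additional_distributor_id
 WHERE d.id IN {IDS}
"""

# Outer part: reg_id is looked up once per distinct (user, distributor) row.
# Users with no matching registration keep reg_id = NULL (None in Python),
# as the LEFT JOINs in the original query did.
query = f"""
SELECT x.*,
       (SELECT ri.id
          FROM registration_items_users riu
          JOIN registration_items ri ON ri.id = riu.registration_item_id
         WHERE riu.user_id = x.id
           AND riu.distributor_id = x.dist_id
           AND riu.registration_item_id = 21
         LIMIT 1) AS reg_id
  FROM ({branches}) AS x
 ORDER BY x.id, x.dist_id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1, 201, None, 201, 21)
# (2, 999, 281, 281, None)
# (3, 999, None, 281, None)
```

The rows with reg_id = None are the ones the all-JOIN rewrite in this answer would drop. This shape therefore only matters if the registration tables are optional for the result you need.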

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this post, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM