
MySQL - optimized query returns more rows than the original

I inherited a query from a former colleague and need to optimize it.

This query returns 72 rows.

SELECT  id, contract_no, customer, address, cm_mac, aps
    FROM  
    (
        SELECT  *
            from  new_installed_devices
            where  insert4date >='2018-10-28'
              AND  insert4date <='2018-10-28'
              AND  install_mark<2
    ) as d1
    left join  
    (
        SELECT  *
            from  
            (
                SELECT  contract_no AS c_no, cm_mac AS c_mc, MIN(tstamp) as time2,
                        sum(1) as aps
                    from  devices_change
                    where  contract_no in (
                        SELECT  distinct(contract_no)
                            from  devices_change
                            where  tstamp >= '2018-10-28 06:59:59'
                              AND  tstamp <= '2018-10-29 07:00:00'
                          )
                    group by  contract_no, cm_mac 
            ) as mtmbl
            where  mtmbl.time2 >= '2018-10-28 06:59:59'
              and  mtmbl.time2 <= '2018-10-29 07:00:00' 
    ) as tmp  ON d1.contract_no=tmp.c_no
    where  aps>0
    group by  contract_no, customer, address, cm_mac;

It takes 20 seconds to execute. I rewrote it to make it faster. The new version returns its result in 2 seconds, but it returns 75 rows (3 additional rows).

This is my rewrite (the only difference is in one subquery):

SELECT  id, contract_no, customer, address, cm_mac, aps
    FROM  
    (
        SELECT  *
            from  new_installed_devices
            where  insert4date >='2018-10-28'
              AND  insert4date <='2018-10-28'
              AND  install_mark<2
    ) as d1
    left join  
    (
        SELECT  *
            from  
            (
                SELECT distinct
                        (contract_no) AS c_no,
                        cm_mac AS c_mc, MIN(tstamp) as time2,
                        sum(1) as aps
                    from  devices_change
                    where  tstamp >= '2018-10-28 06:59:59'
                      AND  tstamp <= '2018-10-29 07:00:00'
                    group by  contract_no, cm_mac 
            ) as mtmbl
            where  mtmbl.time2 >= '2018-10-28 06:59:59'
              and  mtmbl.time2 <= '2018-10-29 07:00:00' 
    ) as tmp  ON d1.contract_no=tmp.c_no
    where  aps>0
    group by  contract_no, customer, address, cm_mac;

As you can see, I did not change much, yet I get more rows than there should be in the result. Can someone please explain why my second query does not return the correct result? I have tried many things to optimize it, without success. Thanks a lot!

  • Don't use SELECT * when you don't need all the columns. The outer query only uses id, contract_no, customer, address and cm_mac from d1, so select just those columns from new_installed_devices.
  • Is there some reason for testing insert4date for equality in that weird way?
  • Recommend INDEX(insert4date, install_mark, dl) (in that order)
  • Try to avoid the construct IN ( SELECT ... ). Usually it is better to use EXISTS or LEFT JOIN.
  • Don't say DISTINCT(contract_no), ... -- DISTINCT is not a function; its effect applies to the entire set of expressions. Get rid of DISTINCT since the GROUP BY has that effect.
  • Recommend INDEX(contract_no, cm_mac, tstamp) (in that order)
  • In the rewritten query, the test on mtmbl.time2 is redundant, since MIN(tstamp) is already limited to that (1 day + 2 seconds) time range by the inner WHERE. In the original query it is not redundant, because there MIN(tstamp) is taken over all rows of the selected contracts, including rows outside the range.
  • Please provide SHOW CREATE TABLE.
  • You can replace the first subquery in the FROM clause with a direct reference to the table new_installed_devices, with some conditions in the WHERE clause. In older versions, MySQL doesn't handle subqueries very well, so try to avoid them in the FROM clause (especially if you have more than 1 or 2 of them).
  • The range conditions for mtmbl.time2 can be folded into the subquery's HAVING clause, to make sure you filter that data as quickly as possible, without creating a large temp table with that subquery.
  • Can you provide the SHOW CREATE TABLE of these tables and the EXPLAIN for the query? It can be helpful.
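The point about MIN(tstamp) is most likely where the 3 extra rows come from. Here is a minimal sketch on a toy table (devices_change_demo and its rows are made up for illustration, not taken from the question) that reproduces the effect: the original shape aggregates over all rows of a contract and then filters on the first timestamp, while the rewritten shape filters the rows first, so every surviving group passes the time2 test.

```sql
-- Hypothetical demo table and data, only to illustrate the difference.
CREATE TABLE devices_change_demo (
    contract_no INT,
    cm_mac      VARCHAR(12),
    tstamp      DATETIME
);

INSERT INTO devices_change_demo VALUES
    (1, 'aaa', '2018-10-20 12:00:00'),  -- before the window
    (1, 'aaa', '2018-10-28 09:00:00'),  -- inside the window
    (2, 'bbb', '2018-10-28 10:00:00');  -- inside the window only

-- Original shape: choose contracts seen in the window, aggregate ALL their rows.
-- Contract 1 gets time2 = 2018-10-20 12:00:00, so HAVING drops it.
SELECT contract_no, cm_mac, MIN(tstamp) AS time2, COUNT(*) AS aps
FROM devices_change_demo
WHERE contract_no IN (
        SELECT contract_no
        FROM devices_change_demo
        WHERE tstamp >= '2018-10-28 06:59:59'
          AND tstamp <= '2018-10-29 07:00:00')
GROUP BY contract_no, cm_mac
HAVING time2 >= '2018-10-28 06:59:59'
   AND time2 <= '2018-10-29 07:00:00';

-- Rewritten shape: filter rows first, then aggregate.
-- Contract 1 now gets time2 = 2018-10-28 09:00:00 and is kept.
SELECT contract_no, cm_mac, MIN(tstamp) AS time2, COUNT(*) AS aps
FROM devices_change_demo
WHERE tstamp >= '2018-10-28 06:59:59'
  AND tstamp <= '2018-10-29 07:00:00'
GROUP BY contract_no, cm_mac
HAVING time2 >= '2018-10-28 06:59:59'
   AND time2 <= '2018-10-29 07:00:00';
```

On this data the first query returns only contract 2, while the second returns both contracts. Note that aps changes as well: the first form counts every row of the contract, the second counts only rows inside the window. If the intent is "devices whose first change falls inside the window", the rewritten query is not equivalent to the original.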

Without knowing the join order MySQL will choose here, you can try adding these indexes and running the query below to see whether it performs better. I applied the recommendations above to the query (I hope my guesses about the origins of the columns are correct; otherwise please fix it accordingly):

ALTER TABLE `devices_change` ADD INDEX `devices_change_idx_no_mac_tstamp` (`contract_no`,`cm_mac`,`tstamp`);
ALTER TABLE `devices_change` ADD INDEX `devices_change_idx_tstamp_no` (`tstamp`,`contract_no`);
ALTER TABLE `new_installed_devices` ADD INDEX `new_installed_device_idx_no_insert4date` (`contract_no`,`insert4date`);
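
To check whether the new indexes exist and are actually picked up, something along these lines could be run. This is only a sketch: the inner aggregate from the question is used as the probe, and which index the optimizer chooses depends on the data and the MySQL version.

```sql
-- List the indexes now defined on the table.
SHOW INDEX FROM devices_change;

-- Inspect the plan for the time-window aggregate; the "key" column of the
-- output shows which index (if any) the optimizer picked.
EXPLAIN
SELECT contract_no, cm_mac, MIN(tstamp) AS time2, COUNT(*) AS aps
FROM devices_change
WHERE tstamp >= '2018-10-28 06:59:59'
  AND tstamp <= '2018-10-29 07:00:00'
GROUP BY contract_no, cm_mac;
```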

The query:

SELECT
        d1.id,
        d1.contract_no,
        d1.customer,
        d1.address,
        d1.cm_mac,
        tmp.aps   -- aps is computed in the derived table tmp
    FROM
        new_installed_devices AS d1
    LEFT JOIN
        (
            SELECT
                *
            FROM
                (SELECT
                    devices_change.contract_no AS c_no,
                    devices_change.cm_mac AS c_mc,
                    MIN(devices_change.tstamp) AS time2,
                    SUM(1) AS aps
                FROM
                    devices_change
                WHERE
                    devices_change.contract_no IN (
                        SELECT
                            DISTINCT devices_change.contract_no
                        FROM
                            devices_change
                        WHERE
                            devices_change.tstamp >= '2018-10-28 06:59:59'
                            AND devices_change.tstamp <= '2018-10-29 07:00:00'
                    )
                GROUP BY
                    devices_change.contract_no,
                    devices_change.cm_mac
                HAVING
                    -- time2 is a select alias, so it is referenced without a table prefix
                    time2 >= '2018-10-28 06:59:59'
                    AND time2 <= '2018-10-29 07:00:00'
                ORDER BY
                    NULL) AS mtmbl) AS tmp
            ON d1.contract_no = tmp.c_no
    WHERE
        tmp.aps > 0
        AND d1.insert4date >= '2018-10-28'
        AND d1.insert4date <= '2018-10-28'
        AND d1.install_mark < 2
    GROUP BY
        -- once the table is aliased, columns must be referenced through d1
        d1.contract_no,
        d1.customer,
        d1.address,
        d1.cm_mac
    ORDER BY
        NULL
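
For the suggestion to avoid IN ( SELECT ... ), the inner aggregate could be written with EXISTS instead. This is a sketch under assumptions: the aliases dc and w are made up here, COUNT(*) stands in for sum(1), and whether it runs faster depends on the MySQL version and the available indexes.

```sql
SELECT dc.contract_no AS c_no,
       dc.cm_mac      AS c_mc,
       MIN(dc.tstamp) AS time2,
       COUNT(*)       AS aps
FROM devices_change AS dc
WHERE EXISTS (
        -- keep only contracts that have at least one change inside the window
        SELECT 1
        FROM devices_change AS w
        WHERE w.contract_no = dc.contract_no
          AND w.tstamp >= '2018-10-28 06:59:59'
          AND w.tstamp <= '2018-10-29 07:00:00')
GROUP BY dc.contract_no, dc.cm_mac
-- keep only groups whose first change falls inside the window
HAVING time2 >= '2018-10-28 06:59:59'
   AND time2 <= '2018-10-29 07:00:00';
```

This keeps the behaviour of the original 72-row query, because MIN(tstamp) is still taken over all rows of each selected contract. An index starting with contract_no followed by tstamp would be the natural candidate for the correlated lookup.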
