MySQL - Trying to optimize a query returns more rows

Question

I inherited a query from a former colleague, and I need to optimize it.

This query returns 72 rows.

SELECT  id, contract_no, customer, address, cm_mac, aps
    FROM  
    (
        SELECT  *
            from  new_installed_devices
            where  insert4date >='2018-10-28'
              AND  insert4date <='2018-10-28'
              AND  install_mark<2
    ) as d1
    left join  
    (
        SELECT  *
            from  
            (
                SELECT  contract_no AS c_no, cm_mac AS c_mc, MIN(tstamp) as time2,
                        sum(1) as aps
                    from  devices_change
                    where  contract_no in (
                        SELECT  distinct(contract_no)
                            from  devices_change
                            where  tstamp >= '2018-10-28 06:59:59'
                              AND  tstamp <= '2018-10-29 07:00:00'
                          )
                    group by  contract_no, cm_mac 
            ) as mtmbl
            where  mtmbl.time2 >= '2018-10-28 06:59:59'
              and  mtmbl.time2 <= '2018-10-29 07:00:00' 
    ) as tmp  ON d1.contract_no=tmp.c_no
    where  aps>0
    group by  contract_no, customer, address, cm_mac;

It takes 20 seconds to execute. I rewrote it to try to optimize it. The new version returns its result in 2 seconds, but it returns 75 rows (3 additional rows).

This is what I did (the only difference is in one subquery):

SELECT  id, contract_no, customer, address, cm_mac, aps
    FROM  
    (
        SELECT  *
            from  new_installed_devices
            where  insert4date >='2018-10-28'
              AND  insert4date <='2018-10-28'
              AND  install_mark<2
    ) as d1
    left join  
    (
        SELECT  *
            from  
            (
                SELECT distinct
                        (contract_no) AS c_no,
                        cm_mac AS c_mc, MIN(tstamp) as time2,
                        sum(1) as aps
                    from  devices_change
                    where  tstamp >= '2018-10-28 06:59:59'
                      AND  tstamp <= '2018-10-29 07:00:00'
                    group by  contract_no, cm_mac 
            ) as mtmbl
            where  mtmbl.time2 >= '2018-10-28 06:59:59'
              and  mtmbl.time2 <= '2018-10-29 07:00:00' 
    ) as tmp  ON d1.contract_no=tmp.c_no
    where  aps>0
    group by  contract_no, customer, address, cm_mac;

As you can see, I did not change much, but I am still getting more rows than there should be in the result. Can someone please tell me why my second query does not return exactly the correct result? I have tried many things to optimize it, without success. Thanks a lot!

Answer 1

Don't use SELECT * when you don't need all the columns. It looks like contract_no is the only column needed from d1, hence from new_installed_devices.
Is there some reason for testing insert4date for equality in that weird way?
Recommend INDEX(insert4date, install_mark, dl) (in that order).
Try to avoid the construct IN ( SELECT ... ). Usually it is better to use EXISTS or LEFT JOIN.
Don't say DISTINCT(contract_no), ... DISTINCT is not a function; its effect applies to the entire set of expressions. Get rid of DISTINCT, since the GROUP BY already has that effect.
Recommend INDEX(contract_no, cm_mac, tstamp) (in that order).
In the rewritten query, the test on mtmbl.time2 is redundant, since MIN(tstamp) is already limited to that (1 day + 2 seconds) time range by the inner WHERE clause.
Please provide SHOW CREATE TABLE.
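
As a rough sketch of how several of the points above might combine, the first (72-row) query could be written along these lines. It assumes that id, customer, address and cm_mac all come from new_installed_devices and that aps comes from the aggregate, which the question does not confirm. The aliases dc and w are made up for illustration.

```sql
-- Sketch only: column origins are assumptions, dc and w are illustrative aliases.
-- Like the original, id and aps are not in GROUP BY, so this relies on
-- ONLY_FULL_GROUP_BY being disabled.
SELECT  d1.id, d1.contract_no, d1.customer, d1.address, d1.cm_mac, tmp.aps
    FROM  new_installed_devices AS d1
    -- "LEFT JOIN ... WHERE aps > 0" discards unmatched rows, so a plain JOIN is equivalent
    JOIN
    (
        SELECT  dc.contract_no AS c_no, dc.cm_mac AS c_mc,
                MIN(dc.tstamp) AS time2,
                COUNT(*) AS aps
            FROM  devices_change AS dc
            -- EXISTS instead of IN ( SELECT DISTINCT ... )
            WHERE  EXISTS (
                SELECT  1
                    FROM  devices_change AS w
                    WHERE  w.contract_no = dc.contract_no
                      AND  w.tstamp >= '2018-10-28 06:59:59'
                      AND  w.tstamp <= '2018-10-29 07:00:00'
                  )
            GROUP BY  dc.contract_no, dc.cm_mac
            -- MIN runs over all rows of the contract here, so this test is not redundant
            HAVING  time2 >= '2018-10-28 06:59:59'
               AND  time2 <= '2018-10-29 07:00:00'
    ) AS tmp  ON d1.contract_no = tmp.c_no
    -- x >= a AND x <= a is the same as x = a
    WHERE  d1.insert4date = '2018-10-28'
      AND  d1.install_mark < 2
    GROUP BY  d1.contract_no, d1.customer, d1.address, d1.cm_mac;
```

Whether EXISTS actually beats IN here depends on the MySQL version and the available indexes, so it is worth comparing both with EXPLAIN.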

Answer 2

You can replace the first subquery in the FROM clause with a direct reference to the table new_installed_devices, moving its conditions into the WHERE clause. Older versions of MySQL don't handle subqueries very well, so try to avoid them in the FROM clause (especially if you have more than 1 or 2 of them).
The range conditions on mtmbl.time2 can be folded into the subquery's HAVING clause, so that the data is filtered as early as possible, without creating a large temporary table for that subquery.
Can you provide the SHOW CREATE TABLE output for these tables and the EXPLAIN for the query? It can be helpful.
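
For reference, that information could be gathered with statements like the following. The EXPLAIN here is applied only to the inner aggregate of the rewritten query, for brevity. Prefixing the full query with EXPLAIN works the same way.

```sql
-- Table definitions, including existing indexes
SHOW CREATE TABLE new_installed_devices;
SHOW CREATE TABLE devices_change;

-- Execution plan for the inner aggregate of the rewritten query
EXPLAIN
SELECT  contract_no AS c_no, cm_mac AS c_mc, MIN(tstamp) AS time2, SUM(1) AS aps
    FROM  devices_change
    WHERE  tstamp >= '2018-10-28 06:59:59'
      AND  tstamp <= '2018-10-29 07:00:00'
    GROUP BY  contract_no, cm_mac;
```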

Guessing at the join order MySQL will choose here, you can try adding these indexes and running the following query to see whether it performs better. I applied the recommendations above to the query below (I hope my guesses about where each column comes from are correct; otherwise please adjust accordingly):

ALTER TABLE `devices_change` ADD INDEX `devices_change_idx_no_mac_tstamp` (`contract_no`,`cm_mac`,`tstamp`);
ALTER TABLE `devices_change` ADD INDEX `devices_change_idx_tstamp_no` (`tstamp`,`contract_no`);
ALTER TABLE `new_installed_devices` ADD INDEX `new_installed_device_idx_no_insert4date` (`contract_no`,`insert4date`);

The query:

-- Columns are qualified with the alias d1, because a table that has been
-- given an alias cannot be referenced by its original name.
-- Assumption: id, customer, address and cm_mac come from new_installed_devices;
-- aps comes from the derived table tmp.
SELECT
        d1.id,
        d1.contract_no,
        d1.customer,
        d1.address,
        d1.cm_mac,
        tmp.aps 
    FROM
        new_installed_devices AS d1 
    LEFT JOIN
        (
            SELECT
                * 
            FROM
                (SELECT
                    devices_change.contract_no AS c_no,
                    devices_change.cm_mac AS c_mc,
                    MIN(devices_change.tstamp) AS time2,
                    sum(1) AS aps 
                FROM
                    devices_change 
                WHERE
                    devices_change.contract_no IN (
                        SELECT
                            DISTINCT devices_change.contract_no 
                        FROM
                            devices_change 
                        WHERE
                            devices_change.tstamp >= '2018-10-28 06:59:59' 
                            AND devices_change.tstamp <= '2018-10-29 07:00:00'
                    ) 
                GROUP BY
                    devices_change.contract_no,
                    devices_change.cm_mac 
                -- time2 is a select alias, not a column of devices_change
                HAVING
                    time2 >= '2018-10-28 06:59:59' 
                    AND time2 <= '2018-10-29 07:00:00' 
                ORDER BY
                    NULL) AS mtmbl) AS tmp 
            ON d1.contract_no = tmp.c_no 
    WHERE
        tmp.aps > 0 
        AND d1.insert4date >= '2018-10-28' 
        AND d1.insert4date <= '2018-10-28' 
        AND d1.install_mark < 2 
    GROUP BY
        d1.contract_no,
        d1.customer,
        d1.address,
        d1.cm_mac 
    ORDER BY
        NULL
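
As for the three extra rows, one likely cause is the scope of MIN(tstamp). The original query computes it over every row of each contract that was active in the window, so the outer test on time2 keeps only groups whose earliest change falls inside the window. The 2-second rewrite filters on tstamp before aggregating, so every group with any change in the window passes. A diagnostic sketch, assuming that is indeed the difference, lists the groups the rewrite would keep and the original would drop (first_change, all_changes and changes_in_window are illustrative names):

```sql
-- Groups with at least one change inside the window whose earliest change
-- predates it. Note that this scans all of devices_change.
SELECT  contract_no, cm_mac,
        MIN(tstamp) AS first_change,
        COUNT(*) AS all_changes,
        SUM(tstamp >= '2018-10-28 06:59:59'
            AND tstamp <= '2018-10-29 07:00:00') AS changes_in_window
    FROM  devices_change
    GROUP BY  contract_no, cm_mac
    HAVING  changes_in_window > 0
       AND  first_change < '2018-10-28 06:59:59';
```

If the contracts listed here match the extra rows, the two queries are not equivalent by design. The same comparison also shows that aps can differ between them, since one counts all changes and the other counts only those in the window.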

Answer 1: 0, 2018-11-02 05:55:21
Answer 2: 0, 2018-11-11 14:05:56
