MySQL - trying to optimize a query, but the rewritten version returns more rows
I have a query that I inherited from a former colleague, and I need to optimize it.
This query returns 72 rows.
SELECT id, contract_no, customer, address, cm_mac, aps
FROM
(
    SELECT *
    from new_installed_devices
    where insert4date >='2018-10-28'
    AND insert4date <='2018-10-28'
    AND install_mark<2
) as d1
left join
(
    SELECT *
    from
    (
        SELECT contract_no AS c_no, cm_mac AS c_mc, MIN(tstamp) as time2,
        sum(1) as aps
        from devices_change
        where contract_no in (
            SELECT distinct(contract_no)
            from devices_change
            where tstamp >= '2018-10-28 06:59:59'
            AND tstamp <= '2018-10-29 07:00:00'
        )
        group by contract_no, cm_mac
    ) as mtmbl
    where mtmbl.time2 >= '2018-10-28 06:59:59'
    and mtmbl.time2 <= '2018-10-29 07:00:00'
) as tmp ON d1.contract_no=tmp.c_no
where aps>0
group by contract_no, customer, address, cm_mac;
It takes 20 seconds to execute. I rewrote it to try to optimize it. The new version returns its result in 2 seconds, but it returns 75 rows (3 additional rows).
This is what I did (the only difference is in one subquery):
SELECT id, contract_no, customer, address, cm_mac, aps
FROM
(
    SELECT *
    from new_installed_devices
    where insert4date >='2018-10-28'
    AND insert4date <='2018-10-28'
    AND install_mark<2
) as d1
left join
(
    SELECT *
    from
    (
        SELECT distinct
        (contract_no) AS c_no,
        cm_mac AS c_mc, MIN(tstamp) as time2,
        sum(1) as aps
        from devices_change
        where tstamp >= '2018-10-28 06:59:59'
        AND tstamp <= '2018-10-29 07:00:00'
        group by contract_no, cm_mac
    ) as mtmbl
    where mtmbl.time2 >= '2018-10-28 06:59:59'
    and mtmbl.time2 <= '2018-10-29 07:00:00'
) as tmp ON d1.contract_no=tmp.c_no
where aps>0
group by contract_no, customer, address, cm_mac;
As you can see, I did not change much, but I am still getting more rows than there should be in the result. Can someone please tell me why my second query does not return exactly the same result? I have tried many things to optimize it, without success. Thanks a lot!

Don't use SELECT * when you don't need all the columns. It looks like only id, contract_no, customer, address and cm_mac are needed from d1, hence from new_installed_devices.
Any reason for testing insert4date for equality in that weird way?
Recommend INDEX(insert4date, install_mark, dl) (in that order).
Try to avoid the construct IN ( SELECT ... ). Usually it is better to use EXISTS or LEFT JOIN.
Don't say DISTINCT(contract_no), ... -- DISTINCT is not a function; its effect applies to the entire set of expressions. Get rid of DISTINCT since the GROUP BY has that effect.
Recommend INDEX(contract_no, cm_mac, tstamp) (in that order).
Please provide SHOW CREATE TABLE.
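
As a rough sketch of how the points above could be applied (assuming the table and column names shown in the question; not tested against the real schema), the IN ( SELECT ... ) filter can be written as a correlated EXISTS with no DISTINCT, and the date test as a plain equality. The aliases dc and w are made up for illustration:

```sql
-- Sketch only: same filtering as the inner query of the original,
-- with IN ( SELECT DISTINCT ... ) rewritten as a correlated EXISTS.
SELECT dc.contract_no AS c_no,
       dc.cm_mac      AS c_mc,
       MIN(dc.tstamp) AS time2,
       COUNT(*)       AS aps   -- counts the rows per group, like SUM(1)
FROM devices_change AS dc
WHERE EXISTS (
        SELECT 1
        FROM devices_change AS w
        WHERE w.contract_no = dc.contract_no
          AND w.tstamp >= '2018-10-28 06:59:59'
          AND w.tstamp <= '2018-10-29 07:00:00'
      )
GROUP BY dc.contract_no, dc.cm_mac;

-- "x >= a AND x <= a" is the same test as "x = a",
-- and only the columns the outer query uses are listed.
SELECT id, contract_no, customer, address, cm_mac
FROM new_installed_devices
WHERE insert4date = '2018-10-28'
  AND install_mark < 2;
```

Whether EXISTS is actually faster than IN here depends on the MySQL version and the indexes, so it is worth timing both.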

You can replace the first subquery in the FROM clause with a direct reference to the table new_installed_devices, with its conditions moved to the WHERE clause. In older versions, MySQL doesn't handle subqueries very well, so try to avoid them in the FROM clause (especially if you have more than 1 or 2 of them).
The range conditions on mtmbl.time2 can be folded into the subquery's HAVING clause, to make sure you filter that data as early as possible, without creating a large temp table from that subquery.
Guessing at the order MySQL will choose here, you can try adding these indexes and running the following query to see if it works better. I applied the recommendations above to the query below (I hope my guesses about where each column comes from are correct; otherwise please adjust accordingly):
ALTER TABLE `devices_change` ADD INDEX `devices_change_idx_no_mac_tstamp` (`contract_no`,`cm_mac`,`tstamp`);
ALTER TABLE `devices_change` ADD INDEX `devices_change_idx_tstamp_no` (`tstamp`,`contract_no`);
ALTER TABLE `new_installed_devices` ADD INDEX `new_installed_device_idx_no_insert4date` (`contract_no`,`insert4date`);
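
To check whether the optimizer actually picks up the new indexes, EXPLAIN can be run on individual pieces of the query. A minimal sketch, assuming the indexes were created under the names used above:

```sql
-- List the indexes that now exist on the table.
SHOW INDEX FROM devices_change;

-- Inspect the plan for the time-window lookup on its own.
EXPLAIN
SELECT DISTINCT contract_no
FROM devices_change
WHERE tstamp >= '2018-10-28 06:59:59'
  AND tstamp <= '2018-10-29 07:00:00';
```

If the key column of the EXPLAIN output names devices_change_idx_tstamp_no, the range condition is being served by the new index. Which index gets chosen depends on the data distribution and the MySQL version.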
The query:
SELECT
    -- the table is aliased as d1, so its columns must be referenced through d1
    d1.id,
    d1.contract_no,
    d1.customer,
    d1.address,
    d1.cm_mac,
    -- aps is computed in the derived table tmp
    tmp.aps
FROM
    new_installed_devices AS d1
LEFT JOIN
    (
        SELECT
            *
        FROM
            (SELECT
                devices_change.contract_no AS c_no,
                devices_change.cm_mac AS c_mc,
                MIN(devices_change.tstamp) AS time2,
                sum(1) AS aps
            FROM
                devices_change
            WHERE
                devices_change.contract_no IN (
                    SELECT
                        DISTINCT (devices_change.contract_no)
                    FROM
                        devices_change
                    WHERE
                        devices_change.tstamp >= '2018-10-28 06:59:59'
                        AND devices_change.tstamp <= '2018-10-29 07:00:00'
                )
            GROUP BY
                devices_change.contract_no,
                devices_change.cm_mac
            HAVING
                -- time2 is a select alias, not a column of devices_change
                time2 >= '2018-10-28 06:59:59'
                AND time2 <= '2018-10-29 07:00:00'
            ORDER BY
                NULL) AS mtmbl) AS tmp
        ON d1.contract_no = tmp.c_no
WHERE
    -- this condition discards unmatched rows, so the LEFT JOIN behaves like an inner join
    tmp.aps > 0
    AND d1.insert4date >= '2018-10-28'
    AND d1.insert4date <= '2018-10-28'
    AND d1.install_mark < 2
GROUP BY
    d1.contract_no,
    d1.customer,
    d1.address,
    d1.cm_mac
ORDER BY
    NULL;
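
On the 3 extra rows in the 2-second rewrite: a likely cause (inferred from the two queries in the question, not verified against the data) is where MIN(tstamp) is computed. The original takes MIN(tstamp) over every row of the matching contracts, so a (contract_no, cm_mac) pair that also has rows from before the window gets a time2 outside the window and is dropped by the time2 filter. The rewrite restricts tstamp first, so time2 always falls inside the window and such pairs are kept (aps then also counts only the in-window rows). A diagnostic sketch that lists those pairs, using only the table and column names from the question; the aliases first_seen and first_in_window are made up for illustration:

```sql
-- Pairs that have at least one row inside the window (kept by the rewrite)
-- but whose earliest row is before the window (dropped by the original).
SELECT contract_no,
       cm_mac,
       MIN(tstamp) AS first_seen,
       MIN(CASE
             WHEN tstamp >= '2018-10-28 06:59:59'
              AND tstamp <= '2018-10-29 07:00:00'
             THEN tstamp
           END) AS first_in_window
FROM devices_change
GROUP BY contract_no, cm_mac
HAVING first_in_window IS NOT NULL
   AND first_seen < '2018-10-28 06:59:59';
```

If the contract_no values returned here match the 3 extra results, the two queries are answering different questions rather than one of them being broken, and the faster version cannot simply replace the original without deciding which meaning of time2 is wanted.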