How can this query be optimized (N+1 and DENSE_RANK after)
How can I optimize this query?
WITH stats AS (SELECT a.IntegratorSalesAssociateID,
a.AgentName,
(
SELECT COUNT(*)
FROM properties AS p
WHERE a.IntegratorSalesAssociateID = p.IntegratorSalesAssociateID
AND p.TransactionType = '2'
AND MONTH(p.OrigListingDate) = MONTH(CURRENT_DATE)
AND YEAR(p.OrigListingDate) = YEAR(CURRENT_DATE)
) AS properties_this_month
FROM agents AS a)
SELECT stats.*,
DENSE_RANK() over (ORDER BY stats.properties_this_month DESC) AS 'rank'
from stats
I think that if I join the two tables and group them somehow, it would perform much better. Currently it runs for 17.5 seconds. Oddly, adding the DENSE_RANK does not affect performance at all.
Relevant table structure:
CREATE TABLE `agents`
(
`IntegratorSalesAssociateID` varchar(15) COLLATE utf8mb4_unicode_ci NOT NULL,
`AgentName` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL
) ENGINE = InnoDB
DEFAULT CHARSET = utf8mb4
COLLATE = utf8mb4_unicode_ci;
CREATE TABLE `properties`
(
`id` bigint(20) UNSIGNED NOT NULL,
`IntegratorSalesAssociateID` varchar(13) COLLATE utf8mb4_unicode_ci NOT NULL,
`TransactionType` tinyint(4) NOT NULL,
`OrigListingDate` date DEFAULT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL
) ENGINE = InnoDB
DEFAULT CHARSET = utf8mb4
COLLATE = utf8mb4_unicode_ci;
You can try this:
WITH stats AS
(
SELECT
p.IntegratorSalesAssociateID
, COUNT(*) AS properties_this_month
FROM properties AS p
WHERE p.TransactionType = '2'
AND MONTH(p.OrigListingDate) = MONTH(CURRENT_DATE)
AND YEAR(p.OrigListingDate) = YEAR(CURRENT_DATE)
GROUP BY p.IntegratorSalesAssociateID
)
SELECT
a.IntegratorSalesAssociateID
, a.AgentName
, COALESCE(s.properties_this_month, 0) AS properties_this_month
FROM agents AS a
LEFT JOIN stats s ON a.IntegratorSalesAssociateID = s.IntegratorSalesAssociateID
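The query above leaves out the ranking from the question. A minimal sketch of how DENSE_RANK() could be layered on top of the same pre-aggregated join, assuming MySQL 8.0 or later (needed for CTEs and window functions anyway). The alias is quoted with backticks because RANK is a reserved word in MySQL 8:

```sql
-- Sketch: aggregate properties once per agent, join, then rank the result.
WITH stats AS (
    SELECT p.IntegratorSalesAssociateID,
           COUNT(*) AS properties_this_month
    FROM properties AS p
    WHERE p.TransactionType = '2'
      AND MONTH(p.OrigListingDate) = MONTH(CURRENT_DATE)
      AND YEAR(p.OrigListingDate) = YEAR(CURRENT_DATE)
    GROUP BY p.IntegratorSalesAssociateID
)
SELECT a.IntegratorSalesAssociateID,
       a.AgentName,
       COALESCE(s.properties_this_month, 0) AS properties_this_month,
       -- Agents without matching properties get a count of 0 and share the last rank.
       DENSE_RANK() OVER (
           ORDER BY COALESCE(s.properties_this_month, 0) DESC
       ) AS `rank`
FROM agents AS a
LEFT JOIN stats AS s
       ON a.IntegratorSalesAssociateID = s.IntegratorSalesAssociateID;
```

The LEFT JOIN keeps agents that have no listings this month, which matches what the correlated COUNT(*) in the question returns for them.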
Given that the DENSE_RANK() doesn't affect performance, you want to optimize:
SELECT a.IntegratorSalesAssociateID,
a.AgentName,
(SELECT COUNT(*)
FROM properties p
WHERE a.IntegratorSalesAssociateID = p.IntegratorSalesAssociateID AND
p.TransactionType = '2' AND
MONTH(p.OrigListingDate) = MONTH(CURRENT_DATE) AND
YEAR(p.OrigListingDate) = YEAR(CURRENT_DATE)
) AS properties_this_month
FROM agents a;
I would rewrite this as:
SELECT a.IntegratorSalesAssociateID,
a.AgentName,
(SELECT COUNT(*)
FROM properties p
WHERE a.IntegratorSalesAssociateID = p.IntegratorSalesAssociateID AND
p.TransactionType = 2 AND
p.OrigListingDate >= CURRENT_DATE - INTERVAL (DAY(CURRENT_DATE) - 1) DAY
) AS properties_this_month
FROM agents a;
The two changes are:
TransactionType looks like a number. Assuming it is, I removed the single quotes. Don't mix data types. Of course, if the column is a string, then use single quotes.
The date filter compares OrigListingDate directly with the first day of the current month instead of wrapping the column in MONTH() and YEAR(), so an index on that column can be used for the comparison.
Then, for this query you want an index on properties(IntegratorSalesAssociateID, TransactionType, OrigListingDate). Actually, this index might work on the original version of the query.
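As a rough sketch of the index recommendation, the statements below create the composite index and then ask MySQL for the plan of the correlated-subquery version. The index name idx_properties_agent_type_date is a hypothetical choice. The upper bound on OrigListingDate is not part of the answer; it is an assumption added here to keep the "this month only" meaning of the original query in case future-dated listings exist.

```sql
-- Hypothetical index name; the column order follows the recommendation above:
-- two equality columns first, then the range column.
CREATE INDEX idx_properties_agent_type_date
    ON properties (IntegratorSalesAssociateID, TransactionType, OrigListingDate);

-- Inspect the plan; the subquery should be able to use the new index.
EXPLAIN
SELECT a.IntegratorSalesAssociateID,
       a.AgentName,
       (SELECT COUNT(*)
        FROM properties p
        WHERE a.IntegratorSalesAssociateID = p.IntegratorSalesAssociateID
          AND p.TransactionType = 2
          -- First day of the current month (inclusive).
          AND p.OrigListingDate >= CURRENT_DATE - INTERVAL (DAY(CURRENT_DATE) - 1) DAY
          -- First day of next month (exclusive); added assumption, see above.
          AND p.OrigListingDate <  CURRENT_DATE - INTERVAL (DAY(CURRENT_DATE) - 1) DAY
                                   + INTERVAL 1 MONTH
       ) AS properties_this_month
FROM agents a;
```

Because every column the subquery touches is in the index, MySQL may be able to answer the count from the index alone without reading the table rows.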
I sincerely doubt that using an explicit aggregation would improve performance. GROUP BY, although quite powerful, is often slower than correlated subqueries, and almost always slower (or at least not faster) with the right indexes.
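Which form is faster depends on the data and the indexes, so it may be worth measuring both on your own tables. A minimal sketch, assuming MySQL 8.0.18 or later, where EXPLAIN ANALYZE is available; note that it actually executes each query and reports the measured time per plan step:

```sql
-- Variant 1: correlated subquery, evaluated once per agent.
EXPLAIN ANALYZE
SELECT a.IntegratorSalesAssociateID,
       a.AgentName,
       (SELECT COUNT(*)
        FROM properties p
        WHERE a.IntegratorSalesAssociateID = p.IntegratorSalesAssociateID
          AND p.TransactionType = 2
          AND p.OrigListingDate >= CURRENT_DATE - INTERVAL (DAY(CURRENT_DATE) - 1) DAY
       ) AS properties_this_month
FROM agents a;

-- Variant 2: explicit aggregation, joined back to agents.
EXPLAIN ANALYZE
SELECT a.IntegratorSalesAssociateID,
       a.AgentName,
       COALESCE(s.properties_this_month, 0) AS properties_this_month
FROM agents a
LEFT JOIN (
    SELECT p.IntegratorSalesAssociateID,
           COUNT(*) AS properties_this_month
    FROM properties p
    WHERE p.TransactionType = 2
      AND p.OrigListingDate >= CURRENT_DATE - INTERVAL (DAY(CURRENT_DATE) - 1) DAY
    GROUP BY p.IntegratorSalesAssociateID
) AS s
  ON a.IntegratorSalesAssociateID = s.IntegratorSalesAssociateID;
```

Running each statement a few times, with and without the index in place, gives a fairer comparison than a single run, since the first execution may be dominated by reading pages from disk.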