Hoping to remove an ugly self-join on a CTE
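I have a query that creates an ordered dictionary (ordered in that an incremental id identifies the relative position of the keys).
I then want to know, for each row, whether its value exists as a key in any other row of the dictionary. I'm doing that with a correlated query in a CROSS APPLY, which is effectively a self join on the CTE.
As far as I understand it, this means the CTE representing the dictionary has to be computed twice?
Other than using a table variable (this is inside a function), does anyone have any alternative suggestions?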
WITH
  dictionary([id], [key], [val]) AS
(
  SELECT 1, 'a', 'b'
  UNION ALL SELECT 2, 'b', 'c'
  UNION ALL SELECT 3, 'c', 'a'
  UNION ALL SELECT 4, 'x', 'w'
  UNION ALL SELECT 5, 'y', 'x'
  UNION ALL SELECT 6, 'z', 'y'
)
SELECT
  *
FROM
  dictionary   dict
CROSS APPLY
(
  SELECT COUNT(*) FROM dictionary WHERE dictionary.id > dict.id AND dictionary.[key] = dict.[val]
)
  lookup(hits)
CROSS APPLY
(
  SELECT 1, 3 WHERE lookup.hits = 0
  UNION ALL
  SELECT 1, 2 WHERE lookup.hits > 0
  UNION ALL
  SELECT 2, 3 WHERE lookup.hits > 0
)
  map([from], [to])
-- [key]s 'c', 'x', 'y' and 'z' should each have only one output row
-- It's "acceptable" for only 'x' to have just one output row IFF a self join can be avoided
The other options I can think of are all variations on a self join...
  dictionary   dict
LEFT JOIN
(
  SELECT [key], MAX(id) AS id FROM dictionary GROUP BY [key]
)
  lookup
    ON  lookup.[key] = dict.[val]
    AND lookup.id    > dict.id
Or...
  dictionary   dict
OUTER APPLY
(
  SELECT 1 WHERE EXISTS (SELECT * FROM dictionary WHERE dictionary.id > dict.id AND dictionary.[key] = dict.[val])
)
  lookup(hits)
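In the end, though, I'm trying to avoid the self join on the CTE, possibly with window functions I haven't thought of? Anything at all, just to avoid the CTE being computed twice...
(Ignoring the lookup.id > dict.id condition is fine if that is what it takes to avoid the self join...)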
EDIT: A more complete example, and a SQL Fiddle, with thanks to @MartinSmith for pointing out some inconsistencies...
http://sqlfiddle.com/#!6/9eecb7db59d16c80417c72d1e1f4fbf1/17407
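Here is one approach using window functions.
First unpivot the rows so that keys and values become generic terms, then use MAX ... OVER (PARTITION BY term) to find the highest id of any row where that term is used as a key.
In this example it then sets a flag and discards the duplicate rows added by the unpivot (keeping the context = 'v' row of each pair, since that is the row carrying the information the flag needs).
You can then use the flag to join onto a table value constructor containing the map values.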
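To make the unpivot step concrete, here is a minimal sketch of it on its own. It assumes a cut-down two-row dictionary and leaves out the window function, to show that each source row becomes one 'k' row and one 'v' row sharing a single term column.

```sql
-- Sketch only: a two-row dictionary, reduced from the example data for illustration.
WITH dictionary(id, [key], value) AS (
    SELECT 1, 'a', 'b'
    UNION ALL SELECT 2, 'b', 'c'
)
SELECT dict.id,
       v.term,
       v.context
FROM   dictionary dict
       -- Each input row yields two rows: its key ('k') and its value ('v').
       CROSS APPLY (VALUES (dict.[key], 'k'),
                           (dict.value, 'v')) v(term, context)
ORDER  BY dict.id, v.context;
-- id  term  context
-- 1   a     k
-- 1   b     v
-- 2   b     k
-- 2   c     v
```

Because 'b' now appears in the term column both as a value (id 1) and as a key (id 2), partitioning by term puts those two rows in the same window, which is what lets a window aggregate stand in for the self join.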
WITH dictionary(id, [key], value)
     AS (
         SELECT 1, 'a', 'b'
         UNION ALL SELECT 2, 'b', 'c'
         UNION ALL SELECT 3, 'c', 'a'
         UNION ALL SELECT 4, 'x', 'w'
         UNION ALL SELECT 5, 'y', 'x'
         UNION ALL SELECT 6, 'z', 'y'
        ),
     t1
     AS (SELECT dict.*,
                context,
                highest_id_where_term_is_key = MAX(CASE
                                                     WHEN context = 'k'
                                                       THEN v.id
                                                   END) OVER (PARTITION BY term)
         FROM   dictionary dict
                CROSS APPLY (VALUES(id, [key], 'k'),
                                   (id, value, 'v')) v(id, term, context)),
     t2
     AS (SELECT *,
                val_in_later_key = CASE
                                     WHEN id < highest_id_where_term_is_key
                                       THEN 1
                                     ELSE 0
                                   END
         FROM   t1
         WHERE  context = 'v'
         -- Discard duplicate row from the unpivot - only want the "value" row
        )
SELECT id,
       [key],
       value,
       highest_id_where_term_is_key,
       map.[from],
       map.[to]
FROM   t2
       JOIN (VALUES (1, 3, 0),
                    (1, 2, 1),
                    (2, 3, 1) ) map([from], [to], [flg])
         ON map.flg = t2.val_in_later_key
ORDER  BY id
Returns
+----+-----+-------+------------------------------+------+----+
| id | key | value | highest_id_where_term_is_key | from | to |
+----+-----+-------+------------------------------+------+----+
|  1 | a   | b     |                            2 |    1 |  2 |
|  1 | a   | b     |                            2 |    2 |  3 |
|  2 | b   | c     |                            3 |    1 |  2 |
|  2 | b   | c     |                            3 |    2 |  3 |
|  3 | c   | a     |                            1 |    1 |  3 |
|  4 | x   | w     |                         NULL |    1 |  3 |
|  5 | y   | x     |                            4 |    1 |  3 |
|  6 | z   | y     |                            5 |    1 |  3 |
+----+-----+-------+------------------------------+------+----+
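The question says that ignoring the lookup.id > dict.id condition is acceptable if it avoids the self join. Under that relaxation the same unpivot idea can be reduced to a simple "is this value a key anywhere?" flag. The following is a sketch along those lines, not something taken from the fiddle; the names unpivoted and val_is_key_anywhere are made up for illustration.

```sql
-- Sketch only: relaxed variant that ignores id ordering.
WITH dictionary(id, [key], value) AS (
    SELECT 1, 'a', 'b'
    UNION ALL SELECT 2, 'b', 'c'
    UNION ALL SELECT 3, 'c', 'a'
    UNION ALL SELECT 4, 'x', 'w'
    UNION ALL SELECT 5, 'y', 'x'
    UNION ALL SELECT 6, 'z', 'y'
),
unpivoted AS (
    SELECT dict.id,
           dict.[key],
           dict.value,
           v.context,
           -- 1 if any row in this term's partition came from a key, else 0.
           val_is_key_anywhere = MAX(CASE WHEN v.context = 'k' THEN 1 ELSE 0 END)
                                     OVER (PARTITION BY v.term)
    FROM   dictionary dict
           CROSS APPLY (VALUES (dict.[key], 'k'),
                               (dict.value, 'v')) v(term, context)
)
SELECT id,
       [key],
       value,
       val_is_key_anywhere
FROM   unpivoted
WHERE  context = 'v'   -- keep only the "value" row of each unpivoted pair
ORDER  BY id;
```

With the sample data only id 4 (value 'w') gets a flag of 0, because 'w' never appears as a key; every other row gets 1. The flag can be joined to the map table value constructor in the same way as val_in_later_key.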