Data discrepancies/comparison between three tables - SQL Server 2014
I have three tables in SQL Server 2014, each with millions of rows and still growing. I'm trying to find the differences between the tables, for example:
DECLARE @ab TABLE
(
k1 int,
k2 int,
val char(1)
)
DECLARE @cd TABLE
(
k1 int,
k2 int,
val char(1),
add_cd varchar(50)
)
DECLARE @ef TABLE
(
k1 int,
k2 int,
val char(1),
add_ef varchar(50)
)
INSERT INTO @ab VALUES(1,1,'a'), (2, 2, 'c'), (3, 3, 'c'), (4, 4, 'd'), (5, 5, NULL), (7, 7, 'g')
INSERT INTO @cd VALUES(1,1,'a', 'DSFS'), (2, 2, 'b', 'ASDF'), (4, 4, NULL, 'SDFE')
INSERT INTO @ef VALUES(1,1,'a', 'SD1245'), (2, 2, 'b', 'EW3464'), (3, 3, 'd', 'DF3452'),(4, 4, 'd', 'FG4576'), (6, 6, 'e', 'RT3453')
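All three sets share the key columns k1 and k2. I only want to pull out the differences: either the value of "val" differs, or the key combination does not exist in all three sets. The other columns (add_cd and add_ef) don't need to be compared, but they are needed in the final result. The desired result is: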
k1    k2    val   k1    k2    val   add_cd  k1    k2    val   add_ef
2     2     c     2     2     b     ASDF    2     2     b     EW3464
3     3     c     NULL  NULL  NULL  NULL    3     3     d     DF3452
4     4     d     4     4     NULL  SDFE    4     4     d     FG4576
5     5     NULL  NULL  NULL  NULL  NULL    NULL  NULL  NULL  NULL
NULL  NULL  NULL  NULL  NULL  NULL  NULL    6     6     e     RT3453
7     7     g     NULL  NULL  NULL  NULL    NULL  NULL  NULL  NULL
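I tried the following query. It gives the desired result, but it only works for thousands of rows, not millions. I created indexes on the key columns, but I see it using a table scan. Can anyone suggest something?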
SELECT a.*, c.*, e.*
FROM @ab a
FULL OUTER JOIN @cd c ON a.k1 = c.k1
AND a.k2 = c.k2
FULL OUTER JOIN @ef e ON (c.k1 = e.k1
AND c.k2 = e.k2 )
OR (a.k1 = e.k1
AND a.k2 = e.k2 )
WHERE (a.k1 IS NULL OR c.k1 IS NULL OR e.k1 IS NULL)
OR (ISNULL(a.val, '') != ISNULL(c.val, ''))
OR (ISNULL(c.val, '') != ISNULL(e.val, ''))
OR (ISNULL(a.val, '') != ISNULL(e.val, ''))
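A likely contributor to the slowdown is the OR in the second join's ON clause: a predicate that is a choice between two key pairs is not a plain equality, so the optimizer generally cannot use a hash or merge join for it and often falls back to nested loops. The sketch below (not part of the original question or answers) keeps one output row per key but gives the second join a single equality per key column, matching @ef against whichever key survived the first join via COALESCE. It assumes k1 and k2 are never NULL and that (k1, k2) is unique within each table; it reuses the question's sample data and WHERE clause, and whether SQL Server actually picks a hash join on the real tables is not guaranteed.

```sql
-- Sample data from the question
DECLARE @ab TABLE (k1 int, k2 int, val char(1));
DECLARE @cd TABLE (k1 int, k2 int, val char(1), add_cd varchar(50));
DECLARE @ef TABLE (k1 int, k2 int, val char(1), add_ef varchar(50));
INSERT INTO @ab VALUES (1, 1, 'a'), (2, 2, 'c'), (3, 3, 'c'), (4, 4, 'd'), (5, 5, NULL), (7, 7, 'g');
INSERT INTO @cd VALUES (1, 1, 'a', 'DSFS'), (2, 2, 'b', 'ASDF'), (4, 4, NULL, 'SDFE');
INSERT INTO @ef VALUES (1, 1, 'a', 'SD1245'), (2, 2, 'b', 'EW3464'), (3, 3, 'd', 'DF3452'),
                       (4, 4, 'd', 'FG4576'), (6, 6, 'e', 'RT3453');

SELECT a.*, c.*, e.*
FROM @ab AS a
FULL OUTER JOIN @cd AS c
    ON a.k1 = c.k1 AND a.k2 = c.k2
-- One equality per key column: match @ef against the key from @ab if present,
-- otherwise the key from @cd (assumes k1/k2 are never NULL in the base tables)
FULL OUTER JOIN @ef AS e
    ON e.k1 = COALESCE(a.k1, c.k1) AND e.k2 = COALESCE(a.k2, c.k2)
-- Same filter as the question's query
WHERE (a.k1 IS NULL OR c.k1 IS NULL OR e.k1 IS NULL)
   OR (ISNULL(a.val, '') != ISNULL(c.val, ''))
   OR (ISNULL(c.val, '') != ISNULL(e.val, ''))
   OR (ISNULL(a.val, '') != ISNULL(e.val, ''));
```

On the sample data this returns the six rows of the expected result, with key (3, 3) kept on a single row.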
Your existing query is the right approach; there are a few small changes you can make to improve it. Each table should have an index on k1, k2, val:
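A sketch of those indexes. The temp tables #ab, #cd and #ef are hypothetical stand-ins for the real permanent tables (the question only shows table variables), NOT NULL on the keys is assumed, and the INCLUDE columns are an assumption beyond the answer: they let the final SELECT return add_cd and add_ef from the index without key lookups. For table variables, SQL Server 2014 also accepts the index declared inline.

```sql
-- Hypothetical temp tables standing in for the real ab/cd/ef tables
CREATE TABLE #ab (k1 int NOT NULL, k2 int NOT NULL, val char(1) NULL);
CREATE TABLE #cd (k1 int NOT NULL, k2 int NOT NULL, val char(1) NULL, add_cd varchar(50) NULL);
CREATE TABLE #ef (k1 int NOT NULL, k2 int NOT NULL, val char(1) NULL, add_ef varchar(50) NULL);

-- Join keys first, then val, as the answer suggests
CREATE INDEX IX_ab_k1_k2_val ON #ab (k1, k2, val);
-- INCLUDE is an extra assumption: it covers add_cd/add_ef for the final SELECT
CREATE INDEX IX_cd_k1_k2_val ON #cd (k1, k2, val) INCLUDE (add_cd);
CREATE INDEX IX_ef_k1_k2_val ON #ef (k1, k2, val) INCLUDE (add_ef);

-- SQL Server 2014 also accepts an inline index on a table variable
DECLARE @ab TABLE (
    k1 int NOT NULL,
    k2 int NOT NULL,
    val char(1) NULL,
    INDEX IX_ab_k1_k2_val CLUSTERED (k1, k2, val)
);
```

With k1, k2 leading, each index is ordered on the join keys; whether the optimizer uses that order (for example for a merge join) or simply scans the narrower index depends on the plan it chooses.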
Edit (my original NULL handling was incorrect. The correct approach looks long-winded, but it may be the most efficient logically correct solution):
SELECT a.*, c.*, e.*
FROM @ab a
FULL OUTER JOIN @cd c ON a.k1 = c.k1
AND a.k2 = c.k2
FULL OUTER JOIN @ef e ON (c.k1 = e.k1
AND c.k2 = e.k2 )
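--OR (a.k1 = e.k1 --Dropping this condition speeds up the join, but then a key present in @ab and @ef but missing from @cd (3, 3 in the sample) comes back as two rows instead of one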
--AND a.k2 = e.k2 )
WHERE (a.k1 IS NULL OR c.k1 IS NULL OR e.k1 IS NULL)
--OR (ISNULL(a.val, '') != ISNULL(c.val, '')) --Wrapping the val columns in ISNULL prevents the indexes from being used
--OR (ISNULL(c.val, '') != ISNULL(e.val, ''))
--OR (ISNULL(a.val, '') != ISNULL(e.val, ''))
OR ((a.val != c.val) OR (a.val IS NULL AND c.val IS NOT NULL) OR (a.val IS NOT NULL AND c.val IS NULL))
OR ((a.val != e.val) OR (a.val IS NULL AND e.val IS NOT NULL) OR (a.val IS NOT NULL AND e.val IS NULL))
OR ((e.val != c.val) OR (e.val IS NULL AND c.val IS NOT NULL) OR (e.val IS NOT NULL AND c.val IS NULL))
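When you need to compare nullable columns, comparing ISNULL() results can look more elegant, but the inline function prevents the query engine from using the indexes and forces a table scan, which is the worst thing for performance.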
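To check that the long-form predicate really is a NULL-safe inequality, the sketch below evaluates it over every combination of equal, unequal and NULL values in a hypothetical @pairs table, next to a commonly used shorter spelling, EXISTS (SELECT x EXCEPT SELECT y). EXCEPT compares NULLs as equal, so the subquery returns a row exactly when x and y differ. The shorter form is offered only as an equivalent way to write the condition; I would not assume it changes which indexes the optimizer uses.

```sql
-- Hypothetical pairs covering every NULL combination
DECLARE @pairs TABLE (x char(1) NULL, y char(1) NULL);
INSERT INTO @pairs VALUES ('a', 'a'), ('a', 'b'), ('a', NULL), (NULL, 'b'), (NULL, NULL);

SELECT x, y,
       -- the answer's long-form NULL-safe inequality
       CASE WHEN (x != y) OR (x IS NULL AND y IS NOT NULL) OR (x IS NOT NULL AND y IS NULL)
            THEN 'different' ELSE 'same' END AS long_form,
       -- EXCEPT treats NULLs as equal, so the subquery is empty exactly when x and y match
       CASE WHEN EXISTS (SELECT x EXCEPT SELECT y)
            THEN 'different' ELSE 'same' END AS except_form
FROM @pairs;
```

Both columns agree on every row: only ('a', 'a') and (NULL, NULL) are classed as same, so either form can stand in for one of the three comparisons in the WHERE clause.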
Would something like this work for you?
SELECT Z.k1, Z.k2, Z.val, Y.k1, Y.k2, Y.val, Y.add_cd, X.k1, X.k2, X.val, X.add_ef
FROM @ab AS Z
FULL OUTER JOIN @cd AS Y ON Z.k1 = Y.k1 AND Z.k2 = Y.k2
FULL OUTER JOIN @ef AS X ON X.k1 = Y.k1 AND X.k2 = Y.k2
WHERE NOT EXISTS (
SELECT A.k1, A.k2, A.val, C.k1, C.k2, C.val, C.add_cd, E.k1, E.k2, E.val, E.add_ef
FROM @ab AS A
INNER JOIN @cd AS C ON A.k1 = C.k1 AND A.k2 = C.k2 AND A.val = C.val
INNER JOIN @ef AS E ON C.k1 = E.k1 AND C.k2 = E.k2 AND C.val = E.val
WHERE Z.k1 = A.k1 AND Z.k2 = A.k2 AND Y.k1 = C.k1 AND Y.k2 = C.k2 AND X.k1 = E.k1 AND X.k2 = E.k2
)
I'm concerned there may be some nuance with your NULLs and how you want them compared...
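One way to sidestep both the NULL nuance and the question of which table drives the second join (a sketch, not taken from any of the answers) is to build the full set of keys first with UNION, then LEFT JOIN each table to that key set on plain equality. Every key then produces exactly one row no matter which tables contain it, and the val comparisons use EXISTS ... EXCEPT, which treats two NULLs as equal. It assumes (k1, k2) is unique within each table and reuses the question's sample data. Comparing @ab with @cd and @ab with @ef is enough: when all three rows exist and both of those match, @cd and @ef match each other too.

```sql
-- Sample data from the question
DECLARE @ab TABLE (k1 int, k2 int, val char(1));
DECLARE @cd TABLE (k1 int, k2 int, val char(1), add_cd varchar(50));
DECLARE @ef TABLE (k1 int, k2 int, val char(1), add_ef varchar(50));
INSERT INTO @ab VALUES (1, 1, 'a'), (2, 2, 'c'), (3, 3, 'c'), (4, 4, 'd'), (5, 5, NULL), (7, 7, 'g');
INSERT INTO @cd VALUES (1, 1, 'a', 'DSFS'), (2, 2, 'b', 'ASDF'), (4, 4, NULL, 'SDFE');
INSERT INTO @ef VALUES (1, 1, 'a', 'SD1245'), (2, 2, 'b', 'EW3464'), (3, 3, 'd', 'DF3452'),
                       (4, 4, 'd', 'FG4576'), (6, 6, 'e', 'RT3453');

WITH keys AS (
    -- every key combination that appears in at least one table (UNION removes duplicates)
    SELECT k1, k2 FROM @ab
    UNION
    SELECT k1, k2 FROM @cd
    UNION
    SELECT k1, k2 FROM @ef
)
SELECT a.k1, a.k2, a.val,
       c.k1, c.k2, c.val, c.add_cd,
       e.k1, e.k2, e.val, e.add_ef
FROM keys AS k
LEFT JOIN @ab AS a ON a.k1 = k.k1 AND a.k2 = k.k2
LEFT JOIN @cd AS c ON c.k1 = k.k1 AND c.k2 = k.k2
LEFT JOIN @ef AS e ON e.k1 = k.k1 AND e.k2 = k.k2
WHERE a.k1 IS NULL OR c.k1 IS NULL OR e.k1 IS NULL   -- key missing from at least one table
   OR EXISTS (SELECT a.val EXCEPT SELECT c.val)      -- NULL-safe a.val <> c.val
   OR EXISTS (SELECT a.val EXCEPT SELECT e.val)      -- NULL-safe a.val <> e.val
ORDER BY k.k1, k.k2;
```

On the sample data this returns the six expected rows; the ORDER BY is only there to match the layout in the question.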
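I think you're on the right track with the full outer join; you just need to make the where clause work for you. It may not be the most efficient answer, but it solves the problem.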
select *
from @ab as ab
full outer join @cd as cd on ab.k1 = cd.k1
and ab.k2 = cd.k2
full outer join @ef as ef on ab.k1 = ef.k1
and ab.k2 = ef.k2
where (
isnull(ab.val, 'X') <> isnull(cd.val, 'XX')
or
isnull(ab.val, 'X') <> isnull(ef.val, 'XX')
or
isnull(cd.val, 'X') <> isnull(ef.val, 'XX')
or
coalesce(ab.val, cd.val, ef.val) is NULL
)
order by coalesce(ab.k1, cd.k1, ef.k1)
, coalesce(ab.k2, cd.k2, ef.k2)
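The parentheses around the whole where clause are there in case you add another condition later, so that AND/OR precedence (AND binds tighter than OR) doesn't combine the conditions in an unintended way. The order by clause is only there to match the order of the expected output shown in the question.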
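One subtlety in the sentinel values above: ISNULL returns the data type of its first argument, so with val declared char(1), isnull(cd.val, 'XX') is truncated to 'X', while COALESCE follows data type precedence and keeps 'XX'. The small check below (hypothetical variables @none and @x, typed like val) shows the effect: two NULLs compare as equal under those sentinels, and so does a real 'X' against a NULL, so a row where one table has 'X' and another has NULL would not be flagged by those comparisons.

```sql
-- Hypothetical variables typed like the question's val column
DECLARE @none char(1) = NULL, @x char(1) = 'X';

SELECT ISNULL(@none, 'XX')             AS isnull_result,    -- 'X': typed char(1), so 'XX' is truncated
       DATALENGTH(ISNULL(@none, 'XX')) AS isnull_bytes,     -- 1
       COALESCE(@none, 'XX')           AS coalesce_result,  -- 'XX': result type follows precedence (varchar)
       CASE WHEN ISNULL(@none, 'X') <> ISNULL(@none, 'XX')
            THEN 'different' ELSE 'equal' END AS null_vs_null,  -- 'equal'
       CASE WHEN ISNULL(@x, 'X') <> ISNULL(@none, 'XX')
            THEN 'different' ELSE 'equal' END AS x_vs_null;     -- 'equal'
```

If NULL is meant to differ from every real value, COALESCE with distinct sentinels or the EXISTS ... EXCEPT idiom avoids the truncation.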