
Data discrepancies/comparison between three tables - SQL Server 2014

I have three tables in SQL Server 2014, each with millions of rows and still growing. I am trying to find the discrepancies between the tables, for example:

DECLARE @ab TABLE
(
    k1 int,
    k2 int,
    val char(1)
)

DECLARE @cd TABLE
(
    k1 int,
    k2 int,
    val char(1),
    add_cd varchar(50)
)

DECLARE @ef TABLE
(
    k1 int,
    k2 int,
    val char(1),
    add_ef varchar(50)
)

INSERT INTO @ab VALUES(1,1,'a'), (2, 2, 'c'), (3, 3, 'c'), (4, 4, 'd'), (5, 5, NULL), (7, 7, 'g')

INSERT INTO @cd VALUES(1,1,'a', 'DSFS'), (2, 2, 'b', 'ASDF'), (4, 4, NULL, 'SDFE')

INSERT INTO @ef VALUES(1,1,'a', 'SD1245'), (2, 2, 'b', 'EW3464'), (3, 3, 'd', 'DF3452'),(4, 4, 'd', 'FG4576'), (6, 6, 'e', 'RT3453')

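The common key columns for all three sets are k1 and k2. I only want to pull out the discrepancies: rows where the value of "val" differs, or where the key combination does not exist in all three sets. The other columns (add_cd and add_ef) are needed in the final result but do not need to be compared. The desired result is: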

k1   k2   val   k1    k2    val  add_cd  k1   k2    val   add_ef
2    2    c     2     2     b    ASDF    2    2     b     EW3464 
3    3    c     NULL  NULL  NULL NULL    3    3     d     DF3452
4    4    d     4     4     NULL SDFE    4    4     d     FG4576
5    5    NULL  NULL  NULL  NULL NULL    NULL NULL  NULL  NULL
NULL NULL NULL  NULL  NULL  NULL NULL    6    6     e     RT3453
7    7    g     NULL  NULL  NULL NULL    NULL NULL  NULL  NULL

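I tried the query below. It gives the desired result, but it only works for thousands of rows, not millions. I created indexes on the key columns, but I can see it is still using table scans. Can anyone advise?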

SELECT a.*, c.*, e.*
FROM @ab a
FULL OUTER JOIN @cd c   ON  a.k1 = c.k1
                        AND a.k2 = c.k2
FULL OUTER JOIN @ef e   ON  (c.k1 = e.k1
                        AND c.k2 = e.k2 ) 
                        OR (a.k1 = e.k1
                        AND a.k2 = e.k2 )       
WHERE   (a.k1 IS NULL OR c.k1 IS NULL OR e.k1 IS NULL)
OR      (ISNULL(a.val, '') != ISNULL(c.val, ''))
OR      (ISNULL(c.val, '') != ISNULL(e.val, ''))
OR      (ISNULL(a.val, '') != ISNULL(e.val, ''))

Your existing query is the right approach. You can make a few small changes to improve it. The index on each table should be on (k1, k2, val).
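As a rough illustration of that indexing advice, the sketch below assumes the real data lives in permanent tables. The names dbo.ab and dbo.cd and the index names are hypothetical stand-ins for the table variables in the question, and the INCLUDE column is an extra assumption so that the index also covers add_cd.

```sql
-- Hypothetical permanent tables mirroring @ab and @cd from the question.
CREATE TABLE dbo.ab (k1 int, k2 int, val char(1));
CREATE TABLE dbo.cd (k1 int, k2 int, val char(1), add_cd varchar(50));

-- Key columns first so the join can seek or merge on (k1, k2);
-- val last so the comparison can be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_ab_k1_k2_val
    ON dbo.ab (k1, k2, val);

-- Assumption: add_cd is INCLUDEd only because the final SELECT returns it.
CREATE NONCLUSTERED INDEX IX_cd_k1_k2_val
    ON dbo.cd (k1, k2, val)
    INCLUDE (add_cd);
```

An index on the table behind @ef would follow the same pattern, with add_ef in place of add_cd.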

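EDIT (my initial NULL handling was incorrect. The correct approach looks long-winded, but it is probably the most efficient solution that is logically correct):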

SELECT a.*, c.*, e.*
FROM @ab a
FULL OUTER JOIN @cd c   ON  a.k1 = c.k1
                        AND a.k2 = c.k2
FULL OUTER JOIN @ef e   ON  (c.k1 = e.k1
                        AND c.k2 = e.k2 ) 
                        --OR (a.k1 = e.k1    --This OR slows the join, but without it a key in @ab and @ef that is missing from @cd (3,3 here) comes back as two rows
                        --AND a.k2 = e.k2 )       
WHERE   (a.k1 IS NULL OR c.k1 IS NULL OR e.k1 IS NULL)
--OR      (ISNULL(a.val, '') != ISNULL(c.val, ''))     --Wrapping the val columns in ISNULL prevents the indexes from being used
--OR      (ISNULL(c.val, '') != ISNULL(e.val, ''))
--OR      (ISNULL(a.val, '') != ISNULL(e.val, ''))
OR      ((a.val != c.val) OR (a.val IS NULL AND c.val IS NOT NULL) OR (a.val IS NOT NULL AND c.val IS NULL))
OR      ((a.val != e.val) OR (a.val IS NULL AND e.val IS NOT NULL) OR (a.val IS NOT NULL AND e.val IS NULL))
OR      ((e.val != c.val) OR (e.val IS NULL AND c.val IS NOT NULL) OR (e.val IS NOT NULL AND c.val IS NULL))

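When you need to compare nullable columns, comparing the ISNULL() results may look more elegant, but wrapping the columns in a function prevents the query engine from using the indexes and forces a table scan, which is the worst thing for performance.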
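Another NULL-safe way to write "these two values differ" on SQL Server 2014 is the EXISTS (SELECT x EXCEPT SELECT y) pattern, which relies on EXCEPT treating two NULLs as equal. The sketch below is not part of the answer above; the derived table v and its columns x and y are made up purely to show how the pattern classifies each combination.

```sql
-- Hypothetical sample pairs: equal values, different values,
-- one side NULL, and both sides NULL.
SELECT v.x,
       v.y,
       CASE
           -- EXCEPT returns a row only when x and y differ,
           -- counting NULL vs NULL as "no difference".
           WHEN EXISTS (SELECT v.x EXCEPT SELECT v.y) THEN 'different'
           ELSE 'same'
       END AS null_safe_result
FROM (VALUES ('a', 'a'),
             ('a', 'b'),
             ('a', NULL),
             (NULL, 'b'),
             (NULL, NULL)) AS v(x, y);
-- ('a','a') and (NULL,NULL) are reported as 'same';
-- the other three pairs are reported as 'different'.
```

Like the expanded OR conditions, this leaves the columns unwrapped. Whether it performs better on the real tables is something to check in the execution plan.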

Would something like this work for you?

SELECT Z.k1, Z.k2, Z.val, Y.k1, Y.k2, Y.val, Y.add_cd, X.k1, X.k2, X.val, X.add_ef
FROM @ab AS Z 
FULL OUTER JOIN @cd AS Y ON Z.k1 = Y.k1 AND Z.k2 = Y.k2
FULL OUTER JOIN @ef AS X ON X.k1 = Y.k1 AND X.k2 = Y.k2
WHERE NOT EXISTS (
    SELECT A.k1, A.k2, A.val, C.k1, C.k2, C.val, C.add_cd, E.k1, E.k2, E.val, E.add_ef
    FROM @ab AS A
    INNER JOIN @cd AS C ON A.k1 = C.k1 AND A.k2 = C.k2 AND A.val = C.val
    INNER JOIN @ef AS E ON C.k1 = E.k1 AND C.k2 = E.k2 AND C.val = E.val
    WHERE Z.k1 = A.k1 AND Z.k2 = A.k2 AND Y.k1 = C.k1 AND Y.k2 = C.k2 AND X.k1 = E.k1 AND X.k2 = E.k2
)

My concern is that your NULLs may carry some nuance in how you want them compared...

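I think you are on the right track with the full outer join; you just need to get the where clause working for you. This may not be the most efficient answer, but it gets the job done.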

select *
from @ab as ab
full outer join @cd as cd on ab.k1 = cd.k1
                         and ab.k2 = cd.k2
full outer join @ef as ef on ab.k1 = ef.k1
                         and ab.k2 = ef.k2
where (
        isnull(ab.val, 'X') <> isnull(cd.val, 'XX')
        or
        isnull(ab.val, 'X') <> isnull(ef.val, 'XX')
        or
        isnull(cd.val, 'X') <> isnull(ef.val, 'XX')
        or
        coalesce(ab.val, cd.val, ef.val) is NULL
    )
order by coalesce(ab.k1, cd.k1, ef.k1)
, coalesce(ab.k2, cd.k2, ef.k2)

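The parentheses around the entire where clause are there in case you add another constraint later (you don't want and/or precedence to cause confusion). The order by clause is only there to help match the order of the expected output shown in the question.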
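For comparison, here is a sketch of a different shape for the same report: collect every (k1, k2) pair first, then LEFT JOIN each table to that key list, so a key always lands on one output row no matter which tables contain it. It is an alternative rather than any of the posted answers. It assumes (k1, k2) is unique and non-NULL within each table, and the CTE name all_keys is made up for the example.

```sql
-- Sample data copied from the question.
DECLARE @ab TABLE (k1 int, k2 int, val char(1));
DECLARE @cd TABLE (k1 int, k2 int, val char(1), add_cd varchar(50));
DECLARE @ef TABLE (k1 int, k2 int, val char(1), add_ef varchar(50));

INSERT INTO @ab VALUES (1,1,'a'), (2,2,'c'), (3,3,'c'), (4,4,'d'), (5,5,NULL), (7,7,'g');
INSERT INTO @cd VALUES (1,1,'a','DSFS'), (2,2,'b','ASDF'), (4,4,NULL,'SDFE');
INSERT INTO @ef VALUES (1,1,'a','SD1245'), (2,2,'b','EW3464'), (3,3,'d','DF3452'),
                       (4,4,'d','FG4576'), (6,6,'e','RT3453');

-- all_keys (hypothetical name): every key pair seen in any table.
WITH all_keys AS (
    SELECT k1, k2 FROM @ab
    UNION
    SELECT k1, k2 FROM @cd
    UNION
    SELECT k1, k2 FROM @ef
)
SELECT a.k1, a.k2, a.val,
       c.k1, c.k2, c.val, c.add_cd,
       e.k1, e.k2, e.val, e.add_ef
FROM all_keys AS k
LEFT JOIN @ab AS a ON a.k1 = k.k1 AND a.k2 = k.k2
LEFT JOIN @cd AS c ON c.k1 = k.k1 AND c.k2 = k.k2
LEFT JOIN @ef AS e ON e.k1 = k.k1 AND e.k2 = k.k2
WHERE a.k1 IS NULL OR c.k1 IS NULL OR e.k1 IS NULL  -- key missing from a table
   OR EXISTS (SELECT a.val EXCEPT SELECT c.val)     -- NULL-safe "a.val differs from c.val"
   OR EXISTS (SELECT c.val EXCEPT SELECT e.val)     -- NULL-safe "c.val differs from e.val"
ORDER BY k.k1, k.k2;
```

On the sample data this returns the six rows listed in the question, one per key. Two comparisons are enough because if a.val matches c.val and c.val matches e.val, then a.val matches e.val as well. Whether this scales to millions of rows would need testing against the real, indexed tables.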
