
Compare the data in two tables with same schema

I have been searching for a while for an answer to a particular problem, but I can't quite find this exact question.

I have a rather unusual task to achieve in SQL:

I have two tables, say A and B, which have exactly the same column names, of the following form:

id | column_1 | ... | column_n

Both tables have the same number of rows, with the same ids, but for a given id the rows from tables A and B may differ in one or more of the other columns.

I already have a query which returns all rows from table A for which the corresponding row in table B is not identical, but what I need is a query which returns something of the form:

id | differing_column
----------------------
1  | column_1
3  | column_6

meaning that the row with id '1' has different 'column_1' values in tables A and B, and the row with id '3' has different 'column_6' values in tables A and B.

Is this at all achievable? I imagine it might require some sort of pivot in order to get the column names as values, but I might be wrong. Any help/suggestions much appreciated.

You can do this with an unpivot, assuming that the values in the columns are all of the same type.

If your data is not too big, I would just recommend using a bunch of union all statements instead:

select a.id, 'Col1' as differing_column
from a join b on a.id = b.id
where a.col1 <> b.col1
   or (a.col1 is null and b.col1 is not null)
   or (a.col1 is not null and b.col1 is null)
union all
select a.id, 'Col2' as differing_column
from a join b on a.id = b.id
where a.col2 <> b.col2
   or (a.col2 is null and b.col2 is not null)
   or (a.col2 is not null and b.col2 is null)
. . .

This avoids potential type conversion problems, because each column is only ever compared with its counterpart in the other table.
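With many columns, writing one branch per column by hand gets tedious, so the query text can be generated instead. The following is only a sketch under stated assumptions: the helper name build_diff_query, the demo tables table_a and table_b, and their sample rows are all made up for illustration, and SQLite (via Python's sqlite3 module) stands in for whatever database you actually use.

```python
import sqlite3


def build_diff_query(table_a, table_b, columns, key="id"):
    """Return one null-safe SELECT per column, glued together with UNION ALL.

    Hypothetical helper: table and column names are interpolated directly
    into the SQL text, so they must come from a trusted source.
    """
    parts = []
    for col in columns:
        parts.append(
            f"SELECT a.{key} AS id, '{col}' AS differing_column\n"
            f"FROM {table_a} a JOIN {table_b} b ON a.{key} = b.{key}\n"
            f"WHERE a.{col} <> b.{col}\n"
            f"   OR (a.{col} IS NULL AND b.{col} IS NOT NULL)\n"
            f"   OR (a.{col} IS NOT NULL AND b.{col} IS NULL)"
        )
    return "\nUNION ALL\n".join(parts)


# Made-up sample data: id 1 differs in column_1, id 3 differs in both
# columns (column_1 is NULL on one side only), id 2 is identical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, column_1 TEXT, column_2 INTEGER);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, column_1 TEXT, column_2 INTEGER);
    INSERT INTO table_a VALUES (1, 'x', 10), (2, 'y', 20), (3, NULL, 30);
    INSERT INTO table_b VALUES (1, 'z', 10), (2, 'y', 20), (3, 'w', 31);
""")

query = build_diff_query("table_a", "table_b", ["column_1", "column_2"])
diffs = sorted(conn.execute(query).fetchall())
for row in diffs:
    print(row)
```

Printing the generated query string gives SQL you can paste into your own client; the column list could also be read from the database catalog rather than typed in.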

If you don't mind having the results on one row, you can do:

select a.id,
       (case when a.col1 <> b.col1 or a.col1 is null and b.col1 is not null or a.col1 is not null and b.col1 is null
             then 'Col1;'
             else ''
         end) +
       (case when a.col2 <> b.col2 or a.col2 is null and b.col2 is not null or a.col2 is not null and b.col2 is null
             then 'Col2;'
             else ''
         end) +
       . . .
from a join b on a.id = b.id;
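The one-row-per-id form can be generated the same way. This is a sketch, not a definitive implementation: build_one_row_query, the concat parameter, and the demo tables are hypothetical names chosen for illustration. The string concatenation operator differs between databases, so it is passed in; the demo runs on SQLite, which uses ||.

```python
import sqlite3


def build_one_row_query(table_a, table_b, columns, key="id", concat="+"):
    """Return a query yielding one row per id with a ';'-separated diff list.

    Hypothetical helper: identifiers are interpolated into the SQL text,
    so they must come from a trusted source.
    """
    cases = []
    for col in columns:
        cases.append(
            f"(CASE WHEN a.{col} <> b.{col}\n"
            f"        OR (a.{col} IS NULL AND b.{col} IS NOT NULL)\n"
            f"        OR (a.{col} IS NOT NULL AND b.{col} IS NULL)\n"
            f"      THEN '{col};' ELSE '' END)"
        )
    joined = f"\n {concat} ".join(cases)
    return (
        f"SELECT a.{key} AS id,\n {joined} AS differing_columns\n"
        f"FROM {table_a} a JOIN {table_b} b ON a.{key} = b.{key}"
    )


# Made-up sample data: id 1 differs in column_1, id 3 differs in both.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, column_1 TEXT, column_2 INTEGER);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, column_1 TEXT, column_2 INTEGER);
    INSERT INTO table_a VALUES (1, 'x', 10), (2, 'y', 20), (3, NULL, 30);
    INSERT INTO table_b VALUES (1, 'z', 10), (2, 'y', 20), (3, 'w', 31);
""")

# SQLite concatenates strings with ||, so override the default operator.
query = build_one_row_query(
    "table_a", "table_b", ["column_1", "column_2"], concat="||"
)
rows = sorted(conn.execute(query).fetchall())
for row in rows:
    print(row)
```

Rows without any difference come back with an empty string, so add a filter on differing_columns if you only want the ids that changed.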

Yes, you can do that with a query like this:

WITH Diffs (Id, Col) AS (
    SELECT
        a.Id,
        CASE
            WHEN a.Col1 <> b.Col1 THEN 'Col1'
            WHEN a.Col2 <> b.Col2 THEN 'Col2'
            -- ...and so on
            ELSE NULL
        END AS Col
    FROM TableOne a
    JOIN TableTwo b ON a.Id = b.Id
)
SELECT Id, Col
FROM Diffs
WHERE Col IS NOT NULL;

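Note that this query does not return every differing column for a row, only the first one that the CASE expression matches. It also compares with <> alone, so a column that is NULL in one table and not in the other is not reported.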

If your columns are of the same type, there is a slick method:

SELECT id,col
FROM (SELECT * FROM A UNION ALL SELECT * FROM B) t1
UNPIVOT (value for col in (column_1,column_2,column_3,column_4)) t2
GROUP BY id,col
HAVING COUNT(DISTINCT value) > 1

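Note that UNPIVOT drops rows whose value is NULL, and COUNT(DISTINCT ...) ignores NULLs as well, so a NULL in one table against a non-NULL value in the other is not reported by this query. If you need to handle NULL as a distinct value, replace it before unpivoting, for example by selecting ISNULL(column_1, X) AS column_1 and so on in the inner query, with X being a value that doesn't occur in your data.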
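To try the grouping idea on a small dataset, here is a sketch that runs on SQLite through Python's sqlite3 module. SQLite has no UNPIVOT, so the unpivot step is emulated with one SELECT per column; that is a substitution, not the query above. The table names, sample rows, and the '<null>' sentinel are assumptions for illustration. NULLs are replaced by the sentinel before counting, since COUNT(DISTINCT ...) skips NULL.

```python
import sqlite3

# Made-up sample data: id 1 differs in column_1, id 3 differs in both
# columns (column_1 is NULL in table_a only), id 2 is identical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, column_1 TEXT, column_2 TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, column_1 TEXT, column_2 TEXT);
    INSERT INTO table_a VALUES (1, 'x', 'p'), (2, 'y', 'q'), (3, NULL, 'r');
    INSERT INTO table_b VALUES (1, 'z', 'p'), (2, 'y', 'q'), (3, 'w', 's');
""")

# Assumed never to occur as a real value in the data.
SENTINEL = "<null>"

query = """
WITH stacked AS (
    -- Both tables share a schema, so their rows can simply be stacked.
    SELECT * FROM table_a
    UNION ALL
    SELECT * FROM table_b
),
unpivoted AS (
    -- Emulated UNPIVOT: one (id, col, value) row per column per source row.
    SELECT id, 'column_1' AS col, COALESCE(column_1, :sentinel) AS value
    FROM stacked
    UNION ALL
    SELECT id, 'column_2' AS col, COALESCE(column_2, :sentinel) AS value
    FROM stacked
)
SELECT id, col
FROM unpivoted
GROUP BY id, col
HAVING COUNT(DISTINCT value) > 1   -- the two tables disagree on this cell
ORDER BY id, col
"""

diffs = conn.execute(query, {"sentinel": SENTINEL}).fetchall()
for row in diffs:
    print(row)
```

Each (id, col) group holds exactly two values, one per table, so more than one distinct value means the tables disagree on that cell.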
