SQL: find count of duplicates, new values added, and values removed in the same table (dynamically)

Question

I'm hoping to accomplish the following in SQL, comparing each snapshot date against the previous date:

1) Find the number of duplicated records, i.e. unique IDs that appear on both the snapshot date and the previous date
2) Find the number of records added
3) Find the number of records removed

See sample tables below:

Current Table

snapshot_date | unique ID
 2018-08-15        1
 2018-08-15        2
 2018-08-15        3
 2018-08-15        4
 2018-08-15        5

 2018-08-16        1
 2018-08-16        3
 2018-08-16        4
 2018-08-16        6
 2018-08-16        7
 2018-08-16        8
 2018-08-16        9

 2018-08-17        3
 2018-08-17        8
 2018-08-17        10
 2018-08-17        11
 2018-08-17        12
 2018-08-17        13

Desired Table

snapshot date | count | # of dupe from previous date | sum of ID added | sum of ID removed
 2018-08-15       5                 N/A                     N/A                  N/A 
 2018-08-16       7                  3                       4                    2
 2018-08-17       6                  2                       4                    5
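
For reference, the figures for the second and third dates can be reproduced by hand with set operations. This is only a minimal Python sketch of the intended arithmetic, not a SQL solution; the names `snapshots` and `compare` are made up for illustration.

```python
# Sample data from the table above: snapshot date -> set of unique IDs.
snapshots = {
    "2018-08-15": {1, 2, 3, 4, 5},
    "2018-08-16": {1, 3, 4, 6, 7, 8, 9},
    "2018-08-17": {3, 8, 10, 11, 12, 13},
}


def compare(previous, current):
    """Return (count, duplicates, added, removed) for one snapshot."""
    return (
        len(current),
        len(current & previous),  # IDs present on both dates
        len(current - previous),  # IDs that are new on the current date
        len(previous - current),  # IDs that disappeared since the previous date
    )


row_16 = compare(snapshots["2018-08-15"], snapshots["2018-08-16"])
row_17 = compare(snapshots["2018-08-16"], snapshots["2018-08-17"])
print(row_16)  # (7, 3, 4, 2)
print(row_17)  # (6, 2, 4, 5)
```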

If anyone knows a script to get to the desired table, I'd be so appreciative! Thank y'all in advance!

Answer 1

If you are using MySQL, which before version 8.0 does not support the analytic functions LEAD and LAG, then one approach is a series of self joins followed by an aggregation to get the results you want:

SELECT
    t1.snapshot_date,
    t1.count,
    t1.previous_dupe,
    t1.num_added,
    t2.num_subtracted
FROM
(
    -- Per date: total rows, IDs also present on the previous day, and IDs that are new
    SELECT
        t1.snapshot_date,
        COUNT(*) AS count,
        COUNT(t2.snapshot_date) AS previous_dupe,
        COUNT(CASE WHEN t2.snapshot_date IS NULL THEN 1 END) AS num_added
    FROM yourTable t1
    LEFT JOIN yourTable t2
        ON t1.snapshot_date = DATE_ADD(t2.snapshot_date, INTERVAL 1 DAY) AND
           t1.uniqueID = t2.uniqueID
    GROUP BY t1.snapshot_date
) t1
LEFT JOIN
(
    -- Per date: IDs missing from the next day, reported against that next day
    SELECT
        DATE_ADD(t1.snapshot_date, INTERVAL 1 DAY) AS snapshot_date,
        COUNT(CASE WHEN t2.snapshot_date IS NULL THEN 1 END) AS num_subtracted
    FROM yourTable t1
    LEFT JOIN yourTable t2
        ON t1.snapshot_date = DATE_SUB(t2.snapshot_date, INTERVAL 1 DAY) AND
           t1.uniqueID = t2.uniqueID
    GROUP BY t1.snapshot_date
) t2
    ON t1.snapshot_date = t2.snapshot_date;

Demo

Notes: There is a slight discrepancy between my results and what you expect for the earliest date, which comes from the way the logic in the query works. I report 5 new IDs added (and 0 duplicates) for the earliest record, because conceptually there was no earlier record, so all 5 values are technically new.

This problem was particularly ugly because we needed to self join twice, in two separate subqueries, in different directions.
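
To try the query without a MySQL server, here is a rough port to SQLite through Python's `sqlite3` module. Treat it as a sketch rather than the original demo: `date(x, '+1 day')` stands in for `DATE_ADD`, the aliases `a`, `r`, and `cnt` are renamed by me for readability, and the sample rows are copied from the question.

```python
import sqlite3

# Sample rows from the question: (snapshot_date, uniqueID).
ROWS = (
    [("2018-08-15", i) for i in (1, 2, 3, 4, 5)]
    + [("2018-08-16", i) for i in (1, 3, 4, 6, 7, 8, 9)]
    + [("2018-08-17", i) for i in (3, 8, 10, 11, 12, 13)]
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (snapshot_date TEXT, uniqueID INTEGER)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?)", ROWS)

QUERY = """
SELECT a.snapshot_date, a.cnt, a.previous_dupe, a.num_added, r.num_subtracted
FROM (
    -- Join each row to the same ID on the previous day.
    SELECT t1.snapshot_date AS snapshot_date,
           COUNT(*) AS cnt,
           COUNT(t2.snapshot_date) AS previous_dupe,
           COUNT(CASE WHEN t2.snapshot_date IS NULL THEN 1 END) AS num_added
    FROM yourTable t1
    LEFT JOIN yourTable t2
        ON t1.snapshot_date = date(t2.snapshot_date, '+1 day')
       AND t1.uniqueID = t2.uniqueID
    GROUP BY t1.snapshot_date
) a
LEFT JOIN (
    -- Join each row to the same ID on the next day; unmatched rows were removed.
    SELECT date(t1.snapshot_date, '+1 day') AS snapshot_date,
           COUNT(CASE WHEN t2.snapshot_date IS NULL THEN 1 END) AS num_subtracted
    FROM yourTable t1
    LEFT JOIN yourTable t2
        ON t1.snapshot_date = date(t2.snapshot_date, '-1 day')
       AND t1.uniqueID = t2.uniqueID
    GROUP BY t1.snapshot_date
) r ON a.snapshot_date = r.snapshot_date
ORDER BY a.snapshot_date
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

On the sample data the first row comes back with 0 duplicates, 5 added, and NULL (`None`) removed, which is the first-date behavior described in the notes.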

Answer 2

This is my take, based on SQL Server:

SELECT  snapshot_date       = COALESCE(c.snapshot_date, DATEADD(day, 1, p.snapshot_date)),
        [count]             = COUNT(c.snapshot_date),
        dup_from_prev_day   = SUM(CASE WHEN c.snapshot_date is not null 
                                       AND  p.snapshot_date is not null 
                                       THEN 1 END),
        sum_of_id_added     = SUM(CASE WHEN c.snapshot_date is not null 
                                       AND  p.snapshot_date is null 
                                       THEN 1 END),
        sum_of_id_removed   = SUM(CASE WHEN c.snapshot_date is null 
                                       AND  p.snapshot_date is not null 
                                       THEN 1 END)
FROM    yourTable c         -- current
        FULL OUTER JOIN yourTable p -- previous
        ON  c.snapshot_date     = DATEADD(DAY, 1, p.snapshot_date)
        AND c.uniqueID          = p.uniqueID
GROUP BY COALESCE(c.snapshot_date, DATEADD(DAY, 1, p.snapshot_date))
HAVING COUNT(c.snapshot_date) > 0

/* RESULT : 
snapshot_date  count  dup_from_prev_day  sum_of_id_added  sum_of_id_removed
2018-08-15     5      NULL               5                NULL
2018-08-16     7      3                  4                2
2018-08-17     6      2                  4                5
*/
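
To make the three CASE branches easier to follow, here is a small Python sketch that imitates the FULL OUTER JOIN by hand. It is an illustration under assumptions, not a translation of the query: the function name `snapshot_diff` is made up, each (date, ID) pair is assumed to occur at most once, and counts that SQL reports as NULL show up here as 0.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows from the question: (snapshot_date, uniqueID).
ROWS = (
    [(date(2018, 8, 15), i) for i in (1, 2, 3, 4, 5)]
    + [(date(2018, 8, 16), i) for i in (1, 3, 4, 6, 7, 8, 9)]
    + [(date(2018, 8, 17), i) for i in (3, 8, 10, 11, 12, 13)]
)


def snapshot_diff(rows):
    present = set(rows)
    one_day = timedelta(days=1)
    # Keys of the full outer join: every row as "current", plus every row
    # shifted one day forward, where it plays the role of "previous".
    keys = present | {(d + one_day, i) for d, i in present}
    stats = defaultdict(lambda: {"count": 0, "dup": 0, "added": 0, "removed": 0})
    for d, i in keys:
        in_current = (d, i) in present
        in_previous = (d - one_day, i) in present
        s = stats[d]
        if in_current:
            s["count"] += 1
        if in_current and in_previous:
            s["dup"] += 1      # c and p both matched
        elif in_current:
            s["added"] += 1    # only c matched
        else:
            s["removed"] += 1  # only p matched
    # Same filter as HAVING COUNT(c.snapshot_date) > 0: drop dates with no rows.
    return {d: s for d, s in sorted(stats.items()) if s["count"] > 0}


result = snapshot_diff(ROWS)
for d, s in result.items():
    print(d, s)
```

The filter at the end matters for the same reason as the HAVING clause: without it, the day after the last snapshot would appear as a row with a count of 0 and every ID reported as removed.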

Answer 1: 3, 2018-08-16 03:40:30
Answer 2: 3, ACCEPTED, 2018-08-16 04:47:31
