SQL: Find the number of duplicates, new values added, and values removed within the same table (dynamic)

Question

I'm hoping to accomplish the following goals using SQL:

1) Find the # of duplicated records
Extract the number of repeated values based on a column (the "snapshot date"), comparing it against the previous date
2) Find the # of records added
3) Find the # of records removed

See sample tables below:

Current Table

snapshot_date | unique ID
 2018-08-15        1
 2018-08-15        2
 2018-08-15        3
 2018-08-15        4
 2018-08-15        5

 2018-08-16        1
 2018-08-16        3
 2018-08-16        4
 2018-08-16        6
 2018-08-16        7
 2018-08-16        8
 2018-08-16        9

 2018-08-17        3
 2018-08-17        8
 2018-08-17        10
 2018-08-17        11
 2018-08-17        12
 2018-08-17        13

Desired Table

snapshot date | count | # of dupe from previous date | sum of ID added | sum of ID removed
 2018-08-15       5                 N/A                     N/A                  N/A 
 2018-08-16       7                  3                       4                    2
 2018-08-17       6                  2                       4                    5
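The desired numbers are plain set arithmetic between consecutive snapshots (the "sum of ID" columns are really counts). As a quick cross-check of the table above, and not as a SQL answer, here is a minimal Python sketch. It assumes each ID appears at most once per date and treats "previous date" as the previous snapshot date present in the data.

```python
# Sample data from the question: snapshot date -> IDs present on that date.
SNAPSHOTS = {
    "2018-08-15": [1, 2, 3, 4, 5],
    "2018-08-16": [1, 3, 4, 6, 7, 8, 9],
    "2018-08-17": [3, 8, 10, 11, 12, 13],
}


def diff_snapshots(snapshots):
    """Return (date, count, dupes, added, removed) per date, comparing each
    snapshot with the previous snapshot in date order."""
    report = []
    prev_ids = None
    for date in sorted(snapshots):  # ISO-8601 strings sort chronologically
        ids = set(snapshots[date])
        if prev_ids is None:
            # First snapshot: nothing to compare against ("N/A" in the table)
            report.append((date, len(ids), None, None, None))
        else:
            report.append((
                date,
                len(ids),
                len(ids & prev_ids),  # present on both dates
                len(ids - prev_ids),  # new on this date
                len(prev_ids - ids),  # gone since the previous date
            ))
        prev_ids = ids
    return report


report = diff_snapshots(SNAPSHOTS)
for row in report:
    print(row)
```

Each printed row matches the desired table, with None in place of N/A.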

If anyone knows a script that produces the desired table, I'd be so appreciative! Thank y'all in advance!

Answer 1

If you are using MySQL, which did not support the analytic functions LEAD and LAG before version 8.0, then one approach is a series of self joins followed by an aggregation to get the results you want:

SELECT
    t1.snapshot_date,
    t1.count,
    t1.previous_dupe,
    t1.num_added,
    t2.num_subtracted
FROM
(
    -- Look back one day: matched IDs are dupes, unmatched IDs were added
    SELECT
        t1.snapshot_date,
        COUNT(*) AS count,
        COUNT(t2.snapshot_date) AS previous_dupe,
        COUNT(CASE WHEN t2.snapshot_date IS NULL THEN 1 END) AS num_added
    FROM yourTable t1
    LEFT JOIN yourTable t2
        ON t1.snapshot_date = DATE_ADD(t2.snapshot_date, INTERVAL 1 DAY) AND
           t1.uniqueID = t2.uniqueID
    GROUP BY t1.snapshot_date
) t1
LEFT JOIN
(
    -- Look ahead one day: unmatched IDs were removed, reported on the next day
    SELECT
        DATE_ADD(t1.snapshot_date, INTERVAL 1 DAY) AS snapshot_date,
        COUNT(CASE WHEN t2.snapshot_date IS NULL THEN 1 END) AS num_subtracted
    FROM yourTable t1
    LEFT JOIN yourTable t2
        ON t1.snapshot_date = DATE_SUB(t2.snapshot_date, INTERVAL 1 DAY) AND
           t1.uniqueID = t2.uniqueID
    GROUP BY t1.snapshot_date
) t2
    ON t1.snapshot_date = t2.snapshot_date
ORDER BY t1.snapshot_date;

Notes: There is a slight discrepancy between my results and what you expect, partly due to your own math error, and partly due to the way the query logic works. I report 5 new IDs added on the earliest date, because conceptually there was no earlier record, so all 5 values are technically new.
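The behaviour described in these notes can be checked without a MySQL server. The sketch below is a port, not the answerer's exact code: it runs the same query in SQLite through Python's built-in sqlite3 module, swapping MySQL's DATE_ADD(..., INTERVAL 1 DAY) and DATE_SUB(..., INTERVAL 1 DAY) for SQLite's date(..., '+1 day') and date(..., '-1 day'). Storing snapshot_date as ISO-8601 text is an assumption of the port.

```python
import sqlite3

# Sample data from the question, loaded into an in-memory SQLite table.
SNAPSHOTS = {
    "2018-08-15": [1, 2, 3, 4, 5],
    "2018-08-16": [1, 3, 4, 6, 7, 8, 9],
    "2018-08-17": [3, 8, 10, 11, 12, 13],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (snapshot_date TEXT, uniqueID INTEGER)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [(d, i) for d, ids in SNAPSHOTS.items() for i in ids],
)

# Answer 1's query with MySQL date arithmetic replaced by SQLite's date().
QUERY = """
SELECT t1.snapshot_date, t1.count, t1.previous_dupe, t1.num_added, t2.num_subtracted
FROM (
    -- look back one day: matches are dupes, non-matches are additions
    SELECT t1.snapshot_date,
           COUNT(*) AS count,
           COUNT(t2.snapshot_date) AS previous_dupe,
           COUNT(CASE WHEN t2.snapshot_date IS NULL THEN 1 END) AS num_added
    FROM yourTable t1
    LEFT JOIN yourTable t2
        ON t1.snapshot_date = date(t2.snapshot_date, '+1 day')
       AND t1.uniqueID = t2.uniqueID
    GROUP BY t1.snapshot_date
) t1
LEFT JOIN (
    -- look ahead one day: non-matches were removed, reported on the next day
    SELECT date(t1.snapshot_date, '+1 day') AS snapshot_date,
           COUNT(CASE WHEN t2.snapshot_date IS NULL THEN 1 END) AS num_subtracted
    FROM yourTable t1
    LEFT JOIN yourTable t2
        ON t1.snapshot_date = date(t2.snapshot_date, '-1 day')
       AND t1.uniqueID = t2.uniqueID
    GROUP BY t1.snapshot_date
) t2
    ON t1.snapshot_date = t2.snapshot_date
ORDER BY t1.snapshot_date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The first row comes back with 0 duplicates, 5 added and None for removed, which is the earliest-date behaviour the notes describe; the other two rows match the desired table.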

This problem was particularly ugly because we needed to self join twice, in two separate subqueries, in different directions: one joins each date to the previous day (for duplicates and additions), the other joins each date to the next day (for removals).
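Where window functions are available (MySQL added LAG and LEAD in 8.0; SQLite has them from 3.25), the two self joins can be avoided. The following is a sketch of that LAG-based alternative, not part of this answer, run in SQLite through Python's sqlite3 module. It assumes each uniqueID appears at most once per snapshot_date, and it compares each date with the previous snapshot date present in the table rather than the previous calendar day, so gaps between snapshots do not break it. An ID counts as a duplicate when its own previous appearance (LAG partitioned by uniqueID) equals the previous snapshot date; additions and removals then follow from the row counts.

```python
import sqlite3

# Sample data from the question, loaded into an in-memory SQLite table.
SNAPSHOTS = {
    "2018-08-15": [1, 2, 3, 4, 5],
    "2018-08-16": [1, 3, 4, 6, 7, 8, 9],
    "2018-08-17": [3, 8, 10, 11, 12, 13],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (snapshot_date TEXT, uniqueID INTEGER)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [(d, i) for d, ids in SNAPSHOTS.items() for i in ids],
)

QUERY = """
WITH counts AS (
    -- rows per snapshot date
    SELECT snapshot_date, COUNT(*) AS cnt
    FROM yourTable
    GROUP BY snapshot_date
),
dates AS (
    -- previous snapshot date and its row count (NULL for the first date)
    SELECT snapshot_date,
           cnt,
           LAG(snapshot_date) OVER (ORDER BY snapshot_date) AS prev_date,
           LAG(cnt) OVER (ORDER BY snapshot_date) AS prev_cnt
    FROM counts
),
id_history AS (
    -- for each row, the last earlier date on which the same ID appeared
    SELECT snapshot_date,
           LAG(snapshot_date) OVER (PARTITION BY uniqueID ORDER BY snapshot_date) AS id_prev_date
    FROM yourTable
),
dupes AS (
    -- an ID is a dupe if its last appearance was the previous snapshot date
    SELECT d.snapshot_date,
           SUM(CASE WHEN h.id_prev_date = d.prev_date THEN 1 ELSE 0 END) AS dupe_cnt
    FROM dates d
    JOIN id_history h ON h.snapshot_date = d.snapshot_date
    GROUP BY d.snapshot_date
)
SELECT d.snapshot_date,
       d.cnt,
       CASE WHEN d.prev_date IS NOT NULL THEN x.dupe_cnt END AS dupes_from_prev,
       CASE WHEN d.prev_date IS NOT NULL THEN d.cnt - x.dupe_cnt END AS ids_added,
       d.prev_cnt - x.dupe_cnt AS ids_removed  -- NULL on the first date
FROM dates d
JOIN dupes x ON x.snapshot_date = d.snapshot_date
ORDER BY d.snapshot_date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Added is this date's count minus the duplicates and removed is the previous count minus the duplicates, which is why one window comparison per ID is enough.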

Answer 2

This is my take, based on SQL Server:

SELECT  snapshot_date       = COALESCE(c.snapshot_date, DATEADD(DAY, 1, p.snapshot_date)),
        [count]             = COUNT(c.snapshot_date),
        dup_from_prev_day   = SUM(CASE WHEN c.snapshot_date IS NOT NULL
                                       AND  p.snapshot_date IS NOT NULL
                                       THEN 1 END),
        sum_of_id_added     = SUM(CASE WHEN c.snapshot_date IS NOT NULL
                                       AND  p.snapshot_date IS NULL
                                       THEN 1 END),
        sum_of_id_removed   = SUM(CASE WHEN c.snapshot_date IS NULL
                                       AND  p.snapshot_date IS NOT NULL
                                       THEN 1 END)
FROM    yourTable c         -- current
        FULL OUTER JOIN yourTable p -- previous
        ON  c.snapshot_date     = DATEADD(DAY, 1, p.snapshot_date)
        AND c.uniqueID          = p.uniqueID
GROUP BY COALESCE(c.snapshot_date, DATEADD(DAY, 1, p.snapshot_date))
-- drop the group for the day after the last snapshot, which holds only removals
HAVING COUNT(c.snapshot_date) > 0
ORDER BY COALESCE(c.snapshot_date, DATEADD(DAY, 1, p.snapshot_date));

/* RESULT : 
snapshot_date  count  dup_from_prev_day  sum_of_id_added  sum_of_id_removed
2018-08-15     5      NULL               5                NULL
2018-08-16     7      3                  4                2
2018-08-17     6      2                  4                5
*/
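To reproduce this result without SQL Server, here is a sketch that ports the query to SQLite through Python's sqlite3 module. It is a port, not the answerer's code: SQLite only gained FULL OUTER JOIN in version 3.39.0, so the join is emulated with a LEFT JOIN plus a UNION ALL of the previous-day rows that have no match on the following day, and DATEADD(DAY, 1, ...) becomes date(..., '+1 day'). The CASE-based aggregation and the HAVING filter are unchanged; storing dates as ISO-8601 text is an assumption.

```python
import sqlite3

# Sample data from the question, loaded into an in-memory SQLite table.
SNAPSHOTS = {
    "2018-08-15": [1, 2, 3, 4, 5],
    "2018-08-16": [1, 3, 4, 6, 7, 8, 9],
    "2018-08-17": [3, 8, 10, 11, 12, 13],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (snapshot_date TEXT, uniqueID INTEGER)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [(d, i) for d, ids in SNAPSHOTS.items() for i in ids],
)

QUERY = """
WITH pairs AS (
    -- every current row, paired with the same ID on the previous day if present
    SELECT c.snapshot_date AS c_date, p.snapshot_date AS p_date
    FROM yourTable c
    LEFT JOIN yourTable p
        ON c.snapshot_date = date(p.snapshot_date, '+1 day')
       AND c.uniqueID = p.uniqueID
    UNION ALL
    -- previous-day rows with no match the next day (the FULL OUTER JOIN's right side)
    SELECT NULL, p.snapshot_date
    FROM yourTable p
    LEFT JOIN yourTable c
        ON c.snapshot_date = date(p.snapshot_date, '+1 day')
       AND c.uniqueID = p.uniqueID
    WHERE c.snapshot_date IS NULL
)
SELECT COALESCE(c_date, date(p_date, '+1 day')) AS snapshot_date,
       COUNT(c_date) AS count,
       SUM(CASE WHEN c_date IS NOT NULL AND p_date IS NOT NULL THEN 1 END) AS dup_from_prev_day,
       SUM(CASE WHEN c_date IS NOT NULL AND p_date IS NULL THEN 1 END) AS sum_of_id_added,
       SUM(CASE WHEN c_date IS NULL AND p_date IS NOT NULL THEN 1 END) AS sum_of_id_removed
FROM pairs
GROUP BY COALESCE(c_date, date(p_date, '+1 day'))
HAVING COUNT(c_date) > 0
ORDER BY snapshot_date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The returned rows match the RESULT block above, including the NULLs (None) on the first date.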

Answer 1: score 3, posted 2018-08-16 03:40:30
Answer 2: score 3, accepted, posted 2018-08-16 04:47:31