SQL - Ignoring / Removing Duplicates based on Two Columns

Suppose I have a single table, actions, like so:

+----+------------+--------+-------+
| id |    rdate   | clicks | epoch |
+----+------------+--------+-------+
|  1 | 2020-01-01 |    100 |  1200 |
|  1 | 2020-01-01 |     95 |  1100 |
|  1 | 2020-10-12 |     42 |  1000 |
|  1 | 2020-10-12 |     66 |   900 |
+----+------------+--------+-------+

I am trying to write a query that returns the clicks value from the row with MAX(epoch) for each group of records sharing the same id and rdate. The end result should be as follows (note that I do not need the epoch column in the result):

+----+------------+--------+
| id |    rdate   | clicks |
+----+------------+--------+
|  1 | 2020-01-01 |    100 |
|  1 | 2020-10-12 |     42 |
+----+------------+--------+

I have tried the following query, but the duplicates are still present in the result. The GROUP BY subquery does remove duplicates when run by itself, but the inner join to get the clicks does not work as intended.

SELECT
    id,
    rdate,
    clicks
FROM actions a
INNER JOIN (
    SELECT 
        id,
        rdate, 
        MAX(epoch)
    from actions  
    group by 
        id,
        rdate
) b 
on a.id = b.id
and a.rdate = b.rdate;

If your dialect supports QUALIFY (for example Snowflake, Teradata, or BigQuery), you could combine it with a window function:

SELECT *
FROM actions
QUALIFY ROW_NUMBER() OVER(PARTITION BY id, rdate ORDER BY epoch DESC) = 1

I suggest using ROW_NUMBER here:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id, rdate ORDER BY epoch DESC) rn
    FROM actions
)

SELECT id, rdate, clicks, epoch
FROM cte
WHERE rn = 1;
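
To try the ROW_NUMBER approach without access to the original database, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in (the thread does not name the engine), window functions need SQLite 3.25 or later, and the variable names are illustrative. The outer SELECT drops epoch, since the question says it is not needed.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the asker's engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actions (id INTEGER, rdate TEXT, clicks INTEGER, epoch INTEGER)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO actions VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", 100, 1200),
        (1, "2020-01-01", 95, 1100),
        (1, "2020-10-12", 42, 1000),
        (1, "2020-10-12", 66, 900),
    ],
)

# Number the rows inside each (id, rdate) group, newest epoch first,
# then keep only the first row of every group.
latest_rows = conn.execute(
    """
    WITH cte AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY id, rdate ORDER BY epoch DESC) AS rn
        FROM actions
    )
    SELECT id, rdate, clicks
    FROM cte
    WHERE rn = 1
    ORDER BY id, rdate
    """
).fetchall()

print(latest_rows)  # [(1, '2020-01-01', 100), (1, '2020-10-12', 42)]
```

If two rows in a group share the same maximum epoch, ROW_NUMBER still returns exactly one of them, but which one is not defined unless a tie-breaking column is added to the ORDER BY.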

If you want to stick with your current join approach, then you need to fix the logic so that the join to the subquery also matches on the maximum epoch, which restricts the result to one clicks value per (id, rdate) pair:

SELECT a1.*
FROM actions a1
INNER JOIN
(
    SELECT id, rdate, MAX(epoch) AS max_epoch
    FROM actions
    GROUP BY id, rdate
) a2
    ON a2.id = a1.id AND a2.rdate = a1.rdate AND a2.max_epoch = a1.epoch;
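
The difference the extra join condition makes can be checked on the sample data. The sketch below is an illustration rather than the asker's actual setup: it assumes an in-memory SQLite database via Python's sqlite3 module, qualifies the selected columns with the table alias (the unqualified id in the question's query is ambiguous in SQLite), and uses made-up variable names.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the asker's engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actions (id INTEGER, rdate TEXT, clicks INTEGER, epoch INTEGER)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO actions VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", 100, 1200),
        (1, "2020-01-01", 95, 1100),
        (1, "2020-10-12", 42, 1000),
        (1, "2020-10-12", 66, 900),
    ],
)

# Subquery shared by both joins: one row per (id, rdate) with its newest epoch.
newest = """
    SELECT id, rdate, MAX(epoch) AS max_epoch
    FROM actions
    GROUP BY id, rdate
"""

# Join on id and rdate only, as in the question: every row of a group
# matches its group's aggregate row, so nothing is filtered out.
loose_rows = conn.execute(
    f"""
    SELECT a.id, a.rdate, a.clicks
    FROM actions a
    INNER JOIN ({newest}) b
        ON a.id = b.id AND a.rdate = b.rdate
    ORDER BY a.rdate, a.epoch DESC
    """
).fetchall()

# Adding the epoch condition keeps only the row that carries the maximum.
strict_rows = conn.execute(
    f"""
    SELECT a.id, a.rdate, a.clicks
    FROM actions a
    INNER JOIN ({newest}) b
        ON a.id = b.id AND a.rdate = b.rdate AND a.epoch = b.max_epoch
    ORDER BY a.rdate
    """
).fetchall()

print(len(loose_rows))  # 4
print(strict_rows)      # [(1, '2020-01-01', 100), (1, '2020-10-12', 42)]
```

Unlike ROW_NUMBER, the join keeps every row that ties for the maximum epoch within a group, so duplicates can reappear if epoch values are not unique per (id, rdate).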
