SQL - Ignoring / Removing Duplicates based on Two Columns
Suppose I have a single table actions like so:
+----+------------+--------+-------+
| id | rdate | clicks | epoch |
+----+------------+--------+-------+
| 1 | 2020-01-01 | 100 | 1200 |
| 1 | 2020-01-01 | 95 | 1100 |
| 1 | 2020-10-12 | 42 | 1000 |
| 1 | 2020-10-12 | 66 | 900 |
+----+------------+--------+-------+
I am trying to write a query that gives me the number of clicks from the row with MAX(epoch) among records with identical values in the id and rdate columns. The end result should be (note that I do not need the epoch column in the result):
+----+------------+--------+
| id | rdate | clicks |
+----+------------+--------+
| 1 | 2020-01-01 | 100 |
| 1 | 2020-10-12 | 42 |
+----+------------+--------+
I have tried the following query, but the duplicates are still present in the result. The group by query does remove duplicates when run by itself, but the inner join to get the clicks does not work as intended.
SELECT
id,
rdate,
clicks
FROM actions a
INNER JOIN (
SELECT
id,
rdate,
MAX(epoch)
from actions
group by
id,
rdate
) b
on a.id = b.id
and a.rdate = b.rdate;
I suggest using ROW_NUMBER here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id, rdate ORDER BY epoch DESC) rn
FROM actions
)
SELECT id, rdate, clicks, epoch
FROM cte
WHERE rn = 1;
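To check the ROW_NUMBER query against the sample data, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. SQLite and Python are assumptions made only for illustration, since the question does not name a database; window functions need SQLite 3.25 or later. The sketch also drops epoch from the final SELECT, because the question says it is not needed in the result.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real one,
# which the question does not name. Window functions need SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actions (id INTEGER, rdate TEXT, clicks INTEGER, epoch INTEGER)"
)
conn.executemany(
    "INSERT INTO actions VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", 100, 1200),
        (1, "2020-01-01", 95, 1100),
        (1, "2020-10-12", 42, 1000),
        (1, "2020-10-12", 66, 900),
    ],
)

# Number the rows inside each (id, rdate) group, highest epoch first,
# then keep only the first row of every group.
query = """
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id, rdate ORDER BY epoch DESC) AS rn
    FROM actions
)
SELECT id, rdate, clicks
FROM cte
WHERE rn = 1
ORDER BY id, rdate
"""
row_number_result = conn.execute(query).fetchall()
print(row_number_result)
```

Each (id, rdate) pair comes back exactly once, carrying the clicks value from its highest-epoch row.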
If you want to stick with your current join approach, then you need to fix the logic so that the join to the subquery also matches on the maximum epoch, which restricts each (id, rdate) group to the row holding that maximum:
SELECT a1.*
FROM actions a1
INNER JOIN
(
SELECT id, rdate, MAX(epoch) AS max_epoch
FROM actions
GROUP BY id, rdate
) a2
ON a2.id = a1.id AND a2.rdate = a1.rdate AND a2.max_epoch = a1.epoch;
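The effect of the extra join condition can be seen by running the join with and without it on the sample data. This is a rough sketch using Python's built-in sqlite3 module with an in-memory database, both chosen here only for illustration; the variable names loose_rows and strict_rows are hypothetical. Columns are qualified with a1 to avoid ambiguous references.

```python
import sqlite3

# Assumption: in-memory SQLite is used purely as a scratch database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actions (id INTEGER, rdate TEXT, clicks INTEGER, epoch INTEGER)"
)
conn.executemany(
    "INSERT INTO actions VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", 100, 1200),
        (1, "2020-01-01", 95, 1100),
        (1, "2020-10-12", 42, 1000),
        (1, "2020-10-12", 66, 900),
    ],
)

# One row per (id, rdate) group with that group's highest epoch.
subquery = """
    SELECT id, rdate, MAX(epoch) AS max_epoch
    FROM actions
    GROUP BY id, rdate
"""

# Join on id and rdate only: every row of actions matches its own group,
# so nothing is filtered out and the duplicates remain.
loose_rows = conn.execute(f"""
    SELECT a1.id, a1.rdate, a1.clicks
    FROM actions a1
    INNER JOIN ({subquery}) a2
        ON a2.id = a1.id AND a2.rdate = a1.rdate
    ORDER BY a1.rdate, a1.epoch DESC
""").fetchall()

# Adding the epoch condition keeps only the row that holds the maximum.
strict_rows = conn.execute(f"""
    SELECT a1.id, a1.rdate, a1.clicks
    FROM actions a1
    INNER JOIN ({subquery}) a2
        ON a2.id = a1.id AND a2.rdate = a1.rdate AND a2.max_epoch = a1.epoch
    ORDER BY a1.rdate
""").fetchall()

print(len(loose_rows), len(strict_rows))
```

One difference from the ROW_NUMBER version: if two rows in the same group tie on the maximum epoch, the join returns both of them, while ROW_NUMBER keeps exactly one, picked arbitrarily unless the ORDER BY includes a tie-breaker.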