
Remove rows from table based on column value using self join

I have a requirement where my data looks like the table below. For each id, I need to find the pids whose last status is not "removed".

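Note:
1. To determine the last status of a pid, use the "date" and "hour" columns.
2. If, for an "id", the last "status" value of a pid is "removed", do not include that row in the result.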

id  |   key    |    date     |  hour    | pid  | status
--------------------------------------------------------
id1 |   one    |    20180618 |  2       |  p1  | added
id1 |   one    |    20180618 |  3       |  p1  | removed
id1 |   one    |    20180618 |  4       |  p1  | added
id1 |   one    |    20180618 |  4       |  p2  | added

id1 |   one    |    20180619 |  2       |  p1  | removed
id1 |   one    |    20180619 |  4       |  p1  | added
id1 |   one    |    20180619 |  4       |  p2  | removed
id1 |   one    |    20180619 |  5       |  p3  | added

id2 |   one    |    20180619 |  5       |  p1  | added
id2 |   one    |    20180619 |  5       |  p2  | added
id2 |   one    |    20180619 |  6       |  p1  | removed

Expected output:

id  |   key    |    date     |  hour    | pid  | status
--------------------------------------------------------
id1 |   one    |    20180619 |  4       |  p1  | added
id1 |   one    |    20180619 |  5       |  p3  | added
id2 |   one    |    20180619 |  5       |  p2  | added

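I do not want to delete data from the source table. I want to query the source table, using a self join, to produce the result above.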

The last_value window function lets you do this without a join:

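SELECT id, key, date, hour, pid, status
FROM   (SELECT id, key, date, hour, pid, status,
               -- Status of the newest row in each (id, pid) group. The explicit frame is
               -- needed because the default frame ends at the current row.
               LAST_VALUE(status) OVER (PARTITION BY id, pid ORDER BY date ASC, hour ASC
                                        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lv
        FROM   mytable) t
-- Note: this keeps every row of a group whose latest status is not 'removed'.
WHERE  lv <> 'removed'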
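The effect of the window frame on LAST_VALUE is easy to miss, so here is a small sketch that shows it on the question's sample rows. It uses Python's built-in sqlite3 module as a stand-in for the real database (an assumption, and it needs SQLite 3.25 or newer for window functions); dates are stored as integers, and the function name `last_status_rows` is made up for illustration.

```python
import sqlite3

# Sample rows from the question: (id, key, date, hour, pid, status).
ROWS = [
    ("id1", "one", 20180618, 2, "p1", "added"),
    ("id1", "one", 20180618, 3, "p1", "removed"),
    ("id1", "one", 20180618, 4, "p1", "added"),
    ("id1", "one", 20180618, 4, "p2", "added"),
    ("id1", "one", 20180619, 2, "p1", "removed"),
    ("id1", "one", 20180619, 4, "p1", "added"),
    ("id1", "one", 20180619, 4, "p2", "removed"),
    ("id1", "one", 20180619, 5, "p3", "added"),
    ("id2", "one", 20180619, 5, "p1", "added"),
    ("id2", "one", 20180619, 5, "p2", "added"),
    ("id2", "one", 20180619, 6, "p1", "removed"),
]


def last_status_rows(explicit_frame):
    """Return (id, pid, date, hour, status, lv) rows, with or without an explicit frame."""
    frame = (
        "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"
        if explicit_frame
        else ""  # falls back to the default frame, which stops at the current row
    )
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE mytable (id TEXT, "key" TEXT, date INTEGER, '
            "hour INTEGER, pid TEXT, status TEXT)"
        )
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?)", ROWS)
        sql = f"""
            SELECT id, pid, date, hour, status,
                   LAST_VALUE(status) OVER (
                       PARTITION BY id, pid ORDER BY date, hour {frame}
                   ) AS lv
            FROM mytable
            ORDER BY id, pid, date, hour
        """
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for explicit in (False, True):
        print("explicit frame:", explicit)
        for row in last_status_rows(explicit):
            print("  ", row)
```

With the default frame, lv simply repeats each row's own status, so filtering on it does not look at the latest row at all. With the explicit frame, every row of an (id, pid) group carries that group's final status.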

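Use the row_number() function to identify the latest record for each combination of id and pid. It is then easy to select only those records that have the status you want, like this: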

declare @SampleData table (id varchar(32), [key] varchar(32), [date] date, [hour] int, pid varchar(32), [status] varchar(32));
insert @SampleData values
    ('id1', 'one', '20180618', 2, 'p1', 'added'),
    ('id1', 'one', '20180618', 3, 'p1', 'removed'),
    ('id1', 'one', '20180618', 4, 'p1', 'added'),
    ('id1', 'one', '20180618', 4, 'p2', 'added'),
    ('id1', 'one', '20180619', 2, 'p1', 'removed'),
    ('id1', 'one', '20180619', 4, 'p1', 'added'),
    ('id1', 'one', '20180619', 4, 'p2', 'removed'),
    ('id1', 'one', '20180619', 5, 'p3', 'added'),
    ('id2', 'one', '20180619', 5, 'p1', 'added'),
    ('id2', 'one', '20180619', 5, 'p2', 'added'),
    ('id2', 'one', '20180619', 6, 'p1', 'removed');

with OrderedDataCTE as
(
    select
        S.id, S.[key], S.[date], S.[hour], S.pid, S.[status],
        [sequence] = row_number() over (partition by S.id, S.pid order by S.[date] desc, S.[hour] desc)
    from
        @SampleData S
)
select
    O.id, O.[key], O.[date], O.[hour], O.pid, O.[status]
from
    OrderedDataCTE O
where
    O.[sequence] = 1 and
    O.[status] != 'removed';
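For readers who want to check the logic outside the database, the same idea (keep the newest row per (id, pid), then drop it if its status is 'removed') can be sketched in plain Python. This is only an illustration of the technique, not a replacement for the query; the function name `latest_not_removed` and the integer date encoding are assumptions.

```python
# Sample rows from the question: (id, key, date, hour, pid, status).
SAMPLE_DATA = [
    ("id1", "one", 20180618, 2, "p1", "added"),
    ("id1", "one", 20180618, 3, "p1", "removed"),
    ("id1", "one", 20180618, 4, "p1", "added"),
    ("id1", "one", 20180618, 4, "p2", "added"),
    ("id1", "one", 20180619, 2, "p1", "removed"),
    ("id1", "one", 20180619, 4, "p1", "added"),
    ("id1", "one", 20180619, 4, "p2", "removed"),
    ("id1", "one", 20180619, 5, "p3", "added"),
    ("id2", "one", 20180619, 5, "p1", "added"),
    ("id2", "one", 20180619, 5, "p2", "added"),
    ("id2", "one", 20180619, 6, "p1", "removed"),
]


def latest_not_removed(rows):
    """Keep the newest row per (id, pid), then drop groups whose newest status is 'removed'."""
    latest = {}
    for row in rows:
        row_id, _key, date, hour, pid, _status = row
        group = (row_id, pid)
        current = latest.get(group)
        # Equivalent to row_number() ... order by date desc, hour desc = 1.
        if current is None or (date, hour) > (current[2], current[3]):
            latest[group] = row
    return sorted(r for r in latest.values() if r[5] != "removed")


if __name__ == "__main__":
    for row in latest_not_removed(SAMPLE_DATA):
        print(row)
```

On the sample data this leaves the three rows listed in the expected output: id1/p1, id1/p3 and id2/p2.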

Since you asked for a self-join solution, here is one:

SELECT t.*
FROM YourTable t
LEFT JOIN YourTable r
ON ( r.id = t.id AND r.pid = t.pid AND r.[status] = 'removed'
     AND dateadd(hour,r.hour,cast(r.date AS datetime)) >= dateadd(hour,t.hour,cast(t.date as datetime))
)
WHERE r.[status] IS NULL
ORDER BY t.id, t.pid, t.date, t.hour;

But I would prefer the NOT EXISTS version:

SELECT *
FROM YourTable t
WHERE NOT EXISTS
(
    SELECT 1 
    FROM YourTable r
    WHERE r.id = t.id AND r.pid = t.pid AND r.[status] = 'removed'
     AND dateadd(hour,r.hour,cast(r.date AS datetime)) >= dateadd(hour,t.hour,cast(t.date as datetime))
)
ORDER BY t.id, t.pid, t.date, t.hour;

Both return:

id  key date       hour pid status
--- --- ---------- ---- --- ------
id1 one 2018-06-19    4 p1  added
id1 one 2018-06-19    5 p3  added
id2 one 2018-06-19    5 p2  added
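To confirm that the LEFT JOIN and NOT EXISTS forms agree, the sketch below runs both against the question's sample rows using Python's built-in sqlite3 module. Several things here are assumptions rather than part of the answer: SQLite stands in for SQL Server, dates are stored as integers, and the dateadd() comparison is replaced by an explicit comparison on date and hour, which should order rows the same way for this data.

```python
import sqlite3

# Sample rows from the question: (id, key, date, hour, pid, status).
SAMPLE = [
    ("id1", "one", 20180618, 2, "p1", "added"),
    ("id1", "one", 20180618, 3, "p1", "removed"),
    ("id1", "one", 20180618, 4, "p1", "added"),
    ("id1", "one", 20180618, 4, "p2", "added"),
    ("id1", "one", 20180619, 2, "p1", "removed"),
    ("id1", "one", 20180619, 4, "p1", "added"),
    ("id1", "one", 20180619, 4, "p2", "removed"),
    ("id1", "one", 20180619, 5, "p3", "added"),
    ("id2", "one", 20180619, 5, "p1", "added"),
    ("id2", "one", 20180619, 5, "p2", "added"),
    ("id2", "one", 20180619, 6, "p1", "removed"),
]

# Portable stand-in for the dateadd() comparison: r is at the same time as t or later.
R_NOT_EARLIER = "(r.date > t.date OR (r.date = t.date AND r.hour >= t.hour))"

LEFT_JOIN_SQL = f"""
    SELECT t.id, t."key", t.date, t.hour, t.pid, t.status
    FROM YourTable t
    LEFT JOIN YourTable r
      ON r.id = t.id AND r.pid = t.pid AND r.status = 'removed'
     AND {R_NOT_EARLIER}
    WHERE r.status IS NULL
    ORDER BY t.id, t.pid, t.date, t.hour
"""

NOT_EXISTS_SQL = f"""
    SELECT t.id, t."key", t.date, t.hour, t.pid, t.status
    FROM YourTable t
    WHERE NOT EXISTS (
        SELECT 1
        FROM YourTable r
        WHERE r.id = t.id AND r.pid = t.pid AND r.status = 'removed'
          AND {R_NOT_EARLIER}
    )
    ORDER BY t.id, t.pid, t.date, t.hour
"""


def run(sql):
    """Load the sample rows into a fresh in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE YourTable (id TEXT, "key" TEXT, date INTEGER, '
            "hour INTEGER, pid TEXT, status TEXT)"
        )
        conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?, ?, ?)", SAMPLE)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(LEFT_JOIN_SQL))
    print(run(NOT_EXISTS_SQL))
```

Note that both queries keep every row that has no 'removed' row at the same time or later for its (id, pid). If a pid were added twice in a row after its last removal, both of those rows would be returned, unlike the row_number() approach.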
