RANK() over a partition, with the RANK resetting

Question

How can I get a RANK that restarts at partition change? I have this table:

ID    Date        Value  
1     2015-01-01  1  
2     2015-01-02  1 <redundant  
3     2015-01-03  2  
4     2015-01-05  2 <redundant  
5     2015-01-06  1  
6     2015-01-08  1 <redundant  
7     2015-01-09  1 <redundant  
8     2015-01-10  2  
9     2015-01-11  3  
10    2015-01-12  3 <redundant

and I'm trying to delete all the rows where the Value has not changed from the previous entry (marked with <redundant). I've tried using cursors, but that takes too long, as the table has ~50 million rows.

I've also tried using RANK:

SELECT ID, Date, Value,
RANK() over(partition by Value order by Date ASC) Rank
FROM DataLogging 
ORDER BY Date ASC

but I get:

ID    Date        Value  Rank   (Rank)
1     2015-01-01  1      1      (1)
2     2015-01-02  1      2      (2)
3     2015-01-03  2      1      (1)
4     2015-01-05  2      2      (2)
5     2015-01-06  1      3      (1)
6     2015-01-08  1      4      (2)
7     2015-01-09  1      5      (3)
8     2015-01-10  2      3      (1)
9     2015-01-11  3      1      (1)
10    2015-01-12  3      2      (2)

The Rank I want is in parentheses, so that I can keep the rows with Rank = 1 and delete the rest of the rows.

EDIT: I've accepted the answer that seemed the easiest to write, but unfortunately none of the answers runs fast enough for deleting the rows. In the end I've decided to use the CURSOR after all. I've split the data into chunks of about 250k rows; the cursor runs through and deletes the rows in ~11 mins per batch of 250k rows, while the answers below, with DELETE, take ~35 mins per batch of 250k rows.

Answer 1

Here is a somewhat convoluted way to do it:

WITH CTE AS
(
    SELECT  *, 
            ROW_NUMBER() OVER(ORDER BY [Date]) RN1,
            ROW_NUMBER() OVER(PARTITION BY Value ORDER BY [Date]) RN2
    FROM dbo.YourTable
), CTE2 AS
(
    SELECT *, ROW_NUMBER() OVER(PARTITION BY Value, RN1 - RN2 ORDER BY [Date]) N
    FROM CTE
)
SELECT *
FROM CTE2
ORDER BY ID;

The results are:

╔════╦════════════╦═══════╦═════╦═════╦═══╗
║ ID ║    Date    ║ Value ║ RN1 ║ RN2 ║ N ║
╠════╬════════════╬═══════╬═════╬═════╬═══╣
║  1 ║ 2015-01-01 ║     1 ║   1 ║   1 ║ 1 ║
║  2 ║ 2015-01-02 ║     1 ║   2 ║   2 ║ 2 ║
║  3 ║ 2015-01-03 ║     2 ║   3 ║   1 ║ 1 ║
║  4 ║ 2015-01-05 ║     2 ║   4 ║   2 ║ 2 ║
║  5 ║ 2015-01-06 ║     1 ║   5 ║   3 ║ 1 ║
║  6 ║ 2015-01-08 ║     1 ║   6 ║   4 ║ 2 ║
║  7 ║ 2015-01-09 ║     1 ║   7 ║   5 ║ 3 ║
║  8 ║ 2015-01-10 ║     2 ║   8 ║   3 ║ 1 ║
║  9 ║ 2015-01-11 ║     3 ║   9 ║   1 ║ 1 ║
║ 10 ║ 2015-01-12 ║     3 ║  10 ║   2 ║ 2 ║
╚════╩════════════╩═══════╩═════╩═════╩═══╝

To delete the rows you don't want, keep the two CTEs and replace the final SELECT with:

DELETE FROM CTE2
WHERE N > 1;
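The RN1 - RN2 trick above can be checked on the question's sample data. The following is a minimal sketch, assuming Python's built-in sqlite3 module with SQLite 3.25 or later (for window functions) as a stand-in for SQL Server; the Grp column name is introduced here only for illustration. Within a run of equal values both row numbers advance together, so their difference stays constant and, combined with Value, identifies each run.

```python
import sqlite3

# Sample rows from the question: (ID, Date, Value).
rows = [
    (1, "2015-01-01", 1), (2, "2015-01-02", 1), (3, "2015-01-03", 2),
    (4, "2015-01-05", 2), (5, "2015-01-06", 1), (6, "2015-01-08", 1),
    (7, "2015-01-09", 1), (8, "2015-01-10", 2), (9, "2015-01-11", 3),
    (10, "2015-01-12", 3),
]

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DataLogging (ID INTEGER PRIMARY KEY, Date TEXT, Value INTEGER)"
)
conn.executemany("INSERT INTO DataLogging VALUES (?, ?, ?)", rows)

query = """
WITH CTE AS (
    SELECT ID, Date, Value,
           ROW_NUMBER() OVER (ORDER BY Date) AS RN1,
           ROW_NUMBER() OVER (PARTITION BY Value ORDER BY Date) AS RN2
    FROM DataLogging
)
SELECT ID, Value,
       RN1 - RN2 AS Grp,  -- constant within each run of equal values
       ROW_NUMBER() OVER (PARTITION BY Value, RN1 - RN2 ORDER BY Date) AS N
FROM CTE
ORDER BY ID
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)  # (ID, Value, Grp, N)
```

Rows with N = 1 are the first row of each run; everything with N > 1 is what the DELETE removes.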

Answer 2

If you want to delete the rows, I would suggest you use lag():

with todelete as (
      select t.*, lag(value) over (order by date) as prev_value
      from t
     )
delete from todelete
    where value = prev_value;

I'm not quite sure what rank() has to do with the problem.

EDIT:

To see the rows that are not deleted, using the same logic:

with todelete as (
      select t.*, lag(value) over (order by date) as prev_value
      from t
     )
select *
from todelete
where value <> prev_value or prev_value is null;

The where clause is just the inverse of the where clause in the first query, taking NULL values into account: the first row has no previous value, so its prev_value is NULL and it must be kept explicitly.
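The lag() comparison can be tried out on the question's sample data. This is a rough sketch, assuming Python's built-in sqlite3 module with SQLite 3.25 or later as a stand-in for SQL Server. SQLite cannot delete through a CTE the way the query above does, so this version swaps in a DELETE ... WHERE id IN (...) over the same lag() logic.

```python
import sqlite3

# Sample rows from the question: (id, date, value).
rows = [
    (1, "2015-01-01", 1), (2, "2015-01-02", 1), (3, "2015-01-03", 2),
    (4, "2015-01-05", 2), (5, "2015-01-06", 1), (6, "2015-01-08", 1),
    (7, "2015-01-09", 1), (8, "2015-01-10", 2), (9, "2015-01-11", 3),
    (10, "2015-01-12", 3),
]

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, date TEXT, value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# Delete every row whose value equals the value of the previous row by date.
# The first row has prev_value NULL, so "value = prev_value" is not true
# for it and it is kept.
conn.execute("""
DELETE FROM t
WHERE id IN (
    SELECT id
    FROM (SELECT id, value,
                 LAG(value) OVER (ORDER BY date) AS prev_value
          FROM t) AS x
    WHERE value = prev_value
)
""")

remaining = conn.execute("SELECT id, value FROM t ORDER BY date").fetchall()
print(remaining)
```

Only the first row of each run of equal values should remain, which matches the rows the question leaves unmarked.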

Answer 3

select * 
from  ( select ID, Date, Value, lag(Value) over (order by ID) as ValueLag 
        from DataLogging ) tt
where ValueLag is null or ValueLag <> Value

If the rows should be ordered by Date instead, use over (order by Date).

The script below shows both the rows to keep (good) and the rows to delete (bad). It orders by ID; if you need Date, revise accordingly.
It may look like a long way around, but it should be pretty efficient.

declare @tt table  (id tinyint, val tinyint);
insert into @tt values 
( 1, 1),
( 2, 1),
( 3, 2),
( 4, 2),
( 5, 1),
( 6, 1),
( 7, 1),
( 8, 2),
( 9, 3),
(10, 3);

select id, val, LAG(val) over (order by id) as lagVal
from @tt;

-- find the good
select id, val 
from ( select id, val, LAG(val) over (order by id) as lagVal
       from @tt 
     ) tt
where  lagVal is null or lagVal <> val 

-- select the bad 
select tt.id, tt.val 
  from @tt tt
  left join ( select id, val 
                from ( select id, val, LAG(val) over (order by id) as lagVal
                         from @tt 
                     ) ttt
               where   ttt.lagVal is null or ttt.lagVal <> ttt.val 
            ) tttt 
    on tttt.id = tt.id 
 where tttt.id is null

Answer 4

This is interesting, so I thought I'd jump in. Unfortunately, a solution with RANK() (or rather, ROW_NUMBER()) that does not first transform the data looks to be unobtainable. In an attempt to transform the data, I came up with this solution, which uses a single ROW_NUMBER():

;WITH Ordered AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY [Date]) AS [Row], *
    FROM DataLogging
),
Final AS
(
    SELECT
        o1.*, NULLIF(o1.Value - ISNULL(o2.Value, o1.Value - 1), 0) [Change]
    FROM
        Ordered o1
        LEFT JOIN Ordered o2 ON
            o1.[Row] = o2.[Row] + 1
)
SELECT * FROM Final

In the last column, Change, the value will be NULL if there is no change in Value from the previous row (and will hold the difference if there is a change).

So to do the delete, replace the final SELECT with a DELETE that looks up the IDs through the CTE:

DELETE FROM DataLogging
WHERE ID IN (SELECT ID FROM Final WHERE [Change] IS NULL)

Edit: LAG would work here too, but I was visualizing the solution as I went along and completely forgot about it.

Answer 5

Worked for my case, thanks! I had to fetch each reports_to change for an employee relative to the previous reports_to value and effdt. In other words, fetch the minimum effective-date row for each reports_to change for an employee.

with tocheck as (
      select T.emplid, T.reports_to, T.effdt,
             lag(reports_to) over (order by effdt) as prev_value
      from PS_JOB t
     )
select *
from tocheck
where reports_to <> prev_value or prev_value is null;

added changes further as p

5 answers

Answer 1
5 2016-02-04 18:12:16

Answer 2
2 2016-02-04 18:42:26

Answer 3
1 accepted 2016-02-04 18:08:38

Answer 4
0 2016-02-04 20:09:12

Answer 5
0 2018-08-16 11:13:31
