SQL: Removing duplicate rows while retaining the row with the highest value in another column

Question

Suppose I have a table Test with the following data:

SOID SO_Name   SO_Desc     PRIORITY  ADE_PRIORITIZED  DEPLOY_DATE  ENV
123  SO1      SO1 Desc1      111      Y               01-JAN-01     0
123  SO1      SO1 Desc1      111      Y               01-JAN-01     1
123  SO1      SO1 Desc1      111      Y               01-JAN-01     2
123  SO1      SO1 Desc1      111      Y               01-JAN-01     3
987  SO1      SO1 Desc1      111      Y               01-JAN-01     0
987  SO1      SO1 Desc1      111      Y               01-JAN-16     1
987  SO1      SO1 Desc1      111      Y               21-JAN-17     2
987  SO1      SO1 Desc1      111      Y               01-JAN-17     3
121  SO121    SO121 Desc121  111      Y               01-JAN-17     0

I want to remove the duplicate rows for each soid (duplicates can be identified by four columns: so_name, so_desc, priority, ade_prioritized), retaining the row with the highest deploy_date.

I used this query, but it doesn't delete any rows.

delete from so_test a 
where a.deploy_date < (
  select max(b.deploy_date) from so_test b where a.soid = b.soid
);

0 rows deleted

The end result I expect is:

SOID SO_Name   SO_Desc     PRIORITY  ADE_PRIORITIZED  DEPLOY_DATE  ENV
123  SO1      SO1 Desc1      111      Y               01-JAN-01     0
987  SO1      SO1 Desc1      111      Y               21-JAN-17     2
987  SO1      SO1 Desc1      111      Y               21-JAN-17     2

What could be the issue? Can it be done without a CTE?

Answer 1

Using with (a common table expression, or cte) and row_number(), you can both identify and easily handle duplicates.

When using a cte, only one statement can follow the expression (although that expression may define several ctes, including ones that build on earlier ones).

In the following code example, you would first check the output by using the select; then, if further action is necessary, comment out the select query and un-comment the delete query.

rextester link: http://rextester.com/UFQQ51693

with cte as (
  select   
      *
    , rn = row_number() over (
            partition by soid 
            order by deploy_date desc
            )
    from [so_test]
)
/* --------------------------------------------------------------
-- This returns all rows for every soid that has duplicates,
-- along with the row number (rn), so you can see which rows
-- would be affected by the following actions
-------------------------------------------------------------- */
/*
select o.*
  from cte as o
  where exists (
      select 1
        from cte as i
        where o.soid  = i.soid 
          and i.rn>1
      );
--*/
/* --------------------------------------------------------------
-- Remove duplicates by deleting all of the duplicates
-- where the row number (rn) is greater than 1
-- without deleting the first row of the duplicates.
-------------------------------------------------------------- */
--/*
delete 
  from cte 
  where cte.rn > 1 
--*/

rextester results after delete:

+------+---------+---------------+----------+-----------------+---------------------+-----+
| soid | so_name |    so_desc    | priority | ade_prioritized |     deploy_date     | env |
+------+---------+---------------+----------+-----------------+---------------------+-----+
|  123 | SO1     | SO1_Desc1     |      111 | Y               | 01.01.2001 00:00:00 |   0 |
|  987 | SO1     | SO1_Desc1     |      111 | Y               | 21.01.2017 00:00:00 |   2 |
|  121 | SO121   | SO121_Desc121 |      111 | Y               | 01.01.2017 00:00:00 |   0 |
+------+---------+---------------+----------+-----------------+---------------------+-----+
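
On the question of avoiding a CTE: a strict `<` comparison against max(deploy_date) can never remove rows that tie on the date, as the four soid 123 rows do, so some tie-breaker is needed. The following is a minimal sketch rather than the method used above. It runs the delete through Python's built-in sqlite3 module against an in-memory copy of the sample data, with the dates rewritten as ISO strings so they sort correctly as text. It assumes env is an acceptable tie-breaker (lowest env kept); the names load_sample and dedupe_keep_latest are made up for illustration, and the delete may need adjusting for Oracle or SQL Server.

```python
import sqlite3

# Sample data from the question; dates converted to ISO text (an assumption
# made so that plain string comparison orders them chronologically).
SAMPLE_ROWS = [
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 0),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 1),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 2),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 3),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 0),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2016-01-01", 1),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2017-01-21", 2),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2017-01-01", 3),
    (121, "SO121", "SO121 Desc121", 111, "Y", "2017-01-01", 0),
]


def load_sample(conn):
    """Create so_test and fill it with the question's rows."""
    conn.execute(
        "create table so_test (soid integer, so_name text, so_desc text, "
        "priority integer, ade_prioritized text, deploy_date text, env integer)"
    )
    conn.executemany("insert into so_test values (?,?,?,?,?,?,?)", SAMPLE_ROWS)


def dedupe_keep_latest(conn):
    """Delete every row that is beaten by another row with the same soid.

    A row is beaten when another row has a later deploy_date, or the same
    deploy_date and a lower env (the assumed tie-breaker).
    """
    conn.execute(
        """
        delete from so_test
        where exists (
            select 1
              from so_test b
             where b.soid = so_test.soid
               and (b.deploy_date > so_test.deploy_date
                    or (b.deploy_date = so_test.deploy_date
                        and b.env < so_test.env))
        )
        """
    )
    return conn.execute(
        "select soid, deploy_date, env from so_test order by soid"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_sample(conn)
print(dedupe_keep_latest(conn))
```

On the sample data this leaves one row per soid, the same three rows as in the rextester table above. If two rows for a soid can share both deploy_date and env, another column (or the database's physical row identifier) would have to act as the final tie-breaker.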

Answer 2

An example that copies the non-duplicate rows into a new table.

create table so_test_nodups 
as
with dups as 
( select soid, so_name, so_desc, priority, ade_prioritized, deploy_date, env,  
        row_number() over ( partition by so_name, so_desc, priority, ade_prioritized order by deploy_date desc ) rn 
  from so_test 
) 
select  soid, so_name, so_desc, priority, ade_prioritized, deploy_date, env 
from dups 
where rn=1

Querying the so_test_nodups table.

select * from so_test_nodups

      SOID SO_NAME    SO_DESC                PRIORITY A DEPLOY_DA        ENV
---------- ---------- -------------------- ---------- - --------- ----------
       123 SO1        SO1 Desc1                   111 Y 01-JAN-01          0
       121 SO121      SO121 Desc121               111 Y 01-JAN-17          0

Results after applying the edits made to the question:

      SOID SO_NAME    SO_DESC                PRIORITY A DEPLOY_DA        ENV
---------- ---------- -------------------- ---------- - --------- ----------
       987 SO1        SO1 Desc1                   111 Y 21-JAN-17          2
       121 SO121      SO121 Desc121               111 Y 01-JAN-17          0
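
Partitioning only by the four descriptive columns puts soid 123 and soid 987 in the same group, which is why only two rows come back. If one row per soid is wanted, as the question's wording suggests, soid can be added to the partition. The sketch below compares both variants using Python's built-in sqlite3 module (window functions assume SQLite 3.25 or later). The ISO date strings, the env tie-breaker, and the helper names load_sample and build_nodups are assumptions for illustration, not part of the answer above.

```python
import sqlite3

SAMPLE_ROWS = [
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 0),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 1),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 2),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 3),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 0),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2016-01-01", 1),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2017-01-21", 2),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2017-01-01", 3),
    (121, "SO121", "SO121 Desc121", 111, "Y", "2017-01-01", 0),
]

FOUR_COLS = ["so_name", "so_desc", "priority", "ade_prioritized"]


def load_sample(conn):
    """Create so_test and fill it with the question's rows."""
    conn.execute(
        "create table so_test (soid integer, so_name text, so_desc text, "
        "priority integer, ade_prioritized text, deploy_date text, env integer)"
    )
    conn.executemany("insert into so_test values (?,?,?,?,?,?,?)", SAMPLE_ROWS)


def build_nodups(conn, partition_cols):
    """Rebuild so_test_nodups, keeping the newest row of each partition.

    partition_cols is a list of fixed, trusted column names; it is joined
    straight into the SQL text, so it must never come from user input.
    """
    conn.execute("drop table if exists so_test_nodups")
    conn.execute(
        f"""
        create table so_test_nodups as
        select soid, so_name, so_desc, priority, ade_prioritized,
               deploy_date, env
          from (
                select t.*,
                       row_number() over (
                           partition by {", ".join(partition_cols)}
                           order by deploy_date desc, env
                       ) as rn
                  from so_test t
               ) d
         where rn = 1
        """
    )
    return conn.execute(
        "select soid, deploy_date, env from so_test_nodups order by soid"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_sample(conn)
print(build_nodups(conn, FOUR_COLS))             # four-column partition
print(build_nodups(conn, ["soid"] + FOUR_COLS))  # one row per soid
```

The first call keeps the rows for soid 987 and 121, as in the second result listing above; the second call also keeps a row for soid 123.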

2 answers

solution1
1 2017-01-24 19:10:48

solution2
0 ACCEPTED 2017-01-24 19:38:43