Query to select rows with minimum distinct value of a column

Question

For each value of column A, I need to select the row with the minimum value of column B, but that B must be distinct from the B values already selected for earlier values of A. So the order of A matters. Also, if all the B values for a given A are already used up, that A should either get NULL or not appear in the result.

Both A and B are numeric (or timestamps). Example:

A   | B | 
----+---+
1   | 3 | 
1   | 5 | 
1   | 6 | 
2   | 3 | 
2   | 5 | 
9   | 3 |
9   | 5 |

So the desired result is:

A   | B | 
----+---+
1   | 3 | 
2   | 5 |

select A, min(B) ... group by A obviously doesn't work because I don't want B to be repeated. Distinct also doesn't work because the rows are already distinct. I couldn't find any similar question anywhere. The actual data I am working with is time series data on Redshift, so A and B are timestamps. CTEs would be especially welcome.

Answer 1

At first I thought this could be solved with ROW_NUMBER() OVER (PARTITION BY A ORDER BY B), but there is a problem: the values of B must not be repeated.

At the moment the only thing that comes to mind is to use temporary tables. I know this is not the best way, but you can probably improve on it:

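-- T-SQL (SQL Server) syntax; PRUEBA is the source table with columns A and B
DECLARE @Tabla1 TABLE(A INT)         -- values of A still to process
DECLARE @Tabla2 TABLE(B INT)         -- values of B already used
DECLARE @Tabla3 TABLE(A INT, B INT)  -- result
DECLARE @A INT, @B INT;
INSERT INTO @Tabla1 SELECT DISTINCT A FROM PRUEBA

WHILE (SELECT COUNT(*) FROM @Tabla1) > 0
BEGIN
  -- take the values of A in ascending order, since the order of A matters
  SET @A = (SELECT TOP 1 A FROM @Tabla1 ORDER BY A);
  SET @B = (SELECT MIN(B) FROM PRUEBA WHERE A = @A AND B NOT IN (SELECT B FROM @Tabla2));
  DELETE FROM @Tabla1 WHERE A = @A
  -- @B is NULL when every B for this A is used up; storing that NULL in
  -- @Tabla2 would make every later NOT IN test come out empty
  IF @B IS NOT NULL
  BEGIN
    INSERT INTO @Tabla2 VALUES (@B)
    INSERT INTO @Tabla3 VALUES (@A, @B)
  END
END

SELECT * FROM @Tabla3 ORDER BY A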

You could also use a cursor, but you would have to measure which is more expensive, the cursor or the temporary tables.
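The loop in this answer follows the rule from the question: walk the values of A in ascending order and give each one the smallest B that has not been taken yet. As a way to check any SQL version against that rule, here is a minimal Python sketch of the same idea. The function name greedy_min_distinct and the in-memory list of (A, B) pairs are illustrative assumptions, not part of the original answer.

```python
def greedy_min_distinct(rows):
    """For each A in ascending order, pick the smallest B not yet taken."""
    used = set()      # B values already assigned to an earlier A
    result = []
    for a in sorted({a for a, _ in rows}):
        # B values of this A that are still free, smallest first
        candidates = sorted(b for x, b in rows if x == a and b not in used)
        if candidates:            # an A with no free B is left out
            used.add(candidates[0])
            result.append((a, candidates[0]))
    return result


# Sample data from the question
sample = [(1, 3), (1, 5), (1, 6), (2, 3), (2, 5), (9, 3), (9, 5)]
print(greedy_min_distinct(sample))   # [(1, 3), (2, 5)]
```

A = 9 does not appear because both of its B values (3 and 5) are already used by A = 1 and A = 2, which matches the desired result in the question.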

Answer 2

This is basically a "find the diagonal" problem. You need to know the rank of B within A and the rank of A within all. I believe this works for the data given:

select A, B from (
  select row_number() over (partition by A order by B) as RN,
    dense_rank() over (order by A) as DR,
    A, B
    from <table> ) as ranked
where RN = DR;

For more complex cases this solution will get more complex.
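To see where the diagonal shortcut stops matching the rule in the question, it can be run on a small in-memory table. The sketch below uses Python's sqlite3 module instead of Redshift, and assumes the bundled SQLite is version 3.25 or later so that window functions are available. The table name t, the helper run_diagonal and the second data set are made up for illustration.

```python
import sqlite3

# The diagonal query from this answer, with the placeholder table named t
DIAGONAL = """
select A, B from (
  select row_number() over (partition by A order by B) as RN,
         dense_rank() over (order by A) as DR,
         A, B
  from t) as ranked
where RN = DR
order by A
"""


def run_diagonal(rows):
    """Load (A, B) pairs into an in-memory table and run the diagonal query."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (A integer, B integer)")
    con.executemany("insert into t values (?, ?)", rows)
    out = con.execute(DIAGONAL).fetchall()
    con.close()
    return out


# Data from the question: the diagonal gives the desired result
question_data = [(1, 3), (1, 5), (1, 6), (2, 3), (2, 5), (9, 3), (9, 5)]
print(run_diagonal(question_data))   # [(1, 3), (2, 5)]

# A = 1 and A = 2 share no B values here, so A = 2 should get 5,
# but the diagonal takes the second-smallest B of the second A
harder = [(1, 3), (2, 5), (2, 6)]
print(run_diagonal(harder))          # [(1, 3), (2, 6)]
```

In the second data set the smallest unused B for A = 2 is 5, yet RN = DR selects 6. The shortcut only holds when each A's earlier B values really are consumed by the preceding values of A, as they are in the question's data.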

Addendum: Because I know it will be asked and this is an interesting problem, I worked out what such a more complex solution would look like:

select min(A) as A, B from (
  select decode(A <> nvl(min(A) over (order by DRB, DRA rows between unbounded preceding and 1 preceding),-1), true, 'good', 'no good') as Y,
    A, B from (
    select dense_rank() over (partition by B order by A) as DRA,
      dense_rank() over ( order by B) as DRB,
      A, B from <table>
  )
  where DRA <= DRB
)
where Y = 'good'
group by B
order by A, B;

solution1
0 2020-06-19 08:30:47

solution2
0 2020-06-22 18:14:00
