Finding all two column duplicates in SQL table where later value of third column is greater than earlier value

Fairly new to SQL and it would be great to have a slicker take on this problem.

Here's an example table:

col1 col2 col3 date
1 1 2 2011-01-22
1 1 4 2011-01-27
3 3 2 2011-01-20
3 8 4 2011-01-12
3 8 2 2011-01-30
4 1 3 2011-01-09
5 3 3 2011-01-27
5 4 2 2011-01-22

and I'd like to return all distinct col1, col2 values where

  • The col1 and col2 values are the same (duplicated) and
  • The latest col3 value is greater than the earliest one

In this case, the result should be

col1 col2
1 1

I can do

SELECT col1, col2
FROM table
GROUP BY col1, col2
HAVING count(*) > 1

to get the duplicate col1, col2 tuples but from there my solution just gets ugly, using multiple nested subqueries and sometimes repeating the same subquery.

What's a clean way to do this? It's in SQLite.
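
For anyone who wants to reproduce this, here is a minimal sketch that loads the example rows into an in-memory SQLite database through Python's standard `sqlite3` module and runs the duplicate-finding step described above. The table name `sample` and the column types are assumptions made for illustration; the question does not state them.

```python
import sqlite3

# Example rows copied from the question.
ROWS = [
    (1, 1, 2, "2011-01-22"),
    (1, 1, 4, "2011-01-27"),
    (3, 3, 2, "2011-01-20"),
    (3, 8, 4, "2011-01-12"),
    (3, 8, 2, "2011-01-30"),
    (4, 1, 3, "2011-01-09"),
    (5, 3, 3, "2011-01-27"),
    (5, 4, 2, "2011-01-22"),
]

conn = sqlite3.connect(":memory:")
# "sample" is a placeholder table name; ISO-8601 date strings sort chronologically as TEXT.
conn.execute("CREATE TABLE sample (col1 INTEGER, col2 INTEGER, col3 INTEGER, date TEXT)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?)", ROWS)

# Step one from the question: (col1, col2) pairs that occur more than once.
duplicates = conn.execute(
    """
    SELECT col1, col2
    FROM sample
    GROUP BY col1, col2
    HAVING COUNT(*) > 1
    ORDER BY col1, col2
    """
).fetchall()

print(duplicates)  # [(1, 1), (3, 8)]
```

Only (1, 1) should survive the second condition, since (3, 8) goes from 4 down to 2.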

You can do it with the FIRST_VALUE() window function:

SELECT col1, col2
FROM (
  SELECT DISTINCT col1, col2,
         FIRST_VALUE(col3) OVER (PARTITION BY col1, col2 ORDER BY date) earlier_col3,
         FIRST_VALUE(col3) OVER (PARTITION BY col1, col2 ORDER BY date DESC) latest_col3
  FROM tablename
)
WHERE latest_col3 > earlier_col3

Results:

> col1 | col2
> ---: | ---:
>    1 |    1
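
To see why this works, it can help to look at what the inner query produces before the outer WHERE filters it. The sketch below runs just the subquery against the question's sample data using Python's `sqlite3` module. It assumes the bundled SQLite is version 3.25 or newer (needed for window functions); the column types are also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the table name matches the query above.
conn.execute("CREATE TABLE tablename (col1 INTEGER, col2 INTEGER, col3 INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [
        (1, 1, 2, "2011-01-22"), (1, 1, 4, "2011-01-27"),
        (3, 3, 2, "2011-01-20"), (3, 8, 4, "2011-01-12"),
        (3, 8, 2, "2011-01-30"), (4, 1, 3, "2011-01-09"),
        (5, 3, 3, "2011-01-27"), (5, 4, 2, "2011-01-22"),
    ],
)

# Inner query only: one row per (col1, col2) with its earliest and latest col3.
inner = conn.execute(
    """
    SELECT DISTINCT col1, col2,
           FIRST_VALUE(col3) OVER (PARTITION BY col1, col2 ORDER BY date) AS earlier_col3,
           FIRST_VALUE(col3) OVER (PARTITION BY col1, col2 ORDER BY date DESC) AS latest_col3
    FROM tablename
    ORDER BY col1, col2
    """
).fetchall()

for row in inner:
    print(row)

# Same filter as the outer WHERE clause, applied in Python for illustration.
matches = [(c1, c2) for c1, c2, earlier, latest in inner if latest > earlier]
print(matches)  # [(1, 1)]
```

Pairs that occur only once have identical earliest and latest values, so the strict `>` comparison drops them without a separate `COUNT(*) > 1` check.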

I would use ROW_NUMBER here with pivoting logic:

WITH cte AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY col1, col2
                           ORDER BY date) rn
    FROM yourTable
)

SELECT col1, col2
FROM cte
GROUP BY col1, col2
HAVING MAX(CASE WHEN rn = 2 THEN col3 END) >
       MAX(CASE WHEN rn = 1 THEN col3 END);

This approach is very close to what you are currently doing. The only addition is a HAVING clause that compares the more recent col3 value against the older one. Note that this answer assumes your "duplicates" occur only as pairs, not as triplets or larger groups, because it compares only the rows numbered 1 and 2.
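
If groups of three or more are possible, one way to lift the pairs-only assumption is to number the rows in both directions and compare the first row of each ordering. The sketch below is an illustrative variant, not part of the original answer: the aliases `rn_first` and `rn_last` and the extra middle row `(1, 1, 1, '2011-01-25')` are hypothetical, and SQLite 3.25 or newer is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (col1 INTEGER, col2 INTEGER, col3 INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [
        (1, 1, 2, "2011-01-22"),
        (1, 1, 1, "2011-01-25"),  # hypothetical middle row that turns the pair into a triplet
        (1, 1, 4, "2011-01-27"),
        (3, 8, 4, "2011-01-12"),
        (3, 8, 2, "2011-01-30"),
    ],
)

# Pairs-only version: compares row 2 against row 1 within each group.
pairs_only = conn.execute(
    """
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY date) AS rn
        FROM yourTable
    )
    SELECT col1, col2 FROM cte
    GROUP BY col1, col2
    HAVING MAX(CASE WHEN rn = 2 THEN col3 END) > MAX(CASE WHEN rn = 1 THEN col3 END)
    """
).fetchall()

# Variant: compares the latest row against the earliest, whatever the group size.
any_size = conn.execute(
    """
    WITH cte AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY date) AS rn_first,
               ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY date DESC) AS rn_last
        FROM yourTable
    )
    SELECT col1, col2 FROM cte
    GROUP BY col1, col2
    HAVING MAX(CASE WHEN rn_last = 1 THEN col3 END) > MAX(CASE WHEN rn_first = 1 THEN col3 END)
    """
).fetchall()

print(pairs_only)  # []
print(any_size)    # [(1, 1)]
```

With the extra row, the pairs-only query compares 1 against 2 and misses (1, 1), while the variant compares 4 against 2 and returns it.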

Edit:

If, for some reason, you really can't use window functions, e.g. because you are using a version of SQLite older than 3.25.0 (the release that introduced them), we can still use roughly the above approach with a join:

SELECT t1.col1, t1.col2
FROM yourTable t1
INNER JOIN
(
    SELECT col1, col2, MIN(date) AS min_date, MAX(date) AS max_date
    FROM yourTable
    GROUP BY col1, col2
) t2
    ON t2.col1 = t1.col1 AND t2.col2 = t1.col2
GROUP BY t1.col1, t1.col2
HAVING
    MAX(CASE WHEN t1.date = t2.max_date THEN t1.col3 END) >
    MAX(CASE WHEN t1.date = t2.min_date THEN t1.col3 END);
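
As a quick check, the join version can be run against the question's sample data with Python's `sqlite3` module. This is only a sketch: the column types are assumed, and the query uses nothing beyond plain aggregates and a join, so it does not depend on window-function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (col1 INTEGER, col2 INTEGER, col3 INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [
        (1, 1, 2, "2011-01-22"), (1, 1, 4, "2011-01-27"),
        (3, 3, 2, "2011-01-20"), (3, 8, 4, "2011-01-12"),
        (3, 8, 2, "2011-01-30"), (4, 1, 3, "2011-01-09"),
        (5, 3, 3, "2011-01-27"), (5, 4, 2, "2011-01-22"),
    ],
)

# t2 holds the earliest and latest date per (col1, col2); the HAVING clause
# then compares the col3 values found on those two dates.
result = conn.execute(
    """
    SELECT t1.col1, t1.col2
    FROM yourTable t1
    INNER JOIN (
        SELECT col1, col2, MIN(date) AS min_date, MAX(date) AS max_date
        FROM yourTable
        GROUP BY col1, col2
    ) t2
        ON t2.col1 = t1.col1 AND t2.col2 = t1.col2
    GROUP BY t1.col1, t1.col2
    HAVING MAX(CASE WHEN t1.date = t2.max_date THEN t1.col3 END) >
           MAX(CASE WHEN t1.date = t2.min_date THEN t1.col3 END)
    """
).fetchall()

print(result)  # [(1, 1)]
```

One caveat: if several rows in a group share the earliest or the latest date, `MAX` picks the largest col3 among them, so tied dates may need an explicit tie-breaker.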
