
SQL - Select duplicates based on two columns in DB2

I am using DB2 and am trying to count duplicate rows in a table called ML_MEASURE. I define a duplicate in this table as a row that has the same DATETIME and TAG_NAME values as another row. So I tried the query below:

SELECT 
    DATETIME, 
    TAG_NAME, 
    COUNT(*) AS DUPLICATES
FROM 
    ML_MEASURE 
GROUP BY DATETIME, TAG_NAME 
HAVING COUNT(*) > 1

The query doesn't fail, but I get an empty result, even though I know for a fact that I have at least one duplicate. When I tried the query below, I got the correct result for this specific TAG_NAME and DATETIME:

SELECT
    DATETIME,
    TAG_NAME,
    COUNT(*) AS DUPLICATES
FROM
    ML_MEASURE
WHERE
    DATETIME='2018-03-23 15:09:30' AND
    TAG_NAME='HOG.613KU201'
GROUP BY
    DATETIME,
    TAG_NAME

The result of the second query looked like this:

 DATETIME               TAG_NAME        DUPLICATES
 ---------------------  ------------    ----------
 2018-03-23 15:09:30.0  HOG.613KU201             3

What am I doing wrong in the first query?

* UPDATE *

My table is row-organized; I'm not sure whether that makes any difference.

Yes, you should get the same row back from the first query. If you had a NOT ENFORCED TRUSTED primary key or unique constraint on those two columns, then the optimizer would be within its rights to trust the constraint and return no rows. However, from a quick test, I don't believe it does that for this query. Do you have any indexes defined on the table?
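To check for such constraints and indexes, something like the following catalog queries might help. This is only a sketch, assuming Db2 for LUW with the SYSCAT catalog views; the TRUSTED column may not exist on older versions, and you will probably want to add a TABSCHEMA filter (the schema name is not given in the question, so it is left out here).

```sql
-- Constraints defined on the table.
-- TYPE: P = primary key, U = unique.
-- ENFORCED = 'N' together with TRUSTED = 'Y' would indicate an
-- informational constraint that the optimizer is allowed to trust.
SELECT CONSTNAME, TYPE, ENFORCED, TRUSTED, ENABLEQUERYOPT
FROM SYSCAT.TABCONST
WHERE TABNAME = 'ML_MEASURE';

-- Indexes defined on the table.
-- UNIQUERULE: D = duplicates allowed, U = unique, P = primary key index.
SELECT INDNAME, UNIQUERULE, COLNAMES
FROM SYSCAT.INDEXES
WHERE TABNAME = 'ML_MEASURE';
```

If the first query shows a P or U constraint on (DATETIME, TAG_NAME) that is not enforced, that would be the first thing to look at.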

(PS: I assume you are not running the query from a shell prompt, where an unquoted > 1 would be treated as a redirect and send the output to a file named 1.)

This worked for me:

SELECT * FROM (
    SELECT DATETIME, TAG_NAME, COUNT(*) AS DUPLICATES
    FROM ML_MEASURE 
    GROUP BY DATETIME, TAG_NAME 
) WHERE DUPLICATES > 1
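
If you also want to see the duplicated rows themselves rather than one line per group, a window function is another option. This is a different technique from the GROUP BY queries above, shown only as a sketch; the correlation names M and T are arbitrary, and it assumes your DB2 version supports OLAP functions.

```sql
-- Return every row whose (DATETIME, TAG_NAME) pair occurs more than once,
-- together with the number of rows sharing that pair.
SELECT *
FROM (
    SELECT M.*,
           COUNT(*) OVER (PARTITION BY DATETIME, TAG_NAME) AS DUPLICATES
    FROM ML_MEASURE M
) AS T
WHERE DUPLICATES > 1
ORDER BY DATETIME, TAG_NAME
```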

