简体   繁体   中英

How can I use PostgreSQL's DISTINCT ON clause to also return a count of the duplicates?

Suppose I have a table like this

+--------+--------+------+--------+---------+
|   A    |   B    |  C   |   g    |    h    |
+--------+--------+------+--------+---------+
| cat    | dog    | bird | 34.223 |  54.223 |
| cat    | pigeon | goat |  23.23 |  54.948 |
| cat    | dog    | bird | 17.386 |  26.398 |
| gopher | pigeon | bird | 23.552 |  89.223 |
+--------+--------+------+--------+---------+

but with many more fields to the right (i, j, k, ...).

I need a resulting table that looks like:

+-----+--------+------+-----+-----+-----+-----+-------+
|  A  |   B    |  C   |  g  |  h  | ... |  z  | count |
+-----+--------+------+-----+-----+-----+-----+-------+
| cat | dog    | bird | xxx | xxx |     | xxx |    23 |
| cat | pigeon | goat | xxx | xxx |     | xxx |    78 |
+-----+--------+------+-----+-----+-----+-----+-------+

I would normally use a GROUP BY, but I don't want to have to repeat all of the column names (g, h, i, ... z).

I can currently get the result I want using a window function combined with DISTINCT ON, but the query is very slow to run (500k+ records), and has a lot of duplication

WITH temp AS (
    SELECT a, b, c, COUNT(*)
    FROM my_table
    GROUP BY a, b, C
)
SELECT DISTINCT ON (a, b, c) *, (
    SELECT count
    FROM temp
    WHERE 
        temp.a = t.a 
        AND temp.b = t.b 
        AND temp.c = t.c
) as count
FROM my_table as t
ORDER BY a, b, c, x, y;

Is there a way to somehow get the count of the rows that were elimated with DISTINCT in a more efficient manner? Something like

SELECT DISTINCT ON (a, b, c)
    *, COUNT(*)
FROM my_table
ORDER BY a, b, c, count;

Or am I taking the wrong approach to begin with?

Use COUNT() with PARTITION BY :

SELECT DISTINCT ON (a, b, c) *, COUNT(*) OVER (PARTITION BY a, b, c)
FROM my_table

You should probably also add an ORDER to your query if you care at all about the rest of the fields, otherwise the rows used to get the data displayed in those fields may be inconsistent.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM