How can I use PostgreSQL's DISTINCT ON clause to also return a count of the duplicates?

Suppose I have a table like this:

+--------+--------+------+--------+---------+
|   A    |   B    |  C   |   g    |    h    |
+--------+--------+------+--------+---------+
| cat    | dog    | bird | 34.223 |  54.223 |
| cat    | pigeon | goat |  23.23 |  54.948 |
| cat    | dog    | bird | 17.386 |  26.398 |
| gopher | pigeon | bird | 23.552 |  89.223 |
+--------+--------+------+--------+---------+

but with many more columns to the right (i, j, k, ...).

I need a result that looks like this:

+-----+--------+------+-----+-----+-----+-----+-------+
|  A  |   B    |  C   |  g  |  h  | ... |  z  | count |
+-----+--------+------+-----+-----+-----+-----+-------+
| cat | dog    | bird | xxx | xxx |     | xxx |    23 |
| cat | pigeon | goat | xxx | xxx |     | xxx |    78 |
+-----+--------+------+-----+-----+-----+-----+-------+

I would normally use GROUP BY, but I don't want to have to repeat all of the column names (g, h, i, ... z).

I can currently get the result I want by combining DISTINCT ON with a grouped CTE and a correlated subquery, but the query is very slow on my data (500k+ records) and repeats the grouping columns in several places:

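-- Count the rows in each (a, b, c) group
WITH temp AS (
    SELECT a, b, c, COUNT(*)
    FROM my_table
    GROUP BY a, b, c
)
-- Keep one row per (a, b, c) and fetch its group's count
-- through a correlated subquery against the CTE
SELECT DISTINCT ON (a, b, c) *, (
    SELECT count
    FROM temp
    WHERE
        temp.a = t.a
        AND temp.b = t.b
        AND temp.c = t.c
) AS count
FROM my_table AS t
ORDER BY a, b, c, x, y;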

Is there a more efficient way to get the number of rows that DISTINCT ON collapses into each result row? Something like:

SELECT DISTINCT ON (a, b, c)
    *, COUNT(*)
FROM my_table
ORDER BY a, b, c, count;

Or am I taking the wrong approach to begin with?

Use COUNT() as a window function with PARTITION BY:

SELECT DISTINCT ON (a, b, c) *,
    COUNT(*) OVER (PARTITION BY a, b, c) AS count  -- counted before DISTINCT ON drops rows
FROM my_table;

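You should probably also add an ORDER BY to your query if you care at all about the values in the remaining columns. Without one, which row PostgreSQL keeps from each (a, b, c) group is unpredictable, so the values shown in those columns may be inconsistent. The ORDER BY must start with a, b, c (the DISTINCT ON expressions), followed by whatever columns decide which row wins.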
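To check what this pattern returns on the sample rows from the question without a PostgreSQL server, here is a minimal sketch using Python's standard-library `sqlite3` module. SQLite has no DISTINCT ON, so the sketch swaps in the common `ROW_NUMBER()` emulation of it; the tie-breaker `g DESC`, the alias `dup_count`, and the helper name `first_row_per_group_with_count` are illustrative assumptions, not part of the original answer. Window functions require SQLite 3.25 or later.

```python
import sqlite3

# Sample rows from the question; only columns a, b, c, g, h are modelled here.
ROWS = [
    ("cat", "dog", "bird", 34.223, 54.223),
    ("cat", "pigeon", "goat", 23.23, 54.948),
    ("cat", "dog", "bird", 17.386, 26.398),
    ("gopher", "pigeon", "bird", 23.552, 89.223),
]

# SQLite has no DISTINCT ON, so keep the first row of each (a, b, c) group
# with ROW_NUMBER(). Its ORDER BY (g DESC, an illustrative choice) plays the
# role of the trailing ORDER BY keys in the PostgreSQL query. COUNT(*) OVER
# is computed before the rn = 1 filter, so it sees the whole group.
QUERY = """
SELECT a, b, c, g, h, dup_count
FROM (
    SELECT *,
           COUNT(*) OVER (PARTITION BY a, b, c) AS dup_count,
           ROW_NUMBER() OVER (PARTITION BY a, b, c ORDER BY g DESC) AS rn
    FROM my_table
) AS ranked
WHERE rn = 1
ORDER BY a, b, c
"""


def first_row_per_group_with_count(rows):
    """Return one row per (a, b, c) plus the number of rows in that group."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE my_table (a TEXT, b TEXT, c TEXT, g REAL, h REAL)")
        conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in first_row_per_group_with_count(ROWS):
        print(row)
```

On the sample data this yields a dup_count of 2 for cat/dog/bird (keeping the row with g = 34.223) and 1 for each of the other two groups. In PostgreSQL itself, the same tie-breaker goes at the end of the outer ORDER BY, e.g. `ORDER BY a, b, c, g DESC`.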

Note: technical posts on this site are licensed under CC BY-SA 4.0; if you republish, please credit this site or the original source.

 
© 2020-2024 STACKOOM.COM