How to order distinct tuples in a PostgreSQL query

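I am trying to write a Postgres query that returns only distinct tuples. In my sample query, I do not want duplicate rows when an entry exists more than once for a cluster_id / feed_id combination. If I do something simple like this: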

select distinct on (cluster_info.cluster_id, feed_id) 
   cluster_info.cluster_id, num_docs, feed_id, url_time 
   from url_info 
   join cluster_info on (cluster_info.cluster_id = url_info.cluster_id) 
   where feed_id in (select pot_seeder from potentials) 
   and num_docs > 5 and url_time > '2012-04-16';

I get that, but I would also like to order the result by num_docs. So when I do the following:

select distinct on (cluster_info.cluster_id, feed_id) 
   cluster_info.cluster_id, num_docs, feed_id, url_time 
   from url_info join cluster_info 
   on (cluster_info.cluster_id = url_info.cluster_id) 
   where feed_id in (select pot_seeder from potentials) 
   and num_docs > 5 and url_time > '2012-04-16' 
   order by num_docs desc;

I get the following error:

ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: select distinct on (cluster_info.cluster_id, feed_id) cluste...

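I think I understand why I am getting the error (I cannot group by a tuple unless I explicitly describe the group somehow), but how do I do that? Or, if my interpretation of the error is wrong, is there a way to achieve my initial goal?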

The leftmost ORDER BY items cannot disagree with the items of the DISTINCT ON clause. I quote the manual on DISTINCT:

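"The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group."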
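
As a minimal illustration of that rule, here is a sketch against a hypothetical temporary table t (made up for this example, not part of the question's schema). The ORDER BY starts with the same columns as DISTINCT ON, and the extra num_docs DESC term decides which row is kept within each group.

```sql
-- Hypothetical sample data, for illustration only
CREATE TEMP TABLE t (cluster_id int, feed_id int, num_docs int);
INSERT INTO t VALUES
    (1, 10, 7),
    (1, 10, 9),
    (2, 20, 6);

-- The leftmost ORDER BY expressions match the DISTINCT ON expressions;
-- num_docs DESC then keeps the row with the highest num_docs per group.
SELECT DISTINCT ON (cluster_id, feed_id)
       cluster_id, feed_id, num_docs
FROM   t
ORDER  BY cluster_id, feed_id, num_docs DESC;
```

With this data the query returns one row per (cluster_id, feed_id) pair: (1, 10, 9) and (2, 20, 6). Dropping cluster_id, feed_id from the front of the ORDER BY reproduces the error from the question.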

Try:

SELECT *
FROM  (
    SELECT DISTINCT ON (c.cluster_id, feed_id) 
           c.cluster_id, num_docs, feed_id, url_time 
    FROM   url_info u
    JOIN   cluster_info c ON (c.cluster_id = u.cluster_id) 
    WHERE  feed_id IN (SELECT pot_seeder FROM potentials) 
    AND    num_docs > 5
    AND    url_time > '2012-04-16'
    ORDER  BY c.cluster_id, feed_id, num_docs, url_time
           -- first columns match DISTINCT
           -- the rest to pick certain values for dupes
           -- or did you want to pick random values for dupes?
    ) x
ORDER  BY num_docs DESC;

Or use GROUP BY:

SELECT c.cluster_id
     , num_docs
     , feed_id
     , url_time 
FROM   url_info u
JOIN   cluster_info c ON (c.cluster_id = u.cluster_id) 
WHERE  feed_id IN (SELECT pot_seeder FROM potentials) 
AND    num_docs > 5
AND    url_time > '2012-04-16'
GROUP  BY c.cluster_id, feed_id 
ORDER  BY num_docs DESC;

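If c.cluster_id, feed_id are the primary key columns of all (both, in this case) tables that contribute columns to the SELECT list, then this just works with PostgreSQL 9.1 or later.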

Otherwise, you need to GROUP BY the rest of the columns, or aggregate them, or provide more information.
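
A sketch of the aggregate variant, assuming that the highest num_docs and the latest url_time per (cluster_id, feed_id) pair are the values wanted. That choice is an assumption; the question does not say which values to keep for duplicates. Note that the two aggregates are computed independently, so they may come from different source rows, unlike the DISTINCT ON version, which returns one whole row per group.

```sql
SELECT c.cluster_id
     , feed_id
     , max(num_docs) AS num_docs   -- assumption: keep the highest count per group
     , max(url_time) AS url_time   -- assumption: keep the latest time per group
FROM   url_info u
JOIN   cluster_info c ON c.cluster_id = u.cluster_id
WHERE  feed_id IN (SELECT pot_seeder FROM potentials)
AND    num_docs > 5
AND    url_time > '2012-04-16'
GROUP  BY c.cluster_id, feed_id
ORDER  BY max(num_docs) DESC;      -- sort on the aggregated value explicitly
```

Because every selected column is either grouped or aggregated, this form does not depend on the primary-key behavior introduced in 9.1.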

