Find the most frequent value per group in a table column

I need to find the most frequent value of object_of_search for each ethnicity. How can I achieve this? Subqueries in the SELECT clause and correlated subqueries are not allowed. Something similar to this:

mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"

But this does not aggregate as intended and returns multiple rows per ethnicity, one for each object_of_search:

 officer_defined_ethnicity | Sas for ethnicity |   Arrest rate    | Most frequent object of search
---------------------------+-------------------+------------------+--------------------------------
 ethnicity2                |                 3 | 66.6666666666667 | Stolen goods
 ethnicity3                |                 2 |              100 | Fireworks
 ethnicity1                |                 5 |               60 | Firearms
 ethnicity3                |                 2 |              100 | Firearms
 ethnicity1                |                 5 |               60 | Cat
 ethnicity1                |                 5 |               60 | Dog
 ethnicity2                |                 3 | 66.6666666666667 | Firearms
 ethnicity1                |                 5 |               60 | Psychoactive substances
 ethnicity1                |                 5 |               60 | Fireworks

The result should look something like this:

 officer_defined_ethnicity | Sas for ethnicity |   Arrest rate    | Most frequent object of search
---------------------------+-------------------+------------------+--------------------------------
 ethnicity2                |                 3 | 66.6666666666667 | Stolen goods
 ethnicity3                |                 2 |              100 | Fireworks
 ethnicity1                |                 5 |               60 | Firearms

Table on fiddle.
Query:

SELECT DISTINCT
    stopAndSearches.officer_defined_ethnicity,
    count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity) AS "Sas for ethnicity",
    sum(case when stopAndSearches.outcome = 'Arrest' then 1 else 0 end)
       OVER (PARTITION BY stopAndSearches.officer_defined_ethnicity)::float /
       count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity)::float * 100 AS "Arrest rate",
    mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"
FROM stopAndSearches
GROUP BY stopAndSearches.sas_id, stopAndSearches.officer_defined_ethnicity;

Table:

CREATE TABLE IF NOT EXISTS stopAndSearches(
    "sas_id" bigserial PRIMARY KEY,
    "officer_defined_ethnicity" VARCHAR(255),
    "object_of_search" VARCHAR(255),
    "outcome" VARCHAR(255)
);

Updated: Fiddle

This should address the specific "which object per ethnicity" question.

Note that this doesn't address ties in the count. That wasn't part of the question.

Adjust your SQL to include this logic to provide that detail:

WITH cte AS (
        SELECT officer_defined_ethnicity
             , object_of_search
             , COUNT(*) AS n
             , ROW_NUMBER() OVER (PARTITION BY officer_defined_ethnicity ORDER BY COUNT(*) DESC) AS rn
          FROM stopAndSearches
         GROUP BY officer_defined_ethnicity, object_of_search
     )
SELECT * FROM cte
 WHERE rn = 1
;

Result:

 officer_defined_ethnicity | object_of_search | n | rn
---------------------------+------------------+---+----
 ethnicity1                | Cat              | 1 |  1
 ethnicity2                | Stolen goods     | 2 |  1
 ethnicity3                | Fireworks        | 1 |  1
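
One way to fold this ranking into the original report is sketched below. It is an illustration under assumptions, not the answerer's query: the CTE names `ranked` and `totals` and the aliases `sas_count` and `arrest_rate` are made up here, the aggregate `FILTER` clause assumes PostgreSQL 9.4 or later, and `object_of_search` is added to the window ordering as a tiebreaker that the query above does not have.

```sql
-- Sketch only: per-ethnicity totals joined to the top-ranked object of search.
-- Assumes the stopAndSearches table from the question and PostgreSQL 9.4+.
WITH ranked AS (
        SELECT officer_defined_ethnicity
             , object_of_search
               -- rank objects within each ethnicity by frequency;
               -- object_of_search breaks ties alphabetically (an added assumption)
             , ROW_NUMBER() OVER (PARTITION BY officer_defined_ethnicity
                                  ORDER BY COUNT(*) DESC, object_of_search) AS rn
          FROM stopAndSearches
         GROUP BY officer_defined_ethnicity, object_of_search
     )
   , totals AS (
        SELECT officer_defined_ethnicity
             , COUNT(*) AS sas_count
               -- share of rows with outcome 'Arrest', as a percentage
             , 100.0 * COUNT(*) FILTER (WHERE outcome = 'Arrest') / COUNT(*) AS arrest_rate
          FROM stopAndSearches
         GROUP BY officer_defined_ethnicity
     )
SELECT t.officer_defined_ethnicity
     , t.sas_count        AS "Sas for ethnicity"
     , t.arrest_rate      AS "Arrest rate"
     , r.object_of_search AS "Most frequent object of search"
  FROM totals t
  JOIN ranked r
    ON r.officer_defined_ethnicity = t.officer_defined_ethnicity
 WHERE r.rn = 1
;
```

Neither CTE is a subquery in the SELECT clause or a correlated subquery, so this should stay within the stated constraints. Note that rows with a NULL officer_defined_ethnicity are dropped by the equality join.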

SELECT DISTINCT ON (1)
       officer_defined_ethnicity, object_of_search, count(*) AS ct
FROM   stop_and_searches
GROUP  BY 1, 2
ORDER  BY 1, 3 DESC, 2;

Or more explicitly:

SELECT DISTINCT ON (officer_defined_ethnicity)
       officer_defined_ethnicity, object_of_search, count(*) AS ct
FROM   stop_and_searches
GROUP  BY officer_defined_ethnicity, object_of_search
ORDER  BY officer_defined_ethnicity, ct DESC, object_of_search;

 officer_defined_ethnicity | object_of_search | ct
---------------------------+------------------+----
 ethnicity1                | Cat              | 1
 ethnicity2                | Stolen goods     | 2
 ethnicity3                | Firearms         | 1

db<>fiddle here

Since DISTINCT ON is applied after GROUP BY, we only need a single query level.

  1. Aggregate to get counts per (officer_defined_ethnicity, object_of_search) with GROUP BY.
  2. Pick the row with the highest count per officer_defined_ethnicity with DISTINCT ON.

I added object_of_search as the third ORDER BY item to act as a tiebreaker and produce a deterministic result:
In case of ties, pick the earliest object_of_search according to alphabetical sort order.
Adapt to your needs.
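
If the need is to keep every tied value instead of picking one, a window function with rank() is one possible adaptation. The following is a minimal sketch, not part of the original answer: it assumes the same stop_and_searches table name used in the queries above, and the aliases `sub` and `rnk` are made up for illustration.

```sql
-- Sketch only: return all objects of search that share the highest count
-- per ethnicity, instead of a single row per ethnicity.
SELECT officer_defined_ethnicity, object_of_search, ct
FROM  (
   SELECT officer_defined_ethnicity, object_of_search, count(*) AS ct
          -- rank() assigns the same rank to equal counts, so all ties get rnk = 1
        , rank() OVER (PARTITION BY officer_defined_ethnicity
                       ORDER BY count(*) DESC) AS rnk
   FROM   stop_and_searches
   GROUP  BY officer_defined_ethnicity, object_of_search
   ) sub
WHERE  rnk = 1
ORDER  BY officer_defined_ethnicity, object_of_search;
```

This uses a derived table in the FROM clause, which is neither a subquery in the SELECT clause nor a correlated subquery. The trade-off is a second query level, which the DISTINCT ON form avoids.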

See:

Simpler and typically faster than a subquery with row_number().
