Find the most frequent value per group in a table column

Question

I need to find the most frequent value of object_of_search for each ethnicity. How can I achieve this? Subqueries in the SELECT clause and correlated subqueries are not allowed. Something similar to this:

mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"

But this does not aggregate; it gives me many rows, one for each combination of ethnicity and object_of_search:

 officer_defined_ethnicity | Sas for ethnicity |   Arrest rate    | Most frequent object of search
---------------------------+-------------------+------------------+--------------------------------
 ethnicity2                |                 3 | 66.6666666666667 | Stolen goods
 ethnicity3                |                 2 |              100 | Fireworks
 ethnicity1                |                 5 |               60 | Firearms
 ethnicity3                |                 2 |              100 | Firearms
 ethnicity1                |                 5 |               60 | Cat
 ethnicity1                |                 5 |               60 | Dog
 ethnicity2                |                 3 | 66.6666666666667 | Firearms
 ethnicity1                |                 5 |               60 | Psychoactive substances
 ethnicity1                |                 5 |               60 | Fireworks

It should be something like this:

 officer_defined_ethnicity | Sas for ethnicity |   Arrest rate    | Most frequent object of search
---------------------------+-------------------+------------------+--------------------------------
 ethnicity2                |                 3 | 66.6666666666667 | Stolen goods
 ethnicity3                |                 2 |              100 | Fireworks
 ethnicity1                |                 5 |               60 | Firearms

Table on fiddle.
Query:

SELECT DISTINCT
    stopAndSearches.officer_defined_ethnicity,
    count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity) AS "Sas for ethnicity",
    sum(case when stopAndSearches.outcome = 'Arrest' then 1 else 0 end)
       OVER (PARTITION BY stopAndSearches.officer_defined_ethnicity)::float /
       count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity)::float * 100 AS "Arrest rate",
    mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"
FROM stopAndSearches
GROUP BY stopAndSearches.sas_id, stopAndSearches.officer_defined_ethnicity;

Table:

CREATE TABLE IF NOT EXISTS stopAndSearches(
    "sas_id" bigserial PRIMARY KEY,
    "officer_defined_ethnicity" VARCHAR(255),
    "object_of_search" VARCHAR(255),
    "outcome" VARCHAR(255)
);

Answer 1

Updated: Fiddle

This should address the specific "which object per ethnicity" question.

Note that this doesn't address ties in the count. That wasn't part of the question / request.

Adjust your SQL to include this logic, to provide that detail:

WITH cte AS (
        SELECT officer_defined_ethnicity
             , object_of_search
             , COUNT(*) AS n
             , ROW_NUMBER() OVER (PARTITION BY officer_defined_ethnicity ORDER BY COUNT(*) DESC) AS rn
          FROM stopAndSearches
         GROUP BY officer_defined_ethnicity, object_of_search
     )
SELECT * FROM cte
 WHERE rn = 1
;

Result:

 officer_defined_ethnicity | object_of_search | n | rn
---------------------------+------------------+---+----
 ethnicity1                | Cat              | 1 |  1
 ethnicity2                | Stolen goods     | 2 |  1
 ethnicity3                | Fireworks        | 1 |  1
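
One way to fold this logic back into the original report is to compute the per-ethnicity totals in a second CTE and join the two. This is only a sketch, not tested against the fiddle: the CTE names (counts, totals) and the extra object_of_search tiebreaker are my own additions, and the aggregate FILTER clause assumes PostgreSQL 9.4 or later.

```sql
WITH counts AS (
    -- One row per (ethnicity, object), ranked by frequency within the ethnicity.
    -- The object_of_search tiebreaker is an addition to make rn deterministic.
    SELECT officer_defined_ethnicity,
           object_of_search,
           count(*) AS n,
           row_number() OVER (PARTITION BY officer_defined_ethnicity
                              ORDER BY count(*) DESC, object_of_search) AS rn
    FROM   stopAndSearches
    GROUP  BY officer_defined_ethnicity, object_of_search
), totals AS (
    -- One row per ethnicity with the overall count and arrest rate.
    SELECT officer_defined_ethnicity,
           count(*) AS sas_for_ethnicity,
           100.0 * count(*) FILTER (WHERE outcome = 'Arrest') / count(*) AS arrest_rate
    FROM   stopAndSearches
    GROUP  BY officer_defined_ethnicity
)
-- Note: rows with a NULL officer_defined_ethnicity do not survive this join,
-- because NULL = NULL is not true.
SELECT officer_defined_ethnicity,
       t.sas_for_ethnicity AS "Sas for ethnicity",
       t.arrest_rate       AS "Arrest rate",
       c.object_of_search  AS "Most frequent object of search"
FROM   totals t
JOIN   counts c USING (officer_defined_ethnicity)
WHERE  c.rn = 1;
```

Neither CTE is a subquery in the SELECT clause or a correlated subquery, so this should stay within the constraints stated in the question.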

Answer 2

SELECT DISTINCT ON (1)
       officer_defined_ethnicity, object_of_search, count(*) AS ct
FROM   stop_and_searches
GROUP  BY 1, 2
ORDER  BY 1, 3 DESC, 2;

Or more explicitly:

SELECT DISTINCT ON (officer_defined_ethnicity)
       officer_defined_ethnicity, object_of_search, count(*) AS ct
FROM   stop_and_searches
GROUP  BY officer_defined_ethnicity, object_of_search
ORDER  BY officer_defined_ethnicity, ct DESC, object_of_search;

 officer_defined_ethnicity | object_of_search | ct
---------------------------+------------------+----
 ethnicity1                | Cat              | 1
 ethnicity2                | Stolen goods     | 2
 ethnicity3                | Firearms         | 1

db<>fiddle here

Since DISTINCT ON is applied after GROUP BY, we only need a single query level.

Aggregate to get counts per (officer_defined_ethnicity, object_of_search) with GROUP BY.
Pick the row with the highest count per officer_defined_ethnicity with DISTINCT ON.
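
Window functions are also evaluated after GROUP BY and before DISTINCT ON, so the same single query level can carry the per-ethnicity totals the question asks for. A rough sketch, untested against the fiddle, assuming PostgreSQL 9.4 or later for the aggregate FILTER clause and reusing the column aliases from the question:

```sql
SELECT DISTINCT ON (officer_defined_ethnicity)
       officer_defined_ethnicity,
       -- Sum the per-object counts back up to the whole ethnicity.
       sum(count(*)) OVER (PARTITION BY officer_defined_ethnicity) AS "Sas for ethnicity",
       -- Arrests per ethnicity divided by searches per ethnicity, as a percentage.
       100.0 * sum(count(*) FILTER (WHERE outcome = 'Arrest'))
                   OVER (PARTITION BY officer_defined_ethnicity)
             / sum(count(*)) OVER (PARTITION BY officer_defined_ethnicity) AS "Arrest rate",
       object_of_search AS "Most frequent object of search"
FROM   stop_and_searches  -- table name as used in this answer
GROUP  BY officer_defined_ethnicity, object_of_search
ORDER  BY officer_defined_ethnicity, count(*) DESC, object_of_search;
```

The window sums run over the already grouped rows, so each surviving row still sees the totals for its whole ethnicity even though DISTINCT ON discards the other rows afterwards.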

I added object_of_search as a third ORDER BY item to act as a tiebreaker and produce a deterministic result:
In case of ties, pick the earliest object_of_search according to alphabetical sort order.
Adapt to your needs.
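
If ties should be reported rather than broken, DISTINCT ON is not the right tool, because it keeps exactly one row per group. One possible adaptation is rank() in a subquery in the FROM clause (not in the SELECT clause, and not correlated). This is a sketch only; the aliases sub and rnk are made up for illustration:

```sql
SELECT officer_defined_ethnicity, object_of_search, ct
FROM  (
   SELECT officer_defined_ethnicity,
          object_of_search,
          count(*) AS ct,
          -- rank() gives every object tied for the highest count the value 1.
          rank() OVER (PARTITION BY officer_defined_ethnicity
                       ORDER BY count(*) DESC) AS rnk
   FROM   stop_and_searches  -- table name as used in this answer
   GROUP  BY officer_defined_ethnicity, object_of_search
   ) sub
WHERE  rnk = 1
ORDER  BY officer_defined_ethnicity, object_of_search;
```

An ethnicity with several equally frequent objects then produces one row per tied object instead of a single arbitrary winner.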

See:

Simpler and typically faster than a subquery with row_number():

Select first row in each GROUP BY group? - Benchmarks

2 answers

Answer 1: accepted, 2021-10-23 14:53:17
Answer 2: 2021-10-23 18:01:29
