Find the most frequent value per group in a table column
I need to find the most frequent value of object_of_search for each ethnicity. How can I achieve this?
Subqueries in the SELECT clause and correlated subqueries are not allowed.
Something similar to this:
mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"
But this does not aggregate; it gives me one row per combination of ethnicity and object_of_search:
 officer_defined_ethnicity | Sas for ethnicity |   Arrest rate    | Most frequent object of search
---------------------------+-------------------+------------------+--------------------------------
 ethnicity2                |                 3 | 66.6666666666667 | Stolen goods
 ethnicity3                |                 2 |              100 | Fireworks
 ethnicity1                |                 5 |               60 | Firearms
 ethnicity3                |                 2 |              100 | Firearms
 ethnicity1                |                 5 |               60 | Cat
 ethnicity1                |                 5 |               60 | Dog
 ethnicity2                |                 3 | 66.6666666666667 | Firearms
 ethnicity1                |                 5 |               60 | Psychoactive substances
 ethnicity1                |                 5 |               60 | Fireworks
It should look something like this:
 officer_defined_ethnicity | Sas for ethnicity |   Arrest rate    | Most frequent object of search
---------------------------+-------------------+------------------+--------------------------------
 ethnicity2                |                 3 | 66.6666666666667 | Stolen goods
 ethnicity3                |                 2 |              100 | Fireworks
 ethnicity1                |                 5 |               60 | Firearms
Table on fiddle.
Query:
SELECT DISTINCT
stopAndSearches.officer_defined_ethnicity,
count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity) AS "Sas for ethnicity",
sum(case when stopAndSearches.outcome = 'Arrest' then 1 else 0 end)
OVER (PARTITION BY stopAndSearches.officer_defined_ethnicity)::float /
count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity)::float * 100 AS "Arrest rate",
mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"
FROM stopAndSearches
GROUP BY stopAndSearches.sas_id, stopAndSearches.officer_defined_ethnicity;
Table:
CREATE TABLE IF NOT EXISTS stopAndSearches(
"sas_id" bigserial PRIMARY KEY,
"officer_defined_ethnicity" VARCHAR(255),
"object_of_search" VARCHAR(255),
"outcome" VARCHAR(255)
);
This should address the specific "which object per ethnicity" question.
Note that this doesn't handle ties in the count; that wasn't part of the question.
Adjust your SQL to include this logic to provide that detail:
WITH cte AS (
SELECT officer_defined_ethnicity
, object_of_search
, COUNT(*) AS n
, ROW_NUMBER() OVER (PARTITION BY officer_defined_ethnicity ORDER BY COUNT(*) DESC) AS rn
FROM stopAndSearches
GROUP BY officer_defined_ethnicity, object_of_search
)
SELECT * FROM cte
WHERE rn = 1
;
Result:
 officer_defined_ethnicity | object_of_search | n | rn
---------------------------+------------------+---+----
 ethnicity1                | Cat              | 1 |  1
 ethnicity2                | Stolen goods     | 2 |  1
 ethnicity3                | Fireworks        | 1 |  1
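Following the suggestion to fold this logic into the original query, here is a minimal sketch, assuming the question's schema, that joins the per-object ranking to a plain per-ethnicity aggregate so each ethnicity yields a single row with all four values. It runs through Python's sqlite3 module (assuming SQLite 3.25 or later for window functions) against hypothetical sample rows chosen only to reproduce the counts and arrest rates in the question's output; the CTE names per_object and per_ethnicity and the outcome value "No further action" are illustrative. The SQL sticks to standard constructs, so it should also run on PostgreSQL. Like the query above, it does not break ties, so for ethnicity1 (where every object occurs once) any of its objects may be returned.

```python
import sqlite3

# Hypothetical sample rows (ethnicity, object_of_search, outcome), chosen only to
# reproduce the per-ethnicity counts and arrest rates in the question's output.
ROWS = [
    ("ethnicity1", "Firearms", "Arrest"),
    ("ethnicity1", "Cat", "Arrest"),
    ("ethnicity1", "Dog", "Arrest"),
    ("ethnicity1", "Psychoactive substances", "No further action"),
    ("ethnicity1", "Fireworks", "No further action"),
    ("ethnicity2", "Stolen goods", "Arrest"),
    ("ethnicity2", "Stolen goods", "Arrest"),
    ("ethnicity2", "Firearms", "No further action"),
    ("ethnicity3", "Fireworks", "Arrest"),
    ("ethnicity3", "Firearms", "Arrest"),
]

SUMMARY_SQL = """
WITH per_object AS (
    -- count per (ethnicity, object), ranked by count within each ethnicity
    SELECT officer_defined_ethnicity, object_of_search, COUNT(*) AS n,
           ROW_NUMBER() OVER (PARTITION BY officer_defined_ethnicity
                              ORDER BY COUNT(*) DESC) AS rn
    FROM stopAndSearches
    GROUP BY officer_defined_ethnicity, object_of_search
), per_ethnicity AS (
    -- ordinary aggregates: number of searches and arrest rate per ethnicity
    SELECT officer_defined_ethnicity,
           COUNT(*) AS sas,
           100.0 * SUM(CASE WHEN outcome = 'Arrest' THEN 1 ELSE 0 END) / COUNT(*) AS arrest_rate
    FROM stopAndSearches
    GROUP BY officer_defined_ethnicity
)
SELECT e.officer_defined_ethnicity, e.sas, e.arrest_rate, o.object_of_search
FROM per_ethnicity AS e
JOIN per_object AS o
  ON o.officer_defined_ethnicity = e.officer_defined_ethnicity
 AND o.rn = 1  -- keep only the top-ranked object per ethnicity
ORDER BY e.officer_defined_ethnicity
"""


def summarize(rows):
    """Load rows into an in-memory table and return one summary row per ethnicity."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE stopAndSearches ("
            " sas_id INTEGER PRIMARY KEY,"
            " officer_defined_ethnicity TEXT,"
            " object_of_search TEXT,"
            " outcome TEXT)"
        )
        con.executemany(
            "INSERT INTO stopAndSearches"
            " (officer_defined_ethnicity, object_of_search, outcome)"
            " VALUES (?, ?, ?)",
            rows,
        )
        return con.execute(SUMMARY_SQL).fetchall()
    finally:
        con.close()


for row in summarize(ROWS):
    print(row)
```

Keeping the ranking and the plain aggregates in separate CTEs avoids the GROUP BY mismatch in the original query: each CTE groups at the level its own numbers need, and the join brings them back together.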
SELECT DISTINCT ON (1)
officer_defined_ethnicity, object_of_search, count(*) AS ct
FROM stopAndSearches
GROUP BY 1, 2
ORDER BY 1, 3 DESC, 2;
Or more explicitly:
SELECT DISTINCT ON (officer_defined_ethnicity)
officer_defined_ethnicity, object_of_search, count(*) AS ct
FROM stopAndSearches
GROUP BY officer_defined_ethnicity, object_of_search
ORDER BY officer_defined_ethnicity, ct DESC, object_of_search;
 officer_defined_ethnicity | object_of_search | ct
---------------------------+------------------+----
 ethnicity1                | Cat              |  1
 ethnicity2                | Stolen goods     |  2
 ethnicity3                | Firearms         |  1
Since DISTINCT ON is applied after GROUP BY, we only need a single query level:
1. Aggregate with GROUP BY to get the count per (officer_defined_ethnicity, object_of_search).
2. Pick the row with the highest count per officer_defined_ethnicity with DISTINCT ON.
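To make the two steps concrete, here is a minimal sketch that mimics them in plain Python rather than in a database: collections.Counter stands in for GROUP BY, and sorting on the same keys as the ORDER BY clause and then keeping the first row per ethnicity stands in for DISTINCT ON. The sample pairs and the function name most_frequent_per_group are hypothetical. Python compares strings by code point, which may differ from the database collation for mixed-case or accented values.

```python
from collections import Counter
from itertools import groupby

# Hypothetical (officer_defined_ethnicity, object_of_search) pairs.
PAIRS = [
    ("ethnicity1", "Firearms"),
    ("ethnicity1", "Cat"),
    ("ethnicity1", "Dog"),
    ("ethnicity1", "Psychoactive substances"),
    ("ethnicity1", "Fireworks"),
    ("ethnicity2", "Stolen goods"),
    ("ethnicity2", "Stolen goods"),
    ("ethnicity2", "Firearms"),
    ("ethnicity3", "Fireworks"),
    ("ethnicity3", "Firearms"),
]


def most_frequent_per_group(pairs):
    # Step 1, like GROUP BY officer_defined_ethnicity, object_of_search:
    # one count per distinct (ethnicity, object) combination.
    counts = Counter(pairs)
    # Like ORDER BY officer_defined_ethnicity, ct DESC, object_of_search.
    ordered = sorted(counts.items(), key=lambda kv: (kv[0][0], -kv[1], kv[0][1]))
    # Step 2, like DISTINCT ON (officer_defined_ethnicity):
    # keep only the first row of each ethnicity in that order.
    result = []
    for ethnicity, group in groupby(ordered, key=lambda kv: kv[0][0]):
        (_, obj), ct = next(group)
        result.append((ethnicity, obj, ct))
    return result


for row in most_frequent_per_group(PAIRS):
    print(row)
# ('ethnicity1', 'Cat', 1)
# ('ethnicity2', 'Stolen goods', 2)
# ('ethnicity3', 'Firearms', 1)
```

groupby only merges adjacent items, which is why the sort on the ethnicity key has to come first; DISTINCT ON has the same requirement that its expressions lead the ORDER BY list.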
I added object_of_search as the third ORDER BY item to act as a tiebreaker and produce a deterministic result: in case of ties, pick the earliest object_of_search according to alphabetical sort order.
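SQLite has no DISTINCT ON, but the same ordering, including the tiebreaker, can be expressed with row_number() in a CTE, which is the kind of subquery DISTINCT ON is being compared with here. As a minimal sketch (Python's sqlite3 module, assuming SQLite 3.25 or later; hypothetical sample pairs; the names TOP_SQL and top_objects are illustrative), the check at the end loads the same rows in forward and reverse order and confirms the result is identical, which is what adding object_of_search to the sort buys. SQLite's default BINARY collation compares bytes, whereas PostgreSQL uses the database collation, so the winner among tied mixed-case values could differ between the two.

```python
import sqlite3

# Hypothetical (officer_defined_ethnicity, object_of_search) pairs.
PAIRS = [
    ("ethnicity1", "Firearms"),
    ("ethnicity1", "Cat"),
    ("ethnicity1", "Dog"),
    ("ethnicity1", "Psychoactive substances"),
    ("ethnicity1", "Fireworks"),
    ("ethnicity2", "Stolen goods"),
    ("ethnicity2", "Stolen goods"),
    ("ethnicity2", "Firearms"),
    ("ethnicity3", "Fireworks"),
    ("ethnicity3", "Firearms"),
]

TOP_SQL = """
WITH ranked AS (
    SELECT officer_defined_ethnicity, object_of_search, COUNT(*) AS ct,
           ROW_NUMBER() OVER (
               PARTITION BY officer_defined_ethnicity
               ORDER BY COUNT(*) DESC, object_of_search  -- tiebreaker
           ) AS rn
    FROM stopAndSearches
    GROUP BY officer_defined_ethnicity, object_of_search
)
SELECT officer_defined_ethnicity, object_of_search, ct
FROM ranked
WHERE rn = 1
ORDER BY officer_defined_ethnicity
"""


def top_objects(pairs):
    """Return the most frequent object per ethnicity, ties broken alphabetically."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE stopAndSearches ("
            " sas_id INTEGER PRIMARY KEY,"
            " officer_defined_ethnicity TEXT,"
            " object_of_search TEXT)"
        )
        con.executemany(
            "INSERT INTO stopAndSearches"
            " (officer_defined_ethnicity, object_of_search) VALUES (?, ?)",
            pairs,
        )
        return con.execute(TOP_SQL).fetchall()
    finally:
        con.close()


forward = top_objects(PAIRS)
backward = top_objects(PAIRS[::-1])  # same rows, opposite insertion order
print(forward)
print(forward == backward)
```

Without object_of_search in the window's ORDER BY, ethnicity1's five tied objects would all be eligible for rn = 1, and which one wins would be left to the engine.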
Adapt to your needs.
This is simpler and typically faster than a subquery with row_number().