
Calculating median with Group By in AWS Redshift

I've seen other posts about using the median() window function in Redshift, but how would you use it with a query that has a GROUP BY at the end?

For example, assume a table course:

Course | Subject | Num_Students
-------------------------------
   1   |  Math   |      4
   2   |  Math   |      6
   3   |  Math   |      10
   4   | Science |      2
   5   | Science |      10
   6   | Science |      12
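
To try the queries in this thread, the sample table can be recreated with something like the sketch below. The column types are an assumption, since the question only shows names and values; the table and column names follow the example above.

```sql
-- Assumed schema: the types are a guess, only the names come from the question.
CREATE TABLE course (
    course       INTEGER,
    subject      VARCHAR(32),
    num_students INTEGER
);

-- The six sample rows from the table above.
INSERT INTO course (course, subject, num_students) VALUES
    (1, 'Math',     4),
    (2, 'Math',     6),
    (3, 'Math',    10),
    (4, 'Science',  2),
    (5, 'Science', 10),
    (6, 'Science', 12);
```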

I want to get the median number of students for each course subject. How would I write a query that gives the following result:

  Subject  | Median
-----------------------
 Math      |     6
 Science   |     10

I've tried:

SELECT
subject, median(num_students) over ()
FROM
course
GROUP BY 1
;

But it lists every occurrence of the subject, with the same median value across all subjects, like this (this is fake data, so the actual value it returns is not 6; the point is that it is the same for every subject):

  Subject  | Median
-----------------------
 Math      |     6
 Math      |     6
 Math      |     6
 Science   |     6
 Science   |     6
 Science   |     6

The following will get you exactly the result you are looking for:

SELECT distinct
subject, median(num_students) over(partition by Subject) 
FROM
course
order by Subject;

You just need to remove the "over()" part, so that median() is used as an ordinary aggregate function:

SELECT subject, median(num_students) FROM course GROUP BY 1;

You haven't defined a partition in the window. Instead of OVER() you need OVER(PARTITION BY subject).

Let's say you also want to calculate other aggregations by subject, such as avg(). If you keep the window form of median(), you need a subquery (written here as a CTE):

WITH subject_numstudents_medianstudents AS (
    SELECT
        subject
        , num_students
        , median(num_students) over (partition BY subject) AS median_students
    FROM
        course
)
SELECT
    subject
    , median_students
    , avg(num_students) as avg_students
FROM subject_numstudents_medianstudents
GROUP BY 1, 2;
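
If the window form is not required, a single GROUP BY may be enough, because median() also works as a plain aggregate (as in the shorter answer above) and can sit next to avg(). A minimal sketch, assuming the same course table; the column aliases are only illustrative:

```sql
-- One pass over the table: both values are computed per subject.
SELECT
    subject
    , median(num_students) AS median_students
    , avg(num_students) AS avg_students
FROM course
GROUP BY subject
ORDER BY subject;
```

This avoids grouping by the median column itself, which the CTE version needs only because the window function repeats the median on every row.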
