
DB2 SQL - median with GROUP BY

First of all, I am running DB2 for i5/OS V5R4. I have ROW_NUMBER(), RANK() and common table expressions available. I do not have TOP n PERCENT or LIMIT OFFSET.

The actual data set I'm working with is hard to explain, so let's just say I have a weather history table where the columns are (city, temperature, timestamp). I want to compare medians to averages for each group (city).

This was the cleanest way I found to get a median for a whole-table aggregation. I adapted it from an IBM Redbook:

WITH base_t AS
( SELECT temperature, row_number() over (order by temperature) AS rownum FROM t ),
count_t AS
( SELECT COUNT(temperature) + 1 AS base_count FROM base_t ),
median_t AS
( SELECT temperature FROM base_t, count_t
  WHERE rownum in (FLOOR(base_count/2e0), CEILING(base_count/2e0)) )
SELECT DECIMAL(AVG(temperature),10,2) AS median FROM median_t
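
The row-selection step in the query above relies on a small piece of arithmetic: with n rows numbered from 1, FLOOR((n+1)/2) and CEILING((n+1)/2) are the same middle position when n is odd and the two middle positions when n is even, so averaging the selected rows gives the median. Here is a minimal Python sketch of that arithmetic, outside the database. The function names `median_positions` and `median` are made up for illustration, and the sketch assumes at least one row.

```python
import math

def median_positions(n):
    """Return the 1-based row numbers the query keeps for n rows (n >= 1)."""
    base_count = n + 1  # mirrors COUNT(temperature) + 1 in count_t
    low = math.floor(base_count / 2)
    high = math.ceil(base_count / 2)
    return sorted({low, high})  # one position if n is odd, two if n is even

def median(values):
    """Average the values sitting at the selected positions."""
    ordered = sorted(values)  # mirrors ORDER BY temperature
    picked = [ordered[p - 1] for p in median_positions(len(ordered))]
    return sum(picked) / len(picked)

print(median_positions(3))  # odd count: a single middle row
print(median_positions(4))  # even count: the two middle rows
```

Because the positions depend only on the row count, the same trick should carry over to groups as long as both the row number and the count are computed per group.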

That works well for getting a single row back, but it seems to fall apart for grouping. Conceptually, this is what I want:


SELECT city, AVG(temperature), MEDIAN(temperature) FROM ...

city           | mean_temp       | median_temp       
===================================================
'Minneapolis'  | 60              | 64
'Milwaukee'    | 65              | 66
'Muskegon'     | 70              | 61

There could be an answer that makes me look stupid, but I'm having a mental block and this isn't my #1 thing to work on right now. It seems like it should be possible, but I can't use anything extremely complex, since it's a large table and I want to be able to customize which columns are being aggregated.

In SQL Server, aggregate functions like count(*) can be partitioned and calculated without a GROUP BY. I looked quickly through the referenced Redbook, and it looks like DB2 has the same feature. But if not, then this won't work:

create table TemperatureHistory 
    (City varchar(20)
    , Temperature decimal(5, 2)
    , DateTaken datetime)

insert into TemperatureHistory values ('Minneapolis', 61, '20090101')
insert into TemperatureHistory values ('Minneapolis', 59, '20090102')

insert into TemperatureHistory values ('Milwaukee', 65, '20090101')
insert into TemperatureHistory values ('Milwaukee', 65, '20090102')
insert into TemperatureHistory values ('Milwaukee', 100, '20090103')

insert into TemperatureHistory values ('Muskegon', 80, '20090101')
insert into TemperatureHistory values ('Muskegon', 70, '20090102')
insert into TemperatureHistory values ('Muskegon', 70, '20090103')
insert into TemperatureHistory values ('Muskegon', 20, '20090104')

; with base_t as
    (select city
        , Temperature
        , row_number() over (partition by city order by temperature) as RowNum
        , (count(*) over (partition by city)) + 1 as CountPlusOne 
    from TemperatureHistory)
select City
    , avg(Temperature) as MeanTemp
    , avg(case 
        when RowNum in (FLOOR(CountPlusOne/2.0), CEILING(CountPlusOne/2.0)) 
            then Temperature
            else null end) as MedianTemp
from base_t 
group by City
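
For anyone without a DB2 or SQL Server instance handy, the same query shape can be tried in SQLite through Python's sqlite3 module. This is only a sketch under a few assumptions: it needs SQLite 3.25 or later for window functions, it drops the DateTaken column, and it swaps FLOOR/CEILING for integer division, since (n+1)/2 and (n+2)/2 in integer arithmetic give the same two positions. It says nothing about how DB2 for i5/OS V5R4 itself behaves.

```python
import sqlite3

# Sample rows from the inserts above (DateTaken omitted for brevity).
rows = [
    ("Minneapolis", 61), ("Minneapolis", 59),
    ("Milwaukee", 65), ("Milwaukee", 65), ("Milwaukee", 100),
    ("Muskegon", 80), ("Muskegon", 70), ("Muskegon", 70), ("Muskegon", 20),
]

# Same structure as the query above; integer division replaces FLOOR/CEILING.
query = """
WITH base_t AS (
    SELECT City, Temperature,
           ROW_NUMBER() OVER (PARTITION BY City ORDER BY Temperature) AS RowNum,
           COUNT(*) OVER (PARTITION BY City) AS Cnt
    FROM TemperatureHistory
)
SELECT City,
       AVG(Temperature) AS MeanTemp,
       AVG(CASE WHEN RowNum IN ((Cnt + 1) / 2, (Cnt + 2) / 2)
                THEN Temperature END) AS MedianTemp
FROM base_t
GROUP BY City
ORDER BY City
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TemperatureHistory (City TEXT, Temperature REAL)")
conn.executemany("INSERT INTO TemperatureHistory VALUES (?, ?)", rows)
results = conn.execute(query).fetchall()

for city, mean_temp, median_temp in results:
    print(city, round(mean_temp, 2), median_temp)
```

With this sample data the medians come out as 65 for Milwaukee, 60 for Minneapolis and 70 for Muskegon. AVG ignores the NULLs produced by the CASE expression, which is why only the middle rows contribute to MedianTemp.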
