SQL - Filtering within a window function

I'm using SQL Server 2014.

I am trying to remove some noise from a dataset by taking the average of all values at or below the 90th percentile of each group. Here is the query:

SELECT
    DISTINCT EventLocation,
    PERCENTILE_CONT(.90) 
        WITHIN GROUP (ORDER BY (DATEDIFF(MINUTE, StartTime, EndTime)) ASC) 
        OVER (PARTITION BY EventLocation) 
        AS 'P90',
    AVG(DATEDIFF(MINUTE, StartTime, EndTime))
        OVER (PARTITION BY EventLocation) 
        AS 'Mean'
  FROM MyTable
  ORDER BY N DESC

Currently there are 2 calculated columns:

  • The 90th percentile value (of the PARTITION population)
  • The mean (of the PARTITION population)

I want to add another column for:

  • The mean of the values (in a PARTITION population) <= the 90th percentile value (of that PARTITION population)

Something like:

AVG(DATEDIFF(MINUTE, StartTime, EndTime))
    OVER (PARTITION BY EventLocation) 
    HAVING (DATEDIFF(MINUTE, StartTime, EndTime) <= [ 90th percentile value ])
    AS 'Mean90'

I'm not exactly sure how to approach this, since it references the 90th percentile value that was just defined as P90. Maybe a user-defined function applied group-wise, creating multiple tables and joining them, or something else.

As Gordon said, a CTE is a common way to solve a problem like this. Compute each row's duration and its partition's P90 in the CTE, then select from the CTE and reuse the P90 alias you defined. A window function cannot take a HAVING clause, so the filter goes in a CASE expression inside AVG: rows above the percentile become NULL, which AVG ignores. The CTE also has to keep one row per event (no DISTINCT), because the outer query still needs the individual durations.

;WITH IntermediateResults AS (
    -- One row per event: its duration plus the P90 of its location.
    -- (ORDER BY is not allowed inside a CTE without TOP or OFFSET.)
    SELECT
        EventLocation,
        DATEDIFF(MINUTE, StartTime, EndTime) AS Duration,
        PERCENTILE_CONT(.90)
            WITHIN GROUP (ORDER BY DATEDIFF(MINUTE, StartTime, EndTime) ASC)
            OVER (PARTITION BY EventLocation)
            AS P90
    FROM MyTable
)

SELECT
    EventLocation,
    MAX(P90) AS P90,          -- P90 is constant within each location
    AVG(Duration) AS Mean,    -- integer average, since Duration is an INT
    -- Rows above P90 yield NULL, which AVG skips.
    AVG(CASE WHEN Duration <= P90 THEN Duration END) AS Mean90
FROM IntermediateResults
GROUP BY EventLocation
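
To try the filtered mean without touching real data, here is a minimal sketch that should run as a single batch on SQL Server 2012 or later. The table variable @MyTable, its sample rows, and the CTE names Durations and WithP90 are made up for illustration. Casting the duration to FLOAT is also my assumption, added so that the averages are not truncated to integers.

```sql
-- Hypothetical sample data; the table variable stands in for MyTable.
DECLARE @MyTable TABLE (
    EventLocation VARCHAR(20),
    StartTime     DATETIME,
    EndTime       DATETIME
);

INSERT INTO @MyTable (EventLocation, StartTime, EndTime) VALUES
    ('A', '2020-01-01T10:00:00', '2020-01-01T10:10:00'),  -- 10 minutes
    ('A', '2020-01-01T10:00:00', '2020-01-01T10:20:00'),  -- 20 minutes
    ('A', '2020-01-01T10:00:00', '2020-01-01T10:30:00'),  -- 30 minutes
    ('A', '2020-01-01T10:00:00', '2020-01-01T10:40:00'),  -- 40 minutes
    ('A', '2020-01-01T10:00:00', '2020-01-01T13:20:00');  -- 200 minutes (the outlier)

WITH Durations AS (
    -- Compute the duration once so later steps can refer to it by name.
    SELECT
        EventLocation,
        CAST(DATEDIFF(MINUTE, StartTime, EndTime) AS FLOAT) AS Duration
    FROM @MyTable
),
WithP90 AS (
    -- Attach the partition's 90th percentile to every row.
    SELECT
        EventLocation,
        Duration,
        PERCENTILE_CONT(0.90)
            WITHIN GROUP (ORDER BY Duration)
            OVER (PARTITION BY EventLocation) AS P90
    FROM Durations
)
SELECT
    EventLocation,
    MAX(P90)      AS P90,
    AVG(Duration) AS Mean,
    -- CASE returns NULL for rows above P90; AVG ignores NULLs.
    AVG(CASE WHEN Duration <= P90 THEN Duration END) AS Mean90
FROM WithP90
GROUP BY EventLocation
ORDER BY EventLocation;
```

For location A this should give P90 = 136 (interpolated between 40 and 200), Mean = 60, and Mean90 = 25, because the 200-minute row is excluded from the last average. SQL Server may print a warning that a NULL value was eliminated by an aggregate; that is expected with this pattern.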
