Calculate the average number of distinct IDs over a period (1, 3, 6, 9, and 12 months)

Suppose I have the following source table. (I filled in data only for 2017 to save space, but imagine that the table holds data for 2017 through 2021.)

EMPL_ID     TIMESTAMP        PART_COL
1       2017-01-01 00:00:00     M
2       2017-01-01 00:00:00     M
3       2017-01-01 00:00:00     M
3       2017-01-01 00:00:00     M
1       2017-02-01 00:00:00     M
2       2017-02-01 00:00:00     M
3       2017-02-01 00:00:00     M
3       2017-02-01 00:00:00     M
1       2017-03-01 00:00:00     M
2       2017-03-01 00:00:00     M
3       2017-03-01 00:00:00     M
1       2017-04-01 00:00:00     M
2       2017-04-01 00:00:00     M
3       2017-04-01 00:00:00     M
1       2017-05-01 00:00:00     M
2       2017-05-01 00:00:00     M
3       2017-05-01 00:00:00     M
4       2017-05-01 00:00:00     M
5       2017-05-01 00:00:00     M
1       2017-06-01 00:00:00     M
2       2017-06-01 00:00:00     M
3       2017-06-01 00:00:00     M
4       2017-06-01 00:00:00     M
4       2017-06-01 00:00:00     M
1       2017-07-01 00:00:00     M
2       2017-07-01 00:00:00     M
3       2017-07-01 00:00:00     M
1       2017-08-01 00:00:00     M
2       2017-08-01 00:00:00     M
3       2017-08-01 00:00:00     M
1       2017-09-01 00:00:00     M
2       2017-09-01 00:00:00     M 
1       2017-10-01 00:00:00     M
2       2017-10-01 00:00:00     M
2       2017-10-01 00:00:00     M
3       2017-10-01 00:00:00     M
4       2017-10-01 00:00:00     M
1       2017-11-01 00:00:00     M
2       2017-11-01 00:00:00     M
2       2017-11-01 00:00:00     M
3       2017-11-01 00:00:00     M
1       2017-12-01 00:00:00     M
2       2017-12-01 00:00:00     M
2       2017-12-01 00:00:00     M
3       2017-12-01 00:00:00     M

I want to calculate the following:

  1. For January, the monthly count plus four averages:
  • number of unique empl_id for January, partitioned by PART_COL
  • average number of unique empl_id over 3 months (January through March), partitioned by PART_COL
  • average number of unique empl_id over 6 months (January through June), partitioned by PART_COL
  • average number of unique empl_id over 9 months (January through September), partitioned by PART_COL
  • average number of unique empl_id over 12 months (January through December), partitioned by PART_COL
  2. For each of the remaining months:
  • number of unique empl_id for that month, partitioned by PART_COL

At the end it should look like this:

UNIQUE_EMPL_ID  TIMESTAMP_FROM  TIMESTAMP_UNTIL   PART_COL
3,00    2017-01-01 00:00:00 2017-01-01 00:00:00      M
3,00    2017-01-01 00:00:00 2017-03-01 00:00:00      M
3,00    2017-01-01 00:00:00 2017-06-01 00:00:00      M
3,50    2017-01-01 00:00:00 2017-09-01 00:00:00      M
3,20    2017-01-01 00:00:00 2017-12-01 00:00:00      M
3,08    2017-02-01 00:00:00 2017-02-01 00:00:00      M
3,00    2017-03-01 00:00:00 2017-03-01 00:00:00      M
3,00    2017-04-01 00:00:00 2017-04-01 00:00:00      M
5,00    2017-05-01 00:00:00 2017-05-01 00:00:00      M

and so on through the 12th month.

I have come up with the following query:

-- monthly distinct count
SELECT
count(distinct empl_id) as UNIQUE_EMPL_ID
,TIMESTAMP_COL as TIMESTAMP_FROM
,TIMESTAMP_COL as TIMESTAMP_UNTIL
,PART_COLUMN
from source_table
WHERE PART_COLUMN = 'M'
group by TIMESTAMP_COL,PART_COLUMN

union all

-- 3-month average (current month and the 2 following)
SELECT
avg(count(distinct empl_id)) OVER (PARTITION BY PART_COLUMN ORDER BY TIMESTAMP_COL ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) as UNIQUE_EMPL_ID
,TIMESTAMP_COL as TIMESTAMP_FROM
,add_months(TIMESTAMP_COL,2) as TIMESTAMP_UNTIL
,PART_COLUMN
from source_table
WHERE PART_COLUMN = 'M'
group by TIMESTAMP_COL,PART_COLUMN

union all

-- 6-month average
SELECT
avg(count(distinct empl_id)) OVER (PARTITION BY PART_COLUMN ORDER BY TIMESTAMP_COL ROWS BETWEEN CURRENT ROW AND 5 FOLLOWING) as UNIQUE_EMPL_ID
,TIMESTAMP_COL as TIMESTAMP_FROM
,add_months(TIMESTAMP_COL,5) as TIMESTAMP_UNTIL
,PART_COLUMN
from source_table
WHERE PART_COLUMN = 'M'
group by TIMESTAMP_COL,PART_COLUMN

union all

-- 9-month average
SELECT
avg(count(distinct empl_id)) OVER (PARTITION BY PART_COLUMN ORDER BY TIMESTAMP_COL ROWS BETWEEN CURRENT ROW AND 8 FOLLOWING) as UNIQUE_EMPL_ID
,TIMESTAMP_COL as TIMESTAMP_FROM
,add_months(TIMESTAMP_COL,8) as TIMESTAMP_UNTIL
,PART_COLUMN
from source_table
WHERE PART_COLUMN = 'M'
group by TIMESTAMP_COL,PART_COLUMN

union all

-- 12-month average
SELECT
avg(count(distinct empl_id)) OVER (PARTITION BY PART_COLUMN ORDER BY TIMESTAMP_COL ROWS BETWEEN CURRENT ROW AND 11 FOLLOWING) as UNIQUE_EMPL_ID
,TIMESTAMP_COL as TIMESTAMP_FROM
,add_months(TIMESTAMP_COL,11) as TIMESTAMP_UNTIL
,PART_COLUMN
from source_table
WHERE PART_COLUMN = 'M'
group by TIMESTAMP_COL,PART_COLUMN

The question is: can this result be achieved with a more efficient query?

By first building a Common Table Expression (CTE) that holds the distinct employee count per month, you can apply UNPIVOT to that data to calculate the average for each period, then add a simple SELECT from the same CTE to get the monthly figures.
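To see the CASE and UNPIVOT combination in isolation, here is a minimal sketch for Oracle. The CTE name monthly and the column names p1 and p2 are made up for this illustration, and only two periods are shown. Each CASE keeps the monthly count only when the month falls inside the period; UNPIVOT then turns the period columns into rows and, by default, drops the NULL ones.

```sql
-- Two hypothetical monthly rows: January (quarter 1) and May (quarter 2)
WITH monthly (mon, distinct_emps) AS
    (SELECT DATE '2017-01-01', 3 FROM DUAL
     UNION ALL
     SELECT DATE '2017-05-01', 5 FROM DUAL)
SELECT mon, period, period_people
  FROM (SELECT mon,
               -- count is kept only for months in quarter 1
               CASE WHEN TO_NUMBER (TO_CHAR (mon, 'Q')) <= 1 THEN distinct_emps END AS p1,
               -- count is kept for months in quarters 1 and 2
               CASE WHEN TO_NUMBER (TO_CHAR (mon, 'Q')) <= 2 THEN distinct_emps END AS p2
          FROM monthly)
       -- the literal after AS becomes the value of the period column
       UNPIVOT (period_people FOR period IN (p1 AS 2, p2 AS 5))
 ORDER BY mon, period;
```

January contributes a row to both periods, while May contributes only to the second one, because its p1 value is NULL and UNPIVOT excludes NULLs unless INCLUDE NULLS is specified. Averaging period_people per period then gives the period averages.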

Any filters (WHERE clauses) that limit the data to be returned should be added to the unique_emps CTE below.
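As a sketch of where such a filter would go, assume you want only partition 'M' and the years 2017 through 2021 (both taken from the question; adjust them to your case). The rows in sample_data here are a tiny made-up stand-in, not the real table.

```sql
WITH
    -- Hypothetical rows for illustration only; the 'X' row should be filtered out
    sample_data (emp_id, timestamp, part_col)
    AS
        (SELECT 1, DATE '2017-01-01', 'M' FROM DUAL
         UNION ALL
         SELECT 2, DATE '2017-01-01', 'M' FROM DUAL
         UNION ALL
         SELECT 2, DATE '2017-01-01', 'M' FROM DUAL
         UNION ALL
         SELECT 9, DATE '2017-01-01', 'X' FROM DUAL),
    unique_emps (mon, part_col, distinct_emps)
    AS
        (  SELECT timestamp, part_col, COUNT (DISTINCT emp_id)
             FROM sample_data
            -- filters go here, before the rows are aggregated
            WHERE part_col = 'M'
              AND timestamp >= DATE '2017-01-01'
              AND timestamp < DATE '2022-01-01'
         GROUP BY timestamp, part_col)
SELECT mon, part_col, distinct_emps FROM unique_emps;
```

With these four rows the query returns a single row for January 2017 in partition 'M' with distinct_emps = 2. Filtering inside the CTE means both the period averages and the monthly rows are computed from the same reduced set.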

WITH
    sample_data (emp_id, timestamp, part_col)
    AS
        (SELECT 1, DATE '2017-01-01', 'M' FROM DUAL
         UNION ALL
         SELECT 2, DATE '2017-01-01', 'M' FROM DUAL
         UNION ALL
         SELECT 3, DATE '2017-01-01', 'M' FROM DUAL
         UNION ALL
         SELECT 3, DATE '2017-01-01', 'M' FROM DUAL
         UNION ALL
         SELECT 1, DATE '2017-02-01', 'M' FROM DUAL
         UNION ALL
         SELECT 2, DATE '2017-02-01', 'M' FROM DUAL
         UNION ALL
         SELECT 3, DATE '2017-02-01', 'M' FROM DUAL
         UNION ALL
         SELECT 3, DATE '2017-02-01', 'M' FROM DUAL
         UNION ALL
         SELECT 1, DATE '2017-03-01', 'M' FROM DUAL
         UNION ALL
         SELECT 2, DATE '2017-03-01', 'M' FROM DUAL
         UNION ALL
         SELECT 3, DATE '2017-03-01', 'M' FROM DUAL
         UNION ALL
         SELECT 1, DATE '2017-04-01', 'M' FROM DUAL
         UNION ALL
         SELECT 2, DATE '2017-04-01', 'M' FROM DUAL
         UNION ALL
         SELECT 3, DATE '2017-04-01', 'M' FROM DUAL
         UNION ALL
         SELECT 1, DATE '2017-05-01', 'M' FROM DUAL
         UNION ALL
         SELECT 2, DATE '2017-05-01', 'M' FROM DUAL
         UNION ALL
         SELECT 3, DATE '2017-05-01', 'M' FROM DUAL
         UNION ALL
         SELECT 4, DATE '2017-05-01', 'M' FROM DUAL
         UNION ALL
         SELECT 5, DATE '2017-05-01', 'M' FROM DUAL
         UNION ALL
         SELECT 1, DATE '2017-06-01', 'M' FROM DUAL
         UNION ALL
         SELECT 2, DATE '2017-06-01', 'M' FROM DUAL
         UNION ALL
         SELECT 3, DATE '2017-06-01', 'M' FROM DUAL
         UNION ALL
         SELECT 4, DATE '2017-06-01', 'M' FROM DUAL
         UNION ALL
         SELECT 4, DATE '2017-06-01', 'M' FROM DUAL
         UNION ALL
         SELECT 1, DATE '2017-07-01', 'M' FROM DUAL
         UNION ALL
         SELECT 2, DATE '2017-07-01', 'M' FROM DUAL
         UNION ALL
         SELECT 3, DATE '2017-07-01', 'M' FROM DUAL
         UNION ALL
         SELECT 1, DATE '2017-08-01', 'M' FROM DUAL
         UNION ALL
         SELECT 2, DATE '2017-08-01', 'M' FROM DUAL
         UNION ALL
         SELECT 3, DATE '2017-08-01', 'M' FROM DUAL
         UNION ALL
         SELECT 1, DATE '2017-09-01', 'M' FROM DUAL
         UNION ALL
         SELECT 2, DATE '2017-09-01', 'M' FROM DUAL
         UNION ALL
         SELECT 1, DATE '2017-10-01', 'M' FROM DUAL
         UNION ALL
         SELECT 2, DATE '2017-10-01', 'M' FROM DUAL
         UNION ALL
         SELECT 2, DATE '2017-10-01', 'M' FROM DUAL
         UNION ALL
         SELECT 3, DATE '2017-10-01', 'M' FROM DUAL
         UNION ALL
         SELECT 4, DATE '2017-10-01', 'M' FROM DUAL
         UNION ALL
         SELECT 1, DATE '2017-11-01', 'M' FROM DUAL
         UNION ALL
         SELECT 2, DATE '2017-11-01', 'M' FROM DUAL
         UNION ALL
         SELECT 2, DATE '2017-11-01', 'M' FROM DUAL
         UNION ALL
         SELECT 3, DATE '2017-11-01', 'M' FROM DUAL
         UNION ALL
         SELECT 1, DATE '2017-12-01', 'M' FROM DUAL
         UNION ALL
         SELECT 2, DATE '2017-12-01', 'M' FROM DUAL
         UNION ALL
         SELECT 2, DATE '2017-12-01', 'M' FROM DUAL
         UNION ALL
         SELECT 3, DATE '2017-12-01', 'M' FROM DUAL),
    --Query starts here
    --Filters should be added to the unique_emps Common Table Expression to limit data returned
    unique_emps (mon, part_col, distinct_emps)
    AS
        (  SELECT timestamp, part_col, COUNT (DISTINCT emp_id)
             FROM sample_data
         GROUP BY timestamp, part_col)
  SELECT timestamp_from,
         ADD_MONTHS (timestamp_from, period)     AS timestamp_until,
         AVG (period_people)                     AS avg_number_of_people,
         part_col
    FROM (SELECT TRUNC (mon, 'Y')
                     AS timestamp_from,
                 part_col,
                 CASE WHEN TO_NUMBER (TO_CHAR (mon, 'Q')) <= 1 THEN distinct_emps END
                     period_people1,
                 CASE WHEN TO_NUMBER (TO_CHAR (mon, 'Q')) <= 2 THEN distinct_emps END
                     period_people2,
                 CASE WHEN TO_NUMBER (TO_CHAR (mon, 'Q')) <= 3 THEN distinct_emps END
                     period_people3,
                 CASE WHEN TO_NUMBER (TO_CHAR (mon, 'Q')) <= 4 THEN distinct_emps END
                     period_people4
            FROM unique_emps)
         UNPIVOT (period_people
                 FOR period
                 IN (period_people1 AS 2,
                    period_people2 AS 5,
                    period_people3 AS 8,
                    period_people4 AS 11))
GROUP BY timestamp_from, period, part_col
UNION ALL
SELECT mon, mon, distinct_emps, part_col FROM unique_emps
ORDER BY timestamp_from, timestamp_until, avg_number_of_people;

Result


   TIMESTAMP_FROM    TIMESTAMP_UNTIL                        AVG_NUMBER_OF_PEOPLE    PART_COL
_________________ __________________ ___________________________________________ ___________
01-JAN-17         01-JAN-17                                                    3 M
01-JAN-17         01-MAR-17                                                    3 M
01-JAN-17         01-JUN-17                                                  3.5 M
01-JAN-17         01-SEP-17             3.22222222222222222222222222222222222222 M
01-JAN-17         01-DEC-17                                                 3.25 M
01-FEB-17         01-FEB-17                                                    3 M
01-MAR-17         01-MAR-17                                                    3 M
01-APR-17         01-APR-17                                                    3 M
01-MAY-17         01-MAY-17                                                    5 M
01-JUN-17         01-JUN-17                                                    4 M
01-JUL-17         01-JUL-17                                                    3 M
01-AUG-17         01-AUG-17                                                    3 M
01-SEP-17         01-SEP-17                                                    2 M
01-OCT-17         01-OCT-17                                                    4 M
01-NOV-17         01-NOV-17                                                    3 M
01-DEC-17         01-DEC-17                                                    3 M
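
The desired output in the question shows two decimal places, whereas AVG returns full precision (3.2222... for January through September above). A minimal sketch of rounding, using the nine monthly distinct counts from the sample data; the CTE name monthly is made up for this illustration. In the full query the same ROUND would wrap AVG (period_people).

```sql
-- Monthly distinct counts for January through September 2017
WITH monthly (distinct_emps) AS
    (SELECT 3 FROM DUAL UNION ALL   -- Jan
     SELECT 3 FROM DUAL UNION ALL   -- Feb
     SELECT 3 FROM DUAL UNION ALL   -- Mar
     SELECT 3 FROM DUAL UNION ALL   -- Apr
     SELECT 5 FROM DUAL UNION ALL   -- May
     SELECT 4 FROM DUAL UNION ALL   -- Jun
     SELECT 3 FROM DUAL UNION ALL   -- Jul
     SELECT 3 FROM DUAL UNION ALL   -- Aug
     SELECT 2 FROM DUAL)            -- Sep
SELECT ROUND (AVG (distinct_emps), 2) AS avg_number_of_people
  FROM monthly;
-- 29 / 9 = 3.2222..., so this returns 3.22
```

Whether the decimal separator is displayed as a comma or a period depends on the session's NLS settings, not on the query.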
