Calculate the average number of distinct IDs in a period (1 month, 3 months, 6 months, 9 months and 12 months)
Suppose I have the following source table (I filled in data only for 2017 to save space, but imagine that the table holds data for 2017-2021):
EMPL_ID TIMESTAMP PART_COL
1 2017-01-01 00:00:00 M
2 2017-01-01 00:00:00 M
3 2017-01-01 00:00:00 M
3 2017-01-01 00:00:00 M
1 2017-02-01 00:00:00 M
2 2017-02-01 00:00:00 M
3 2017-02-01 00:00:00 M
3 2017-02-01 00:00:00 M
1 2017-03-01 00:00:00 M
2 2017-03-01 00:00:00 M
3 2017-03-01 00:00:00 M
1 2017-04-01 00:00:00 M
2 2017-04-01 00:00:00 M
3 2017-04-01 00:00:00 M
1 2017-05-01 00:00:00 M
2 2017-05-01 00:00:00 M
3 2017-05-01 00:00:00 M
4 2017-05-01 00:00:00 M
5 2017-05-01 00:00:00 M
1 2017-06-01 00:00:00 M
2 2017-06-01 00:00:00 M
3 2017-06-01 00:00:00 M
4 2017-06-01 00:00:00 M
4 2017-06-01 00:00:00 M
1 2017-07-01 00:00:00 M
2 2017-07-01 00:00:00 M
3 2017-07-01 00:00:00 M
1 2017-08-01 00:00:00 M
2 2017-08-01 00:00:00 M
3 2017-08-01 00:00:00 M
1 2017-09-01 00:00:00 M
2 2017-09-01 00:00:00 M
1 2017-10-01 00:00:00 M
2 2017-10-01 00:00:00 M
2 2017-10-01 00:00:00 M
3 2017-10-01 00:00:00 M
4 2017-10-01 00:00:00 M
1 2017-11-01 00:00:00 M
2 2017-11-01 00:00:00 M
2 2017-11-01 00:00:00 M
3 2017-11-01 00:00:00 M
1 2017-12-01 00:00:00 M
2 2017-12-01 00:00:00 M
2 2017-12-01 00:00:00 M
3 2017-12-01 00:00:00 M
I want to calculate the average number of distinct EMPL_ID values over periods of 1, 3, 6, 9 and 12 months.
At the end it should look like this:
UNIQUE_EMPL_ID TIMESTAMP_FROM TIMESTAMP_UNTIL PART_COL
3,00 2017-01-01 00:00:00 2017-01-01 00:00:00 M
3,00 2017-01-01 00:00:00 2017-03-01 00:00:00 M
3,00 2017-01-01 00:00:00 2017-06-01 00:00:00 M
3,50 2017-01-01 00:00:00 2017-09-01 00:00:00 M
3,20 2017-01-01 00:00:00 2017-12-01 00:00:00 M
3,08 2017-02-01 00:00:00 2017-02-01 00:00:00 M
3,00 2017-03-01 00:00:00 2017-03-01 00:00:00 M
3,00 2017-04-01 00:00:00 2017-04-01 00:00:00 M
5,00 2017-05-01 00:00:00 2017-05-01 00:00:00 M
and so on, up to month 12.
I have come up with the following query:
-- 1-month period: distinct count for the month itself
SELECT
    count(distinct empl_id) as UNIQUE_EMPL_ID
   ,TIMESTAMP_COL as TIMESTAMP_FROM
   ,TIMESTAMP_COL as TIMESTAMP_UNTIL
from source_table
WHERE PART_COLUMN = 'M'
group by TIMESTAMP_COL, PART_COLUMN
union all
-- 3-month period: average of the current month and the next 2 months
SELECT
    avg(count(distinct empl_id)) OVER (PARTITION BY PART_COLUMN ORDER BY TIMESTAMP_COL ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) as UNIQUE_EMPL_ID
   ,TIMESTAMP_COL as TIMESTAMP_FROM
   ,add_months(TIMESTAMP_COL, 2) as TIMESTAMP_UNTIL
from source_table
WHERE PART_COLUMN = 'M'
group by TIMESTAMP_COL, PART_COLUMN
union all
-- 6-month period
SELECT
    avg(count(distinct empl_id)) OVER (PARTITION BY PART_COLUMN ORDER BY TIMESTAMP_COL ROWS BETWEEN CURRENT ROW AND 5 FOLLOWING) as UNIQUE_EMPL_ID
   ,TIMESTAMP_COL as TIMESTAMP_FROM
   ,add_months(TIMESTAMP_COL, 5) as TIMESTAMP_UNTIL
from source_table
WHERE PART_COLUMN = 'M'
group by TIMESTAMP_COL, PART_COLUMN
union all
-- 9-month period
SELECT
    avg(count(distinct empl_id)) OVER (PARTITION BY PART_COLUMN ORDER BY TIMESTAMP_COL ROWS BETWEEN CURRENT ROW AND 8 FOLLOWING) as UNIQUE_EMPL_ID
   ,TIMESTAMP_COL as TIMESTAMP_FROM
   ,add_months(TIMESTAMP_COL, 8) as TIMESTAMP_UNTIL
from source_table
WHERE PART_COLUMN = 'M'
group by TIMESTAMP_COL, PART_COLUMN
union all
-- 12-month period
SELECT
    avg(count(distinct empl_id)) OVER (PARTITION BY PART_COLUMN ORDER BY TIMESTAMP_COL ROWS BETWEEN CURRENT ROW AND 11 FOLLOWING) as UNIQUE_EMPL_ID
   ,TIMESTAMP_COL as TIMESTAMP_FROM
   ,add_months(TIMESTAMP_COL, 11) as TIMESTAMP_UNTIL
from source_table
WHERE PART_COLUMN = 'M'
group by TIMESTAMP_COL, PART_COLUMN
The question is: can this result be achieved with a more efficient query?
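For reference, the window frame ROWS BETWEEN CURRENT ROW AND n FOLLOWING used in the query averages the current month's distinct count with the counts of the next n months, and the frame shrinks when it runs past the last row of the partition. The following is a minimal Python sketch of that behaviour, not part of the original post. It assumes the monthly distinct counts obtained by counting distinct EMPL_ID values per month in the 2017 sample data, and the function name forward_average is hypothetical.

```python
from statistics import mean

# Distinct EMPL_ID counts per month (Jan..Dec 2017), derived by hand
# from the sample table in the question.
monthly_distinct = [3, 3, 3, 3, 5, 4, 3, 3, 2, 4, 3, 3]


def forward_average(counts, start, following):
    """Average counts[start] together with up to `following` later entries.

    Mirrors ROWS BETWEEN CURRENT ROW AND n FOLLOWING: slicing simply
    truncates the frame when it would extend past the end of the list.
    """
    frame = counts[start:start + following + 1]
    return mean(frame)


# Periods of 1, 3, 6, 9 and 12 months, all starting in January (index 0).
for following in (0, 2, 5, 8, 11):
    avg = forward_average(monthly_distinct, 0, following)
    print(f"{following + 1:>2} months from January: {avg:.2f}")
```

Calling forward_average with a start index near the end of the list (for example November with following=2) shows the truncation: only the months that actually exist are averaged.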
By first building a Common Table Expression (CTE) that holds the distinct count per month, you can apply the UNPIVOT clause to that data to calculate the average for each period, and then add a simple SELECT from the same CTE to get your monthly data.
Any filters (WHERE clauses) that limit the data to be returned should be added to the unique_emps CTE below.
WITH
sample_data (emp_id, timestamp, part_col)
AS
(SELECT 1, DATE '2017-01-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-01-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-01-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-01-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-02-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-02-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-02-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-02-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-03-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-03-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-03-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-04-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-04-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-04-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-05-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-05-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-05-01', 'M' FROM DUAL
UNION ALL
SELECT 4, DATE '2017-05-01', 'M' FROM DUAL
UNION ALL
SELECT 5, DATE '2017-05-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-06-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-06-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-06-01', 'M' FROM DUAL
UNION ALL
SELECT 4, DATE '2017-06-01', 'M' FROM DUAL
UNION ALL
SELECT 4, DATE '2017-06-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-07-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-07-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-07-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-08-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-08-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-08-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-09-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-09-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-10-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-10-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-10-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-10-01', 'M' FROM DUAL
UNION ALL
SELECT 4, DATE '2017-10-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-11-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-11-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-11-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-11-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-12-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-12-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-12-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-12-01', 'M' FROM DUAL),
--Query starts here
--Filters should be added to the unique_emps Common Table Expression to limit data returned
unique_emps (mon, part_col, distinct_emps)
AS
    (  SELECT timestamp, part_col, COUNT (DISTINCT emp_id)
         FROM sample_data
     GROUP BY timestamp, part_col)
  SELECT timestamp_from,
         ADD_MONTHS (timestamp_from, period) AS timestamp_until,
         AVG (period_people) AS avg_number_of_people,
         part_col
    FROM (SELECT TRUNC (mon, 'Y')
                     AS timestamp_from,
                 part_col,
                 CASE WHEN TO_NUMBER (TO_CHAR (mon, 'Q')) <= 1 THEN distinct_emps END
                     period_people1,
                 CASE WHEN TO_NUMBER (TO_CHAR (mon, 'Q')) <= 2 THEN distinct_emps END
                     period_people2,
                 CASE WHEN TO_NUMBER (TO_CHAR (mon, 'Q')) <= 3 THEN distinct_emps END
                     period_people3,
                 CASE WHEN TO_NUMBER (TO_CHAR (mon, 'Q')) <= 4 THEN distinct_emps END
                     period_people4
            FROM unique_emps)
         UNPIVOT (period_people
                 FOR period
                 IN (period_people1 AS 2,
                     period_people2 AS 5,
                     period_people3 AS 8,
                     period_people4 AS 11))
GROUP BY timestamp_from, period, part_col
UNION ALL
SELECT mon, mon, distinct_emps, part_col FROM unique_emps
ORDER BY timestamp_from, timestamp_until, avg_number_of_people;
TIMESTAMP_FROM TIMESTAMP_UNTIL AVG_NUMBER_OF_PEOPLE PART_COL
_________________ __________________ ___________________________________________ ___________
01-JAN-17 01-JAN-17 3 M
01-JAN-17 01-MAR-17 3 M
01-JAN-17 01-JUN-17 3.5 M
01-JAN-17 01-SEP-17 3.22222222222222222222222222222222222222 M
01-JAN-17 01-DEC-17 3.25 M
01-FEB-17 01-FEB-17 3 M
01-MAR-17 01-MAR-17 3 M
01-APR-17 01-APR-17 3 M
01-MAY-17 01-MAY-17 5 M
01-JUN-17 01-JUN-17 4 M
01-JUL-17 01-JUL-17 3 M
01-AUG-17 01-AUG-17 3 M
01-SEP-17 01-SEP-17 2 M
01-OCT-17 01-OCT-17 4 M
01-NOV-17 01-NOV-17 3 M
01-DEC-17 01-DEC-17 3 M
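The period averages in this output can be cross-checked outside the database. The sketch below is an illustration only, not the answer's method: it rebuilds the per-month distinct counts from the sample rows (the role of the unique_emps CTE) and then averages every month whose quarter is at or before a given quarter, which is what the CASE expressions combined with AVG do. The names rows_by_month, quarter and cumulative_average are hypothetical.

```python
# EMPL_ID values per month of 2017, copied from the sample data,
# duplicates included.
rows_by_month = {
    1: [1, 2, 3, 3],
    2: [1, 2, 3, 3],
    3: [1, 2, 3],
    4: [1, 2, 3],
    5: [1, 2, 3, 4, 5],
    6: [1, 2, 3, 4, 4],
    7: [1, 2, 3],
    8: [1, 2, 3],
    9: [1, 2],
    10: [1, 2, 2, 3, 4],
    11: [1, 2, 2, 3],
    12: [1, 2, 2, 3],
}

# Equivalent of COUNT(DISTINCT emp_id) grouped by month.
distinct_per_month = {m: len(set(ids)) for m, ids in rows_by_month.items()}


def quarter(month):
    """Quarter number (1-4) of a month number (1-12)."""
    return (month - 1) // 3 + 1


def cumulative_average(through_quarter):
    """Average the monthly distinct counts from January through the given quarter."""
    values = [n for m, n in distinct_per_month.items()
              if quarter(m) <= through_quarter]
    return sum(values) / len(values)


averages = {q: cumulative_average(q) for q in (1, 2, 3, 4)}
for q, avg in averages.items():
    print(f"January through month {3 * q}: {avg:.2f}")
```

Months outside the selected quarters are simply left out of the list, which corresponds to the CASE expressions returning NULL for those months, since AVG ignores NULL values.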