Calculate average number of distinct IDs in a period (1 month, 3 months, 6 months, 9 months and 12 months)
Suppose I have the following source table (to save space I have filled in only the 2017 data, but you can imagine the table holds data for 2017-2021):
EMPL_ID TIMESTAMP PART_COL
1 2017-01-01 00:00:00 M
2 2017-01-01 00:00:00 M
3 2017-01-01 00:00:00 M
3 2017-01-01 00:00:00 M
1 2017-02-01 00:00:00 M
2 2017-02-01 00:00:00 M
3 2017-02-01 00:00:00 M
3 2017-02-01 00:00:00 M
1 2017-03-01 00:00:00 M
2 2017-03-01 00:00:00 M
3 2017-03-01 00:00:00 M
1 2017-04-01 00:00:00 M
2 2017-04-01 00:00:00 M
3 2017-04-01 00:00:00 M
1 2017-05-01 00:00:00 M
2 2017-05-01 00:00:00 M
3 2017-05-01 00:00:00 M
4 2017-05-01 00:00:00 M
5 2017-05-01 00:00:00 M
1 2017-06-01 00:00:00 M
2 2017-06-01 00:00:00 M
3 2017-06-01 00:00:00 M
4 2017-06-01 00:00:00 M
4 2017-06-01 00:00:00 M
1 2017-07-01 00:00:00 M
2 2017-07-01 00:00:00 M
3 2017-07-01 00:00:00 M
1 2017-08-01 00:00:00 M
2 2017-08-01 00:00:00 M
3 2017-08-01 00:00:00 M
1 2017-09-01 00:00:00 M
2 2017-09-01 00:00:00 M
1 2017-10-01 00:00:00 M
2 2017-10-01 00:00:00 M
2 2017-10-01 00:00:00 M
3 2017-10-01 00:00:00 M
4 2017-10-01 00:00:00 M
1 2017-11-01 00:00:00 M
2 2017-11-01 00:00:00 M
2 2017-11-01 00:00:00 M
3 2017-11-01 00:00:00 M
1 2017-12-01 00:00:00 M
2 2017-12-01 00:00:00 M
2 2017-12-01 00:00:00 M
3 2017-12-01 00:00:00 M
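I want to calculate the following: the average number of distinct EMPL_ID values per period (1 month, 3 months, 6 months, 9 months and 12 months).
In the end it should look like this: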
UNIQUE_EMPL_ID TIMESTAMP_FROM TIMESTAMP_UNTIL PART_COL
3,00 2017-01-01 00:00:00 2017-01-01 00:00:00 M
3,00 2017-01-01 00:00:00 2017-03-01 00:00:00 M
3,00 2017-01-01 00:00:00 2017-06-01 00:00:00 M
3,50 2017-01-01 00:00:00 2017-09-01 00:00:00 M
3,20 2017-01-01 00:00:00 2017-12-01 00:00:00 M
3,08 2017-02-01 00:00:00 2017-02-01 00:00:00 M
3,00 2017-03-01 00:00:00 2017-03-01 00:00:00 M
3,00 2017-04-01 00:00:00 2017-04-01 00:00:00 M
5,00 2017-05-01 00:00:00 2017-05-01 00:00:00 M
...and so on, up to 12 months.
I came up with the following query:
-- 1 month: distinct employees in each month
SELECT
    count(distinct empl_id) as UNIQUE_EMPL_ID
    ,TIMESTAMP_COL as TIMESTAMP_FROM
    ,TIMESTAMP_COL as TIMESTAMP_UNTIL
from source_table
WHERE PART_COLUMN = 'M'
group by TIMESTAMP_COL,PART_COLUMN
union all
-- 3 months: average of the current month and the 2 following months
SELECT
    avg(count(distinct empl_id)) OVER (PARTITION BY PART_COLUMN ORDER BY TIMESTAMP_COL ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) as UNIQUE_EMPL_ID
    ,TIMESTAMP_COL as TIMESTAMP_FROM
    ,add_months(TIMESTAMP_COL,2) as TIMESTAMP_UNTIL
from source_table
WHERE PART_COLUMN = 'M'
group by TIMESTAMP_COL,PART_COLUMN
union all
-- 6 months: average of the current month and the 5 following months
SELECT
    avg(count(distinct empl_id)) OVER (PARTITION BY PART_COLUMN ORDER BY TIMESTAMP_COL ROWS BETWEEN CURRENT ROW AND 5 FOLLOWING) as UNIQUE_EMPL_ID
    ,TIMESTAMP_COL as TIMESTAMP_FROM
    ,add_months(TIMESTAMP_COL,5) as TIMESTAMP_UNTIL
from source_table
WHERE PART_COLUMN = 'M'
group by TIMESTAMP_COL,PART_COLUMN
union all
-- 9 months: average of the current month and the 8 following months
SELECT
    avg(count(distinct empl_id)) OVER (PARTITION BY PART_COLUMN ORDER BY TIMESTAMP_COL ROWS BETWEEN CURRENT ROW AND 8 FOLLOWING) as UNIQUE_EMPL_ID
    ,TIMESTAMP_COL as TIMESTAMP_FROM
    ,add_months(TIMESTAMP_COL,8) as TIMESTAMP_UNTIL
from source_table
WHERE PART_COLUMN = 'M'
group by TIMESTAMP_COL,PART_COLUMN
union all
-- 12 months: average of the current month and the 11 following months
SELECT
    avg(count(distinct empl_id)) OVER (PARTITION BY PART_COLUMN ORDER BY TIMESTAMP_COL ROWS BETWEEN CURRENT ROW AND 11 FOLLOWING) as UNIQUE_EMPL_ID
    ,TIMESTAMP_COL as TIMESTAMP_FROM
    ,add_months(TIMESTAMP_COL,11) as TIMESTAMP_UNTIL
from source_table
WHERE PART_COLUMN = 'M'
group by TIMESTAMP_COL,PART_COLUMN
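For reference, the frame ROWS BETWEEN CURRENT ROW AND n FOLLOWING used in the query averages the current month's distinct count with up to n later rows, and the frame simply gets shorter near the end of the partition. The following is a minimal Python sketch of that behaviour, not the database's implementation. It assumes exactly one grouped row per month with no missing months, and the list MONTHLY_DISTINCT and the helper forward_window_avg are illustrative names; the counts are derived by hand from the 2017 sample rows.

```python
# Distinct EMPL_ID counts for Jan..Dec 2017, derived by hand from the sample table.
MONTHLY_DISTINCT = [3, 3, 3, 3, 5, 4, 3, 3, 2, 4, 3, 3]


def forward_window_avg(values, following):
    """Mimic AVG(x) OVER (ORDER BY ... ROWS BETWEEN CURRENT ROW AND n FOLLOWING)."""
    averages = []
    for i in range(len(values)):
        # The slice is shorter than following + 1 near the end of the list,
        # just as the SQL frame holds fewer rows near the end of the partition.
        frame = values[i : i + following + 1]
        averages.append(sum(frame) / len(frame))
    return averages


three_month = forward_window_avg(MONTHLY_DISTINCT, 2)
twelve_month = forward_window_avg(MONTHLY_DISTINCT, 11)

print(three_month[3])    # frame starting in April: Apr, May, Jun
print(twelve_month[0])   # frame starting in January: the whole year
print(twelve_month[11])  # frame starting in December: December only
```

One consequence under these assumptions: for start months close to the end of the data, TIMESTAMP_UNTIL computed with add_months can point past the last available month while the average covers fewer months than the label suggests.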
The question is: can this result be achieved with a more efficient query?
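First build a table of per-month distinct counts with a common table expression (CTE). You can then use the UNPIVOT
function on that data to aggregate each period, and finally do a simple select from the CTE to get your monthly figures.
Any filters (WHERE clauses) that limit the data to be returned should be added to the unique_emps
CTE below.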
WITH
sample_data (emp_id, timestamp, part_col)
AS
(SELECT 1, DATE '2017-01-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-01-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-01-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-01-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-02-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-02-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-02-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-02-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-03-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-03-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-03-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-04-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-04-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-04-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-05-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-05-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-05-01', 'M' FROM DUAL
UNION ALL
SELECT 4, DATE '2017-05-01', 'M' FROM DUAL
UNION ALL
SELECT 5, DATE '2017-05-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-06-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-06-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-06-01', 'M' FROM DUAL
UNION ALL
SELECT 4, DATE '2017-06-01', 'M' FROM DUAL
UNION ALL
SELECT 4, DATE '2017-06-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-07-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-07-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-07-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-08-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-08-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-08-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-09-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-09-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-10-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-10-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-10-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-10-01', 'M' FROM DUAL
UNION ALL
SELECT 4, DATE '2017-10-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-11-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-11-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-11-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-11-01', 'M' FROM DUAL
UNION ALL
SELECT 1, DATE '2017-12-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-12-01', 'M' FROM DUAL
UNION ALL
SELECT 2, DATE '2017-12-01', 'M' FROM DUAL
UNION ALL
SELECT 3, DATE '2017-12-01', 'M' FROM DUAL),
--Query starts here
--Filters should be added to the unique_emps Common Table Expression to limit data returned
unique_emps (mon, part_col, distinct_emps)
AS
    (SELECT timestamp, part_col, COUNT (DISTINCT emp_id)
     FROM sample_data
     GROUP BY timestamp, part_col)
SELECT timestamp_from,
       ADD_MONTHS (timestamp_from, period) AS timestamp_until,
       AVG (period_people) AS avg_number_of_people,
       part_col
FROM (SELECT TRUNC (mon, 'Y') AS timestamp_from,
             part_col,
             CASE WHEN TO_NUMBER (TO_CHAR (mon, 'Q')) <= 1 THEN distinct_emps END
                 period_people1,
             CASE WHEN TO_NUMBER (TO_CHAR (mon, 'Q')) <= 2 THEN distinct_emps END
                 period_people2,
             CASE WHEN TO_NUMBER (TO_CHAR (mon, 'Q')) <= 3 THEN distinct_emps END
                 period_people3,
             CASE WHEN TO_NUMBER (TO_CHAR (mon, 'Q')) <= 4 THEN distinct_emps END
                 period_people4
      FROM unique_emps)
     UNPIVOT (period_people
             FOR period
             IN (period_people1 AS 2,
                 period_people2 AS 5,
                 period_people3 AS 8,
                 period_people4 AS 11))
GROUP BY timestamp_from, period, part_col
UNION ALL
SELECT mon, mon, distinct_emps, part_col FROM unique_emps
ORDER BY timestamp_from, timestamp_until, avg_number_of_people;
TIMESTAMP_FROM TIMESTAMP_UNTIL AVG_NUMBER_OF_PEOPLE PART_COL
_________________ __________________ ___________________________________________ ___________
01-JAN-17 01-JAN-17 3 M
01-JAN-17 01-MAR-17 3 M
01-JAN-17 01-JUN-17 3.5 M
01-JAN-17 01-SEP-17 3.22222222222222222222222222222222222222 M
01-JAN-17 01-DEC-17 3.25 M
01-FEB-17 01-FEB-17 3 M
01-MAR-17 01-MAR-17 3 M
01-APR-17 01-APR-17 3 M
01-MAY-17 01-MAY-17 5 M
01-JUN-17 01-JUN-17 4 M
01-JUL-17 01-JUL-17 3 M
01-AUG-17 01-AUG-17 3 M
01-SEP-17 01-SEP-17 2 M
01-OCT-17 01-OCT-17 4 M
01-NOV-17 01-NOV-17 3 M
01-DEC-17 01-DEC-17 3 M
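The figures above can be cross-checked by hand. The short Python sketch below is only a sanity check of the arithmetic, not part of the query: it counts distinct IDs per month and then averages those counts over the first 1, 3, 6, 9 and 12 months of the year. The dictionary SAMPLE is copied from the 2017 sample rows (duplicates kept on purpose), and the function names are illustrative.

```python
# Month number -> EMPL_ID values as they appear in the 2017 sample rows.
SAMPLE = {
    1: [1, 2, 3, 3],
    2: [1, 2, 3, 3],
    3: [1, 2, 3],
    4: [1, 2, 3],
    5: [1, 2, 3, 4, 5],
    6: [1, 2, 3, 4, 4],
    7: [1, 2, 3],
    8: [1, 2, 3],
    9: [1, 2],
    10: [1, 2, 2, 3, 4],
    11: [1, 2, 2, 3],
    12: [1, 2, 2, 3],
}


def distinct_per_month(sample):
    """Equivalent of COUNT(DISTINCT emp_id) grouped by month."""
    return {month: len(set(ids)) for month, ids in sample.items()}


def year_to_date_averages(counts, period_ends=(1, 3, 6, 9, 12)):
    """Average the monthly distinct counts from January through each period end."""
    averages = {}
    for end in period_ends:
        window = [counts[month] for month in range(1, end + 1)]
        averages[end] = sum(window) / len(window)
    return averages


counts = distinct_per_month(SAMPLE)
averages = year_to_date_averages(counts)
for end, value in averages.items():
    print(f"2017-01 .. 2017-{end:02d}: {value:.2f}")
```

Rounded to two decimals this gives 3.00, 3.00, 3.50, 3.22 and 3.25, matching the 01-JAN-17 rows in the result.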