Cumulative summation over null values
I tried to calculate a cumulative sum column to find the number of employees working in each month, but for months in which nobody joined I get NULL instead of the head count carried over from the previous month.
Table employees:
id date_started date_terminated
1 01-Apr-14 NULL
2 21-Apr-14 NULL
3 11-Apr-14 NULL
4 01-Apr-14 NULL
5 01-Apr-14 NULL
6 05-Apr-14 NULL
7 01-Apr-14 NULL
8 01-Apr-14 NULL
9 01-Apr-14 NULL
10 29-Apr-14 NULL
11 21-Apr-14 NULL
12 01-Apr-14 NULL
13 01-Apr-14 NULL
14 01-Apr-14 NULL
15 05-Aug-14 NULL
16 01-Oct-14 NULL
17 13-Oct-14 NULL
18 22-Oct-14 NULL
19 25-Oct-14 NULL
20 29-Oct-14 NULL
Table dates: it contains a date column holding every date from 2011-Jan-01 to the current date.
Result obtained from my query:
+--------------------------------------------------------------+
| date | employee_joined | present_employees |
+--------------------------------------------------------------+
| 2014-01-01 00:00:00-7 | NULL | NULL |
| 2014-02-01 00:00:00-7 | NULL | NULL |
| 2014-03-01 00:00:00-7 | NULL | NULL |
| 2014-04-01 00:00:00-7 | 14 | 14 |
| 2014-05-01 00:00:00-7 | NULL | NULL |
| 2014-06-01 00:00:00-7 | NULL | NULL |
| 2014-07-01 00:00:00-7 | NULL | NULL |
| 2014-08-01 00:00:00-7 | 1 | 15 |
| 2014-09-01 00:00:00-7 | NULL | NULL |
| 2014-10-01 00:00:00-7 | 5 | 20 |
+--------------------------------------------------------------+
The result I am looking for:
+--------------------------------------------------------------+
| date | employee_joined | present_employees |
+--------------------------------------------------------------+
| 2014-01-01 00:00:00-7 | NULL | NULL |
| 2014-02-01 00:00:00-7 | NULL | NULL |
| 2014-03-01 00:00:00-7 | NULL | NULL |
| 2014-04-01 00:00:00-7 | 14 | 14 |
| 2014-05-01 00:00:00-7 | NULL | 14 |
| 2014-06-01 00:00:00-7 | NULL | 14 |
| 2014-07-01 00:00:00-7 | NULL | 14 |
| 2014-08-01 00:00:00-7 | 1 | 15 |
| 2014-09-01 00:00:00-7 | NULL | 15 |
| 2014-10-01 00:00:00-7 | 5 | 20 |
+--------------------------------------------------------------+
I tried to get the data with the query below:
/*-----ONLY FOR PRESENT EMPLOYEES USING CUMULATIVE SUM--------*/
WITH fdates AS
(
SELECT DATE_TRUNC('month', d.date) AS date
FROM dates d
WHERE d.date::DATE <= '10-01-2014' AND
d.date::DATE >= '01-01-2014'
group by DATE_TRUNC('month', d.date)
),
employeeJoin AS
(
SELECT COALESCE( COUNT(e.id), 0 ) AS employee_joined,
DATE_TRUNC( 'month', e.date_started) AS date_started
FROM employees e GROUP BY DATE_TRUNC( 'month', e.date_started)
),
employeeJoinRownum AS
(
SELECT employee_joined, date_started, row_number() OVER (order by date_started) rownum
FROM employeeJoin
)
SELECT d.*, employee_joined AS employee_joined,
(SELECT sum(employee_joined) FROM employeeJoinRownum eJ2 WHERE eJ2.rownum <= eJ1.rownum) AS Total_Joined_Employees
FROM fdates d
LEFT OUTER JOIN employeeJoinRownum eJ1 ON( eJ1.date_started = DATE_TRUNC('month', d.date) )
ORDER BY d.date
The following query counts the employees who joined and the employees who left on each date, then uses a window function to accumulate the results.
SELECT
dates.date,
COUNT(DISTINCT ej.id) AS employee_joined,
COUNT(DISTINCT el.id) AS employee_left,
SUM(COUNT(DISTINCT ej.id) - COUNT(DISTINCT el.id)) OVER (ORDER BY dates.date) AS present_employees
FROM
dates LEFT JOIN employees ej
ON
ej.date_started = dates.date LEFT JOIN employees el
ON
el.date_terminated = dates.date
GROUP BY
dates.date;
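The query above works per day, while the question asks for one row per month. A minimal sketch of the same window-function idea at month granularity follows, assuming PostgreSQL and the employees and dates tables from the question. The CTE names months and joined and the column month_start are made up for illustration, and terminations are ignored because date_terminated is NULL for every row in the sample data. Because SUM() in a window skips NULLs, a month with no joiners keeps the running total from the previous month instead of becoming NULL.

```sql
-- Sketch only: assumes PostgreSQL and the tables from the question.
-- months, joined and month_start are hypothetical names.
WITH months AS (
    -- One row per calendar month in the reporting range.
    SELECT DISTINCT DATE_TRUNC('month', d.date) AS month_start
    FROM dates d
    WHERE d.date::DATE >= DATE '2014-01-01'
      AND d.date::DATE <= DATE '2014-10-01'
),
joined AS (
    -- Number of employees who started in each month.
    SELECT DATE_TRUNC('month', e.date_started) AS month_start,
           COUNT(*) AS employee_joined
    FROM employees e
    GROUP BY 1
)
SELECT m.month_start AS date,
       j.employee_joined,
       -- Running total; NULL inputs are skipped, so the total carries forward.
       SUM(j.employee_joined) OVER (ORDER BY m.month_start) AS present_employees
FROM months m
LEFT JOIN joined j ON j.month_start = m.month_start
ORDER BY m.month_start;
```

The original query returns NULL for those months because eJ1.rownum is NULL whenever the left join finds no match, so the condition eJ2.rownum <= eJ1.rownum selects no rows and sum() has nothing to add. Note that employees who started before the first month in the range are not counted in this sketch.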
If you do not have a prefilled dates table, you can use the generate_series set-returning function instead and left join to it.
SELECT
...
FROM
GENERATE_SERIES('2014-01-01', '2014-01-10', '1 day'::interval) dates LEFT JOIN employees ej
ON
...
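For reference, here is one way the outline above might look when written out in full. This is a sketch under assumptions, not the answer's exact query: it assumes PostgreSQL, assumes date_started and date_terminated hold plain dates with no time component, uses April 2014 as an arbitrary range, and names the series days(day) purely for illustration.

```sql
-- Sketch only: the date range and the days(day) alias are assumptions.
SELECT
    days.day::date AS date,
    COUNT(DISTINCT ej.id) AS employee_joined,
    COUNT(DISTINCT el.id) AS employee_left,
    -- Net change per day, accumulated over all earlier days.
    SUM(COUNT(DISTINCT ej.id) - COUNT(DISTINCT el.id))
        OVER (ORDER BY days.day) AS present_employees
FROM
    GENERATE_SERIES(TIMESTAMP '2014-04-01',
                    TIMESTAMP '2014-04-30',
                    INTERVAL '1 day') AS days(day)
    LEFT JOIN employees ej ON ej.date_started = days.day
    LEFT JOIN employees el ON el.date_terminated = days.day
GROUP BY
    days.day
ORDER BY
    days.day;
```

Since COUNT() returns 0 rather than NULL for days without a match, the running total here is never NULL.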
You could normalize the table by creating one row for each join event and one row for each termination event:
select welcome as date
, 1 as size_change
from emps
union all
select bye
, -1
from emps
where bye is not null
Now you can use a running sum to calculate the current size:
; with events as
(
select welcome as date
, 1 as size_change
from emps
union all
select bye
, -1
from emps
where bye is not null
)
select distinct to_char(date, 'YYYY-MM-DD') as date
, sum(size_change) over (order by date) as family_size
from events
order by
date
;
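The events query only returns dates on which something happened. To get one row per month with the total carried forward, the events could be joined to a month series, roughly as sketched below. This assumes PostgreSQL; the three sample rows, the months CTE and the month_start column are hypothetical and only there to make the example self-contained, while emps, welcome and bye follow the naming used above.

```sql
-- Sketch only: sample rows and the months/month_start names are made up.
WITH emps(id, welcome, bye) AS (
    VALUES (1, DATE '2014-04-01', NULL::date),
           (2, DATE '2014-04-01', DATE '2014-06-15'),
           (3, DATE '2014-08-05', NULL::date)
),
events AS (
    -- +1 when someone joins, -1 when someone leaves.
    SELECT welcome AS date, 1 AS size_change FROM emps
    UNION ALL
    SELECT bye, -1 FROM emps WHERE bye IS NOT NULL
),
months AS (
    SELECT GENERATE_SERIES(TIMESTAMP '2014-03-01',
                           TIMESTAMP '2014-09-01',
                           INTERVAL '1 month') AS month_start
)
SELECT m.month_start::date AS date,
       -- Inner SUM: net change in the month; outer SUM: running total.
       COALESCE(SUM(SUM(e.size_change)) OVER (ORDER BY m.month_start), 0) AS family_size
FROM months m
LEFT JOIN events e
       ON DATE_TRUNC('month', e.date::timestamp) = m.month_start
GROUP BY m.month_start
ORDER BY m.month_start;
```

With these sample rows the total is 0 for March, 2 from April, 1 from June after the termination, and 2 again from August. The COALESCE only affects months before the first event, where the running sum would otherwise be NULL.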