[英]consecutive days in sql
I found many stackoverflow QnAs about consecutive days. 我发现连续几天有很多stackoverflow QnAs。
Still answers are too short for me to understand what's going on. 仍然答案太短,我无法理解发生了什么。
For concreteness, I'll make up a model (or a table) 为了具体,我会组成一个模型(或一个表)
(I'm using postgresql if it makes a difference.) (如果它有所作为,我正在使用postgresql。)
CREATE TABLE work (
id integer NOT NULL,
user_id integer NOT NULL,
arrived_at timestamp with time zone NOT NULL
);
insert into work(user_id, arrived_at) values(1, '01/03/2011');
insert into work(user_id, arrived_at) values(1, '01/04/2011');
(In simplest form) For a given user, I want to find the last-consecutive date range. (最简单的形式)对于给定的用户,我想找到最后连续的日期范围。
(My ultimate goal) For a given user, I want to find his consecutive working days. (我的最终目标)对于给定的用户,我想找到他连续的工作日。
If he came to work yesterday, he still(as of today) has chance of working consecutive days. 如果他昨天来上班,他仍然(截至今天)有连续工作的机会。 So I show him consecutive days upto yesterday.
所以我昨天连续几天告诉他。
But if he missed yesterday, his consecutive days is either 0 or 1 depending on whether he came today or not. 但如果他昨天错过了,他的连续几天是0或1,取决于他今天是否来。
Say today is 8th day. 今天说是第8天。
3 * 5 6 7 * = 3 days (5 to 7)
3 * 5 6 7 8 = 4 days (5 to 8)
3 4 5 * 7 * = 1 day (7 to 7)
3 * * * * * = 0 day
3 * * * * 8 = 1 day (8 to 8)
Here is my solution to this problem using CTE
这是我使用
CTE
解决这个问题的方法
WITH RECURSIVE CTE(attendanceDate)
AS
(
SELECT * FROM
(
SELECT attendanceDate FROM attendance WHERE attendanceDate = current_date
OR attendanceDate = current_date - INTERVAL '1 day'
ORDER BY attendanceDate DESC
LIMIT 1
) tab
UNION ALL
SELECT a.attendanceDate FROM attendance a
INNER JOIN CTE c
ON a.attendanceDate = c.attendanceDate - INTERVAL '1 day'
)
SELECT COUNT(*) FROM CTE;
Check the code at SQL Fiddle 检查SQL Fiddle的代码
Here is how the query is working: 以下是查询的工作方式:
attendance
table. attendance
表中选择今天的记录。 If today's record is not available then it selects yesterday's record If you want to select latest consecutive date range irrespective of when was user's latest attendance(today, yesterday or x days before), then the initialization part of CTE must be replaced by below snippet: 如果您想选择最新的连续日期范围而不管用户最近的出勤时间(今天,昨天或前几天),则CTE的初始化部分必须替换为以下片段:
SELECT MAX(attendanceDate) FROM attendance
[EDIT] Here is query at SQL Fiddle which resolves your question#1: SQL Fiddle [编辑]这是SQL Fiddle的查询,它解决了你的问题#1: SQL小提琴
-- some data
CREATE table dayworked (
id SERIAL NOT NULL PRIMARY KEY
, user_id INTEGER NOT NULL
, arrived_at DATE NOT NULL
, UNIQUE (user_id, arrived_at)
);
INSERT INTO dayworked(user_id, arrived_at) VALUES
( 1, '2014-02-03')
,( 1, '2014-02-05')
,( 1, '2014-02-06')
,( 1, '2014-02-07')
--
,( 2, '2014-02-03')
,( 2, '2014-02-05')
,( 2, '2014-02-06')
,( 2, '2014-02-07')
,( 2, '2014-02-08')
--
,( 3, '2014-02-03')
,( 3, '2014-02-04')
,( 3, '2014-02-05')
,( 3, '2014-02-07')
--
,( 5, '2014-02-08')
;
-- The query
WITH RECURSIVE stretch AS (
SELECT dw.user_id AS user_id
, dw.arrived_at AS first_day
, dw.arrived_at AS last_day
, 1::INTEGER AS nday
FROM dayworked dw
WHERE NOT EXISTS ( -- Find start of chain: no previous day
SELECT * FROM dayworked nx
WHERE nx.user_id = dw.user_id
AND nx. arrived_at = dw.arrived_at -1
)
UNION ALL
SELECT dw.user_id AS user_id
, st.first_day AS first_day
, dw.arrived_at AS last_day
, 1+st.nday AS nday
FROM dayworked dw -- connect to chain: previous day := day before this day
JOIN stretch st ON st.user_id = dw.user_id AND st.last_day = dw.arrived_at -1
)
SELECT * FROM stretch st
WHERE (st.nday > 1 OR st.first_day = NOW()::date ) -- either more than one consecutive dat or starting today
AND NOT EXISTS ( -- Only the most recent stretch
SELECT * FROM stretch nx
WHERE nx.user_id = st .user_id
AND nx.first_day > st.first_day
)
AND NOT EXISTS ( -- omit partial chains
SELECT * FROM stretch nx
WHERE nx.user_id = st .user_id
AND nx.first_day = st.first_day
AND nx.last_day > st.last_day
)
;
Result: 结果:
CREATE TABLE
INSERT 0 14
user_id | first_day | last_day | nday
---------+------------+------------+------
1 | 2014-02-05 | 2014-02-07 | 3
2 | 2014-02-05 | 2014-02-08 | 4
(2 rows)
You can create an aggregate with the range types: 您可以使用范围类型创建聚合:
Create function sfunc (tstzrange, timestamptz)
returns tstzrange
language sql strict as $$
select case when $2 - upper($1) <= '1 day'::interval
then tstzrange(lower($1), $2, '[]')
else tstzrange($2, $2, '[]') end
$$;
Create aggregate consecutive (timestamptz) (
sfunc = sfunc,
stype = tstzrange,
initcond = '[,]'
);
Use the aggregate with the right order the get the consecutive day range for the last arrived_at: 使用具有正确顺序的聚合获得最后到达的get_at的连续日期范围:
Select user_id, consecutive(arrived_at order by arrived_at)
from work
group by user_id;
┌─────────┬─────────────────────────────────────────────────────┐
│ user_id │ consecutive │
├─────────┼─────────────────────────────────────────────────────┤
│ 1 │ ["2011-01-03 00:00:00+02","2011-01-05 00:00:00+02"] │
│ 2 │ ["2011-01-06 00:00:00+02","2011-01-06 00:00:00+02"] │
└─────────┴─────────────────────────────────────────────────────┘
Use the aggregate in a window function: 在窗口函数中使用聚合:
Select *,
consecutive(arrived_at)
over (partition by user_id order by arrived_at)
from work;
┌────┬─────────┬────────────────────────┬─────────────────────────────────────────────────────┐
│ id │ user_id │ arrived_at │ consecutive │
├────┼─────────┼────────────────────────┼─────────────────────────────────────────────────────┤
│ 1 │ 1 │ 2011-01-03 00:00:00+02 │ ["2011-01-03 00:00:00+02","2011-01-03 00:00:00+02"] │
│ 2 │ 1 │ 2011-01-04 00:00:00+02 │ ["2011-01-03 00:00:00+02","2011-01-04 00:00:00+02"] │
│ 3 │ 1 │ 2011-01-05 00:00:00+02 │ ["2011-01-03 00:00:00+02","2011-01-05 00:00:00+02"] │
│ 4 │ 2 │ 2011-01-06 00:00:00+02 │ ["2011-01-06 00:00:00+02","2011-01-06 00:00:00+02"] │
└────┴─────────┴────────────────────────┴─────────────────────────────────────────────────────┘
Query the results to find what you need: 查询结果以找到您需要的内容:
With work_detail as (select *,
consecutive(arrived_at)
over (partition by user_id order by arrived_at)
from work)
select arrived_at, upper(consecutive) - lower(consecutive) as days
from work_detail
where user_id = 1 and upper(consecutive) != lower(consecutive)
order by arrived_at desc
limit 1;
┌────────────────────────┬────────┐
│ arrived_at │ days │
├────────────────────────┼────────┤
│ 2011-01-05 00:00:00+02 │ 2 days │
└────────────────────────┴────────┘
You can even do this without a recursive CTE: 你甚至可以在没有递归CTE的情况下做到这一点:
with generate_series()
, LEFT JOIN
, row_count()
and a final LIMIT 1
: 使用
generate_series()
, LEFT JOIN
, row_count()
和最终的LIMIT 1
:
1 for "today" plus consecutive days up until "yesterday": 1为“今天”加上连续几天直到“昨天”:
SELECT count(*) -- 1 / 0 for "today"
+ COALESCE(( -- + optional count of consecutive days up until "yesterday"
SELECT ct
FROM (
SELECT d.ct, count(w.arrived_at) OVER (ORDER BY d.ct) AS day_ct
FROM generate_series(1, 8) AS d(ct) -- maximum = 8
LEFT JOIN work w ON w.arrived_at >= current_date - d.ct
AND w.arrived_at < current_date - (d.ct - 1)
AND w.user_id = 1 -- given user
) sub
WHERE ct = day_ct
ORDER BY ct DESC
LIMIT 1
), 0) AS total
FROM work
WHERE arrived_at >= current_date -- no future timestamps
AND user_id = 1 -- given user
Assuming 0 or 1 entry per day. 假设每天0或1次进入。 Should be fast.
应该快。
For best performance (for this or the CTE solution alike) you would have a multicolumn index like: 为获得最佳性能(对于此类或CTE解决方案),您将拥有一个多列索引,如:
CREATE INDEX foo_idx ON work (user_id,arrived_at);
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.