
Consecutive days in SQL

I found many Stack Overflow Q&As about consecutive days,
but the answers are too short for me to understand what's going on.

For concreteness, I'll make up a model (a table).
(I'm using PostgreSQL, if it makes a difference.)

CREATE TABLE work (
    id serial NOT NULL,  -- serial so the inserts below, which omit id, succeed
    user_id integer NOT NULL,
    arrived_at timestamp with time zone NOT NULL
);


insert into work(user_id, arrived_at) values(1, '01/03/2011');
insert into work(user_id, arrived_at) values(1, '01/04/2011');

  1. (In simplest form) For a given user, I want to find the last consecutive date range.

  2. (My ultimate goal) For a given user, I want to find his consecutive working days.
    If he came to work yesterday, he still (as of today) has a chance of extending the streak, so I show him the consecutive days up to yesterday.
    But if he missed yesterday, his consecutive-day count is either 0 or 1, depending on whether he came today.

Say today is the 8th day (a number means he came that day, * means he missed it).

3 * 5 6 7 * = 3 days (5 to 7)
3 * 5 6 7 8 = 4 days (5 to 8)
3 4 5 * 7 * = 1 day (7 to 7)
3 * * * * * = 0 days
3 * * * * 8 = 1 day (8 to 8)

Here is my solution to this problem, using a recursive CTE:

WITH RECURSIVE CTE(attendanceDate)
AS
(
   SELECT * FROM 
   (
      SELECT attendanceDate FROM attendance WHERE attendanceDate = current_date 
      OR attendanceDate = current_date - INTERVAL '1 day' 
      ORDER BY attendanceDate DESC
      LIMIT 1
   ) tab
   UNION ALL

   SELECT a.attendanceDate  FROM attendance a
   INNER JOIN CTE c
   ON a.attendanceDate = c.attendanceDate - INTERVAL '1 day'
) 
SELECT COUNT(*) FROM CTE;

Check the code at SQL Fiddle

Here is how the query works:

  1. It selects today's record from the attendance table. If today's record is not available, it selects yesterday's record instead.
  2. It then recursively adds the record for the day before the earliest date selected so far, stopping at the first missing day.

If you want the latest consecutive date range regardless of when the user's latest attendance was (today, yesterday, or x days ago), replace the initialization part of the CTE with the snippet below:

SELECT MAX(attendanceDate) FROM attendance

[EDIT] Here is a query on SQL Fiddle that resolves your question #1: SQL Fiddle

-- some data
CREATE table dayworked (
        id SERIAL NOT NULL PRIMARY KEY
        , user_id INTEGER NOT NULL
        ,  arrived_at DATE NOT NULL
        , UNIQUE (user_id,  arrived_at)
        );

INSERT INTO dayworked(user_id, arrived_at) VALUES
 ( 1, '2014-02-03')
,( 1, '2014-02-05')
,( 1, '2014-02-06')
,( 1, '2014-02-07')
        --
,( 2, '2014-02-03')
,( 2, '2014-02-05')
,( 2, '2014-02-06')
,( 2, '2014-02-07')
,( 2, '2014-02-08')
        --
,( 3, '2014-02-03')
,( 3, '2014-02-04')
,( 3, '2014-02-05')
,( 3, '2014-02-07')
        --
,( 5, '2014-02-08')
        ;

-- The query
WITH RECURSIVE stretch AS (
        SELECT dw.user_id AS user_id
                , dw.arrived_at AS first_day
                , dw.arrived_at AS last_day
                , 1::INTEGER AS nday
        FROM dayworked dw
        WHERE NOT EXISTS ( -- Find start of chain: no previous day
                SELECT * FROM dayworked nx
                WHERE nx.user_id = dw.user_id
                AND nx.arrived_at = dw.arrived_at - 1
                )
        UNION ALL
        SELECT dw.user_id AS user_id
                , st.first_day AS first_day
                , dw.arrived_at AS last_day
                , 1+st.nday AS nday
        FROM dayworked dw -- connect to chain: previous day := day before this day
        JOIN stretch st ON st.user_id = dw.user_id AND st.last_day = dw.arrived_at -1
        )
SELECT * FROM stretch st
WHERE (st.nday > 1 OR st.first_day = NOW()::date ) -- either more than one consecutive day or starting today
AND NOT EXISTS ( -- Only the most recent stretch
        SELECT * FROM stretch nx
        WHERE nx.user_id = st.user_id
        AND nx.first_day > st.first_day
        )
AND NOT EXISTS ( -- omit partial chains
        SELECT * FROM stretch nx
        WHERE nx.user_id = st.user_id
        AND nx.first_day = st.first_day
        AND nx.last_day > st.last_day
        )
        ;

Result:

CREATE TABLE
INSERT 0 14
 user_id | first_day  |  last_day  | nday 
---------+------------+------------+------
       1 | 2014-02-05 | 2014-02-07 |    3
       2 | 2014-02-05 | 2014-02-08 |    4
(2 rows)
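
For readers who find the recursive query hard to follow, the chain-building idea can be sketched in plain Python. This is only an illustration under assumptions: the function name `stretches` and the `worked` dictionary are made up here, and the sketch does not apply the query's final filter (more than one day, or starting today), which is why it also reports single-day stretches for users 3 and 5.

```python
from datetime import date, timedelta

def stretches(days):
    """Split dates into (first_day, last_day, nday) runs of consecutive days."""
    runs = []
    for day in sorted(set(days)):
        if runs and day - runs[-1][1] == timedelta(days=1):
            first, _, n = runs[-1]
            runs[-1] = (first, day, n + 1)   # previous day exists: extend the chain
        else:
            runs.append((day, day, 1))       # no previous day: start a new chain
    return runs

# Same sample data as the INSERT above.
worked = {
    1: [date(2014, 2, d) for d in (3, 5, 6, 7)],
    2: [date(2014, 2, d) for d in (3, 5, 6, 7, 8)],
    3: [date(2014, 2, d) for d in (3, 4, 5, 7)],
    5: [date(2014, 2, 8)],
}
# The most recent stretch per user is the last run in date order.
latest = {user: stretches(days)[-1] for user, days in worked.items()}
print(latest[1])
```

For users 1 and 2 the most recent stretch matches the rows in the result above: 2014-02-05 to 2014-02-07 (3 days) and 2014-02-05 to 2014-02-08 (4 days).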

You can create an aggregate with the range types:

Create function sfunc (tstzrange, timestamptz)
    returns tstzrange
    language sql strict as $$
        select case when $2 - upper($1) <= '1 day'::interval
                then tstzrange(lower($1), $2, '[]')
                else tstzrange($2, $2, '[]') end
    $$;

Create aggregate consecutive (timestamptz) (
        sfunc = sfunc,
        stype = tstzrange,
        initcond = '[,]'
);
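
It may help to see the aggregate as a fold over the ordered timestamps. The sketch below mirrors the state function in Python under stated assumptions: the range is modelled as a plain `(lower, upper)` tuple, the unbounded initial state `'[,]'` is modelled as `None`, and time zones are ignored. It illustrates the logic only and is not a replacement for the SQL.

```python
from datetime import datetime, timedelta
from functools import reduce

def sfunc(state, ts):
    """Mirror of the SQL state function; state is a (lower, upper) pair or None."""
    if state is not None and ts - state[1] <= timedelta(days=1):
        return (state[0], ts)   # within one day of the upper bound: extend the range
    return (ts, ts)             # otherwise restart the range at this timestamp

def consecutive(timestamps):
    # Sorting stands in for ORDER BY arrived_at in the aggregate call.
    return reduce(sfunc, sorted(timestamps), None)

arrivals = [datetime(2011, 1, 3), datetime(2011, 1, 4), datetime(2011, 1, 5)]
print(consecutive(arrivals))
```

Because the state restarts at every gap, the final state is always the stretch that ends at the latest timestamp, which is why the input order matters.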

Use the aggregate with the right ordering to get the consecutive-day range ending at the last arrived_at:

Select user_id, consecutive(arrived_at order by arrived_at)
    from work
    group by user_id;

    ┌─────────┬─────────────────────────────────────────────────────┐
    │ user_id │                     consecutive                     │
    ├─────────┼─────────────────────────────────────────────────────┤
    │       1 │ ["2011-01-03 00:00:00+02","2011-01-05 00:00:00+02"] │
    │       2 │ ["2011-01-06 00:00:00+02","2011-01-06 00:00:00+02"] │
    └─────────┴─────────────────────────────────────────────────────┘

Use the aggregate in a window function:

Select *,
        consecutive(arrived_at)
                over (partition by user_id order by arrived_at)
    from work;

    ┌────┬─────────┬────────────────────────┬─────────────────────────────────────────────────────┐
    │ id │ user_id │       arrived_at       │                     consecutive                     │
    ├────┼─────────┼────────────────────────┼─────────────────────────────────────────────────────┤
    │  1 │       1 │ 2011-01-03 00:00:00+02 │ ["2011-01-03 00:00:00+02","2011-01-03 00:00:00+02"] │
    │  2 │       1 │ 2011-01-04 00:00:00+02 │ ["2011-01-03 00:00:00+02","2011-01-04 00:00:00+02"] │
    │  3 │       1 │ 2011-01-05 00:00:00+02 │ ["2011-01-03 00:00:00+02","2011-01-05 00:00:00+02"] │
    │  4 │       2 │ 2011-01-06 00:00:00+02 │ ["2011-01-06 00:00:00+02","2011-01-06 00:00:00+02"] │
    └────┴─────────┴────────────────────────┴─────────────────────────────────────────────────────┘

Query the results to find what you need:

With work_detail as (select *,
            consecutive(arrived_at)
                    over (partition by user_id order by arrived_at)
        from work)
    select arrived_at, upper(consecutive) - lower(consecutive) as days
        from work_detail
            where user_id = 1 and upper(consecutive) != lower(consecutive)
            order by arrived_at desc
                limit 1;

    ┌────────────────────────┬────────┐
    │       arrived_at       │  days  │
    ├────────────────────────┼────────┤
    │ 2011-01-05 00:00:00+02 │ 2 days │
    └────────────────────────┴────────┘

You can even do this without a recursive CTE,
using generate_series(), LEFT JOIN, count() as a window function, and a final LIMIT 1:

1 for "today" plus consecutive days up until "yesterday":

SELECT count(*)   -- 1 / 0  for "today"
     + COALESCE(( -- + optional count of consecutive days up until "yesterday"
       SELECT ct
       FROM  (
          SELECT d.ct, count(w.arrived_at) OVER (ORDER BY d.ct) AS day_ct
          FROM   generate_series(1, 8) AS d(ct)   -- maximum = 8
          LEFT   JOIN work w ON  w.arrived_at >= current_date -  d.ct
                             AND w.arrived_at <  current_date - (d.ct - 1)
                             AND w.user_id = 1    -- given user
          ) sub
       WHERE  ct = day_ct
       ORDER  BY ct DESC
       LIMIT  1
       ), 0) AS total
FROM   work
WHERE  arrived_at >= current_date  -- today's entry (assumes no future timestamps)
AND    user_id = 1;                -- given user

Assuming 0 or 1 entry per day. Should be fast.
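
The trick in this query is the comparison `ct = day_ct`: the running count of attended days can only keep up with the day offset while there has been no gap. A rough Python sketch of that idea follows, assuming plain dates instead of timestamps and at most one entry per day; `total_streak` and `max_days` are names invented for this illustration.

```python
from datetime import date, timedelta
from itertools import accumulate

def total_streak(attended, today, max_days=8):
    days = set(attended)
    # ct = 1..max_days: how many days before today each slot lies.
    offsets = range(1, max_days + 1)
    hits = [1 if today - timedelta(days=ct) in days else 0 for ct in offsets]
    # Running count, playing the role of count(w.arrived_at) OVER (ORDER BY d.ct).
    running = accumulate(hits)
    # Offsets where every day so far was attended (ct = day_ct); keep the largest.
    up_to_yesterday = max(
        (ct for ct, day_ct in zip(offsets, running) if ct == day_ct),
        default=0,
    )
    # Add 1 if there is an entry for today, 0 otherwise.
    return (1 if today in days else 0) + up_to_yesterday

today = date(2014, 2, 8)
print(total_streak([date(2014, 2, d) for d in (3, 5, 6, 7, 8)], today))  # 4
```

As in the SQL, the window is capped (here at 8 days), so a longer streak would need a larger `max_days`.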

For best performance (for this or the CTE solution alike) you would have a multicolumn index like:

CREATE INDEX foo_idx ON work (user_id,arrived_at);

