简体   繁体   中英

Turning Records by Date Range into Records by Day/Month Using SQL Server or Vertica

I can use either SQL Server or Vertica as the DB and Tableau as the reporting tool. A solution in any of these mediums would be helpful.

DATA RESOURCES: I have a table (userActivity) with 100 records and a structure of: User, StartDate, EndDate

NEED: I am interested in preparing reports by day and month that show "total active days", meaning if User1 has a range of '20180101' to '20180331', they will contribute one day for each day in Jan, Feb and Mar OR 31, 28 and 31 days if aggregated by month.

GOAL: I will ultimately be aggregating the total active days of all users as the output to achieve a single total for each day/month.

This report will span to perpetuity, so I would prefer solutions that don't hard code CASE/IF-THEN statements by day/month.

Thanks!

While recursive CTEs are a good candidate for this scenario, it can be handled with tableau alone. Assuming you have this data, here are the steps required to produce the view.

在此输入图像描述

  1. Create a reference sheet which has all the days expected. Even if you need to cover 25 years from 01/01/2018 to 01/01/2043, that is still less than 10k rows.

在此输入图像描述

You need two columns with exact same date as Tableau does not allow multiple join conditions on same column.

  1. Create an inner join between reference calendar and data using following criteria. 在此输入图像描述

  2. Build the view

    在此输入图像描述

Use Vertica - it has the TIMESERIES clause - no recursion needed.

I would try the below - and check the intermediate results of the Common Table Expressions to see how it works..

WITH 
-- two test rows ....
input(uid,start_dt,end_dt) AS (
            SELECT 1,DATE '2018-01-01', DATE '2018-03-31'
  UNION ALL SELECT 2,DATE '2018-02-01', DATE '2018-04-01'
)
,
-- set the stage for Vertica's TIMESERIES clause
-- note: TIMESERIES relies on timestamps ...
limits(uid,lim_dt,qty) AS (
  SELECT
    uid
  , start_dt::TIMESTAMP
  , 1
  FROM input
  UNION ALL
  SELECT
    uid
  , end_dt::TIMESTAMP
  , 1
  FROM input
)
,
-- apply the Vertica TIMESERIES clause
counters AS (
  SELECT
    uid
  , act_dt
  , TS_FIRST_VALUE(qty) AS qty
  FROM limits
  TIMESERIES act_dt AS '1 DAY' OVER(PARTITION BY uid ORDER BY lim_dt)
)
SELECT
  uid
, MONTH(act_dt) AS activity_month
, SUM(qty)
FROM counters
GROUP BY 1,2;
-- out  uid | activity_month | sum 
-- out -----+----------------+-----
-- out    1 |              1 |  31
-- out    1 |              2 |  28
-- out    1 |              3 |  31
-- out    2 |              2 |  28
-- out    2 |              3 |  31
-- out    2 |              4 |   1
-- out (6 rows)
-- out 
-- out time: first fetch (6 rows): 120.515 ms. all rows formatted: 120.627 ms

Solution:

WITH base AS (
  SELECT
     User       AS u
    ,StartDate  AS s
    ,EndDate    AS e
    ,DATEDIFF(
      dd,
      StartDate,
      EndDate
      )+1       AS d
  FROM  userActivity
  ),
recurse AS (
  SELECT    u, s, e, d, x=(d-1)
    FROM    base
    UNION ALL
    SELECT  u, s, e, d, x-1 AS x
    FROM    recurse
    WHERE   x>0
  )
SELECT      u, DATEADD(dd, x, s) AS recordperday
FROM        recurse
ORDER BY    u, recordperday
--Extends SQL Server's recursion limit
OPTION (MAXRECURSION 500)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM