Turning Records by Date Range into Records by Day/Month Using SQL Server or Vertica
I can use either SQL Server or Vertica as the database and Tableau as the reporting tool. A solution in any of these would be helpful.
DATA RESOURCES: I have a table (userActivity) with 100 records and this structure: User, StartDate, EndDate
NEED: I want to prepare reports by day and by month that show "total active days". For example, if User1 has a range of '20180101' to '20180331', they contribute one day for each day in Jan, Feb and Mar, or 31, 28 and 31 days if aggregated by month.
GOAL: I will ultimately aggregate the active days of all users so that the output has a single total for each day/month.
This report will run indefinitely, so I would prefer solutions that don't hard-code CASE/IF-THEN statements by day/month.
Thanks!
While recursive CTEs are a good candidate for this scenario, it can be handled with Tableau alone. Assuming you have this data, here are the steps required to produce the view.
You need two columns holding the exact same date, because Tableau does not allow multiple join conditions on the same column.
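The join this answer relies on can be sketched outside Tableau. The following is only an illustration under assumptions: the function name `daily_active_counts`, the calendar bounds, and the tuple layout are hypothetical, and the two join conditions (`date1 >= StartDate` and `date2 <= EndDate`) are my reading of why two identical date columns are needed, not steps quoted from the answer.

```python
from datetime import date, timedelta


def daily_active_counts(activity, calendar_start, calendar_end):
    """Count active users per day by joining activity rows to a calendar.

    activity: iterable of (user, start_date, end_date) tuples (hypothetical layout).
    Returns a dict mapping each day with at least one active user to its count.
    """
    # Calendar scaffold: one row per day, with the date duplicated in two
    # columns so that each join condition can use its own column.
    calendar = []
    day = calendar_start
    while day <= calendar_end:
        calendar.append((day, day))
        day += timedelta(days=1)

    counts = {}
    for date1, date2 in calendar:
        # Assumed join conditions: date1 >= StartDate AND date2 <= EndDate.
        active = sum(
            1 for _user, start, end in activity
            if date1 >= start and date2 <= end
        )
        if active:
            counts[date1] = active
    return counts


if __name__ == "__main__":
    sample = [
        ("User1", date(2018, 1, 1), date(2018, 3, 31)),
        ("User2", date(2018, 2, 1), date(2018, 4, 1)),
    ]
    totals = daily_active_counts(sample, date(2018, 1, 1), date(2018, 12, 31))
    print(totals[date(2018, 2, 15)])  # both sample users are active on this day
```

The sample rows reuse User1's range from the question and the second test row from the Vertica answer. Summing the per-day counts by month gives the monthly totals the question asks for.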
Use Vertica: it has the TIMESERIES clause, so no recursion is needed.
I would try the query below, and check the intermediate results of the Common Table Expressions to see how it works.
WITH
-- two test rows ....
input(uid,start_dt,end_dt) AS (
SELECT 1,DATE '2018-01-01', DATE '2018-03-31'
UNION ALL SELECT 2,DATE '2018-02-01', DATE '2018-04-01'
)
,
-- set the stage for Vertica's TIMESERIES clause
-- note: TIMESERIES relies on timestamps ...
limits(uid,lim_dt,qty) AS (
SELECT
uid
, start_dt::TIMESTAMP
, 1
FROM input
UNION ALL
SELECT
uid
, end_dt::TIMESTAMP
, 1
FROM input
)
,
-- apply the Vertica TIMESERIES clause
counters AS (
SELECT
uid
, act_dt
, TS_FIRST_VALUE(qty) AS qty
FROM limits
TIMESERIES act_dt AS '1 DAY' OVER(PARTITION BY uid ORDER BY lim_dt)
)
SELECT
uid
, MONTH(act_dt) AS activity_month
, SUM(qty)
FROM counters
GROUP BY 1,2;
-- out uid | activity_month | sum
-- out -----+----------------+-----
-- out 1 | 1 | 31
-- out 1 | 2 | 28
-- out 1 | 3 | 31
-- out 2 | 2 | 28
-- out 2 | 3 | 31
-- out 2 | 4 | 1
-- out (6 rows)
-- out
-- out time: first fetch (6 rows): 120.515 ms. all rows formatted: 120.627 ms
Solution:
WITH base AS (
SELECT
[User] AS u -- bracketed: unquoted USER is a reserved word/function in T-SQL
,StartDate AS s
,EndDate AS e
,DATEDIFF(
dd,
StartDate,
EndDate
)+1 AS d
FROM userActivity
),
recurse AS (
SELECT u, s, e, d, x=(d-1)
FROM base
UNION ALL
SELECT u, s, e, d, x-1 AS x
FROM recurse
WHERE x>0
)
SELECT u, DATEADD(dd, x, s) AS recordperday
FROM recurse
ORDER BY u, recordperday
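--Raises SQL Server's default recursion limit of 100. Ranges longer than
--500 days need a higher value (maximum 32767; 0 removes the limit).
OPTION (MAXRECURSION 500)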
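The query above returns one row per user per day. To check the monthly totals the question asks for without a SQL Server instance, the recursive approach can be ported to SQLite through Python's `sqlite3` module. This is a sketch under assumptions, not the answer's own code: SQLite's `date()` and `strftime()` stand in for DATEADD, dates are stored as ISO `YYYY-MM-DD` text, and the function name `active_days_by_month` is hypothetical. Grouping by year and month together keeps the same month of different years separate, which matters for a report that runs indefinitely.

```python
import sqlite3


def active_days_by_month(rows):
    """Expand each (user, start, end) range to one row per day with a
    recursive CTE, then total the active days of all users per year-month.

    rows: iterable of (user, 'YYYY-MM-DD', 'YYYY-MM-DD') tuples.
    Returns a list of ('YYYY-MM', active_days) tuples ordered by month.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE userActivity ("User" TEXT, StartDate TEXT, EndDate TEXT)'
    )
    conn.executemany("INSERT INTO userActivity VALUES (?, ?, ?)", rows)

    query = """
    WITH RECURSIVE days(u, d, e) AS (
        -- anchor: the first day of every range
        SELECT "User", StartDate, EndDate FROM userActivity
        UNION ALL
        -- step: advance one day until the end date is reached
        SELECT u, date(d, '+1 day'), e FROM days WHERE d < e
    )
    SELECT strftime('%Y-%m', d) AS ym, COUNT(*) AS active_days
    FROM days
    GROUP BY ym
    ORDER BY ym
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        ("User1", "2018-01-01", "2018-03-31"),
        ("User2", "2018-02-01", "2018-04-01"),
    ]
    for month, total in active_days_by_month(sample):
        print(month, total)
```

With the two sample rows (User1 from the question, the second row from the Vertica answer's test data) the totals are 31 for January, 56 for February, 62 for March and 1 for April. In SQL Server the equivalent change would be to wrap the per-day result in a GROUP BY on the year and month of recordperday.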