简体   繁体   中英

SQL count occurrences in window

I have user logins by date. My requirement is to track the number of users that have been logged in during the past 90 days window.

I am new to both SQL in general and Teradata specifically and I can't get the window functionality to work as I need.

I need the following result, where ACTIVE is a count of the unique USER_IDs that appear in the previous 90 day window the DATE.

DATES        ACTIVE_IN_WINDOW
12/06/2018     20
13/06/2018     45                 
14/06/2018     65 
15/06/2018     73 
17/06/2018     24      
18/06/2018     87  
19/06/2018     34
20/06/2018     51

Currently my script is as follows.

It is this line here that I cant get right

COUNT ( USER_ID) OVER (PARTITION BY USER_ID ORDER BY EVT_DT ROWS BETWEEN 90 PRECEDING AND  0 FOLLOWING)

I suspect I need a different set of functions to make this work.

SELECT    b.DATES , a.ACTIVE_IN_WINDOW

FROM    

(
        SELECT 

        CAST(CALENDAR_DATE AS DATE) AS DATES FROM SYS_CALENDAR.CALENDAR

        WHERE DATES BETWEEN ADD_MONTHS(CURRENT_DATE, - 10)  AND CURRENT_DATE
) b

LEFT JOIN

(
        SELECT    USER_ID   , EVT_DT 

        , COUNT ( USER_ID) OVER (PARTITION BY USER_ID ORDER BY EVT_DT ROWS BETWEEN 90 PRECEDING AND  0 FOLLOWING) AS ACTIVE_IN_WINDOW

        FROM ENV0.R_ONBOARDING
) a

ON a.EVT_DT = b.DATES

ORDER BY b.DATES

Thank you for any assistance.

The logic is similar to Gordon', but a non-equi-Join instead of a Correlated Scalar Subquery is usually more efficient on Teradata:

SELECT b.DATES , Count(DISTINCT USER_ID)
FROM
 (
   SELECT CALENDAR_DATE AS DATES 
   FROM SYS_CALENDAR.CALENDAR
   WHERE DATES BETWEEN Add_Months(Current_Date, - 10)  AND Current_Date
 ) b
LEFT JOIN
 ( -- apply DISTINCT before aggregation to reduce intermediate spool
   SELECT DISTINCT USER_ID, EVT_DT
   FROM ENV0.R_ONBOARDING
 ) AS a
ON a.EVT_DT BETWEEN Add_Months(b.DATES,-3) AND b.DATES
GROUP BY 1
ORDER BY 1

Of course this will require a large spool and much CPU.

Edit:

Switching to weeks reduces the overhead, I'm using dates instead of week numbers (it's easier to modify for other ranges):

SELECT b.Week , Count(DISTINCT USER_ID) 
FROM
 ( -- Return only Mondays instead of DISTINCT over all days 
   SELECT calendar_date AS Week
   FROM SYS_CALENDAR.CALENDAR 
   WHERE CALENDAR_DATE BETWEEN Add_Months(Current_Date, -9) AND Current_Date
     AND day_of_week = 2 -- 2 = Monday
 ) b 
LEFT JOIN 
 (
   SELECT DISTINCT USER_ID,
     -- td_monday returns the previous Monday, but we need the following monday
     -- covers the previous Tuesday up to the current Monday
            Td_Monday(EVT_DT+6) AS PERIOD_WEEK
   FROM ENV0.R_ONBOARDING
   -- You should add another condition to limit the actually covered date range, e.g.
   -- where EVT_DT BETWEEN Add_Months(b.DATES,-13) AND b.DATES
 ) AS a 
ON a.PERIOD_WEEK BETWEEN b.Week-(12*7) AND b.Week 
GROUP BY 1 
ORDER BY 1 

Explain should duplicate the calendar as preparation for the product join, if not you might need to materialize the dates in a Volatile Table. Better don't use sys_calendar , there are no statistics, eg optimizer doesn't know about how many days per week/month/year, etc. Check your system, there should be a calendar table designed for you company needs (with stats on all columns)

If your data is not too big, a subquery might be the simplest method:

SELECT c.dte,
       (SELECT COUNT(DISTINCT o.USER_ID)
        FROM ENV0.R_ONBOARDING o
        WHERE o.EVT_DT > ADD_MONTHS(dte, -3) AND
              o.EVT_DT <= dte
       ) as three_month_count
FROM (SELECT CAST(CALENDAR_DATE AS DATE) AS dte
      FROM SYS_CALENDAR.CALENDAR
      WHERE CALENDAR_DATE BETWEEN ADD_MONTHS(CURRENT_DATE, - 10)  AND CURRENT_DATE
     ) c;

You might want to start on a shorter timeframe then 3 months to see how the query performs.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM