
SQL question: count of occurrences greater than N in any given hour

I'm looking through login logs (in Netezza) and trying to find users who have more than a certain number of logins in any one-hour period (any consecutive 60-minute window, as opposed to strictly a clock hour) since December 1st. I've read the following posts, but most seem to address searching within a specific time range, not ANY given time period:

https://dba.stackexchange.com/questions/137660/counting-number-of-occurences-in-a-time-period
https://dba.stackexchange.com/questions/67881/calculating-the-maximum-seen-so-far-for-each-point-in-time
Count records per hour within a time span

Thanks.

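You could use the analytic function LAG to look back in each user's sorted sequence of timestamps and check whether the record 19 entries earlier is less than an hour before the current one. If it is, those 20 logins (the current one and the 19 before it) all fall within one hour. For a threshold of N logins, use an offset of N - 1: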

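with cte as (
    select   user_id,
             login_time,
             -- this user's login 19 entries earlier (NULL for the first 19 logins)
             lag(login_time, 19) over (partition by user_id order by login_time) as lag_time
    from     userlog
    where    login_time >= timestamp '2018-12-01 00:00:00'  -- only logins since December 1st
)
-- keep rows where 20 logins (this one plus the 19 before it) span less than 3600 seconds
select   user_id,
         min(login_time) as login_time
from     cte
where    extract(epoch from (login_time - lag_time)) < 3600
group by user_id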

The output lists each matching user together with the first login at which they reached twenty logins within an hour.
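To see why an offset of 19 finds 20 logins in under an hour, here is a minimal Python sketch of the same check done outside the database. The function names `first_burst` and `users_with_bursts`, the `(user_id, login_time)` pair input, and the default `n=20` are assumptions for illustration. Sorting each user's logins and comparing login `i` with login `i - (n - 1)` plays the role of `lag(login_time, 19) over (partition by user_id order by login_time)`, and returning the first hit corresponds to `min(login_time)`.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def first_burst(login_times, n=20, window=timedelta(hours=1)):
    """Return the first login at which n logins fall within less than `window`, else None."""
    times = sorted(login_times)
    for i in range(n - 1, len(times)):
        lag_time = times[i - (n - 1)]  # plays the role of lag(login_time, n - 1)
        if times[i] - lag_time < window:  # strict, like "< 3600" seconds
            return times[i]  # first hit in time order, i.e. min(login_time)
    return None

def users_with_bursts(logins, n=20, window=timedelta(hours=1)):
    """logins: iterable of (user_id, login_time) pairs; returns one entry per matching user."""
    by_user = defaultdict(list)  # like "partition by user_id"
    for user_id, login_time in logins:
        by_user[user_id].append(login_time)
    result = {}
    for user_id, times in by_user.items():
        hit = first_burst(times, n, window)
        if hit is not None:
            result[user_id] = hit
    return result

base = datetime(2018, 12, 1, 9, 0)
logins = [
    ("alice", base), ("alice", base + timedelta(minutes=20)), ("alice", base + timedelta(minutes=50)),
    ("bob", base), ("bob", base + timedelta(minutes=40)), ("bob", base + timedelta(minutes=100)),
]
print(users_with_bursts(logins, n=3))  # {'alice': datetime.datetime(2018, 12, 1, 9, 50)}
```

With `n=3`, alice's third login (09:50) is within 50 minutes of her first, so she matches. Bob's three logins span 100 minutes, so he does not.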

I think you could do something like this (for simplicity, I'll use a login table, user_logons, with just two columns: user and datetime):

with connections as (
  select ua.user
       , ua.datetime
  from user_logons  ua
  where ua.datetime >= timestamp'2018-12-01 00:00:00'
)
  select ua.user
       , ua.datetime
         -- logins by the same user from this login up to one hour later (BETWEEN is inclusive)
       , (select count(*)
          from connections  ut
          where ut.user = ua.user
            and ut.datetime between ua.datetime and (ua.datetime + 1 hour)
         ) as consecutive_logons
  from connections ua
  1. Adapt the column names (user, datetime) to your own schema.
  2. You will need your database's date arithmetic, because ua.datetime + 1 hour won't work as written. The syntax depends on the DB implementation: for example, MySQL uses DATE_ADD ( https://www.w3schools.com/SQl/func_mysql_date_add.asp ), and PostgreSQL-style dialects use ua.datetime + interval '1 hour'.
  3. Because of the subquery (select count(*) ...), the query will not be the fastest: it is a correlated subquery, so it has to be re-evaluated for each row.
  4. The WITH clause simply restricts user_logons to the logins since December 1st, to reduce the cost of the subquery. It may not be strictly necessary, but it keeps the main query simpler.

You might get better performance from a stored function or from a function written in an application language (e.g. Java, PHP, ...).
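As a sketch of that application-side route, assuming the logins have already been fetched as `(user, login_time)` pairs, the following Python computes, per user, the largest number of logins in any window `[t, t + 1 hour]` starting at a login `t`. It keeps the users above a threshold `n`. The names `max_logins_in_window` and `heavy_users` and the `since` parameter are hypothetical. Instead of re-counting for every row as the correlated subquery does, two indexes only move forward over the sorted times, so the cost per user is dominated by the sort. Considering only windows that start at a login is enough, because any window can be shifted forward to its earliest login without losing any of the logins it contains.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def max_logins_in_window(login_times, window=timedelta(hours=1)):
    """Largest number of logins in any window [t, t + window] that starts at a login t."""
    times = sorted(login_times)
    best = 0
    end = 0  # index one past the last login inside the current window
    for start, t in enumerate(times):
        # Both indexes only move forward, so each login is visited a constant number of times.
        while end < len(times) and times[end] <= t + window:  # inclusive end, like BETWEEN
            end += 1
        best = max(best, end - start)
    return best

def heavy_users(logins, n, window=timedelta(hours=1), since=None):
    """Users with more than n logins in some window; logins are (user, login_time) pairs."""
    by_user = defaultdict(list)
    for user, login_time in logins:
        if since is None or login_time >= since:  # like the "since December 1st" filter
            by_user[user].append(login_time)
    result = {}
    for user, times in by_user.items():
        count = max_logins_in_window(times, window)
        if count > n:
            result[user] = count
    return result

base = datetime(2018, 12, 1, 9, 0)
logins = [
    ("alice", base), ("alice", base + timedelta(minutes=30)),
    ("alice", base + timedelta(minutes=60)), ("alice", base + timedelta(minutes=150)),
    ("bob", base), ("bob", base + timedelta(minutes=90)),
]
print(heavy_users(logins, n=2))  # {'alice': 3}
```

Note that this sketch follows the BETWEEN semantics of the query above, so a login exactly one hour later is counted. The LAG-based answer uses a strict `< 3600` seconds, so the two approaches differ only for logins exactly one hour apart.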

The technical posts on this site are published under the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 