
Distinct count on a rolling time window

I want to count the number of distinct catalog numbers that have appeared within the last X minutes. This is usually called a rolling time window.

For instance, if I have:

row        startime            orderNumber    catalogNumb
1        2007-09-24-15.50       o1              21
2        2007-09-24-15.51       o2              22
3        2007-09-24-15.52       o2              23
4        2007-09-24-15.53       o3              24
5        2007-09-24-15.54       o4              21
6        2007-09-24-15.55       o4              21
7        2007-09-24-15.56       o4              21
8        2007-09-24-15.57       o4              21

If I want the distinct count over the last 5 minutes (5 is just one of the possible values), the output should be:

row        startime            orderNumber    catalogNumb    countCatalog
1        2007-09-24-15.50       o1              21                 1
2        2007-09-24-15.51       o2              22                 2
3        2007-09-24-15.52       o2              23                 3
4        2007-09-24-15.53       o3              24                 4
5        2007-09-24-15.54       o4              21                 4
6        2007-09-24-15.55       o4              21                 4 
7        2007-09-24-15.56       o4              21                 4
8        2007-09-24-15.57       o4              21                 3

I am using Big SQL for InfoSphere BigInsights v3.0. The resulting query can use any DB2 OLAP window function except count(distinct catalogNumb) OVER()..., which is not supported by my DB2 version.

In addition to count, I may also need to use other aggregate functions (avg, sum...) over the catalogNumb and other attributes.

Any feedback would be appreciated.

True, Db2 does not support count distinct as an OLAP function, but there is an easy workaround.

You can use dense_rank instead: the highest number (max) from dense_rank is your count distinct!
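As a minimal sketch of that workaround, the query below ranks the distinct values inside each partition and then takes the maximum rank. The table name mytable and the choice to partition by orderNumber are illustrative assumptions, not something the answer specifies. Note that this counts distinct values over a whole partition rather than over a rolling time window, and that a NULL catalogNumb would be ranked as a value of its own, whereas count(distinct ...) ignores NULLs.

```sql
-- Sketch only: emulate COUNT(DISTINCT catalogNumb) OVER (PARTITION BY orderNumber)
-- with DENSE_RANK. "mytable" and the PARTITION BY column are assumptions.
SELECT t.starttime,
       t.orderNumber,
       t.catalogNumb,
       -- the highest dense rank in the partition equals the number of distinct values
       MAX(t.rnk) OVER (PARTITION BY t.orderNumber) AS countCatalog
FROM (
    SELECT m.starttime,
           m.orderNumber,
           m.catalogNumb,
           -- equal catalogNumb values share a rank; ranks have no gaps
           DENSE_RANK() OVER (PARTITION BY m.orderNumber
                              ORDER BY m.catalogNumb) AS rnk
    FROM mytable m
) t;
```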

You can try something like this:

select ...
  from mytable
  where starttime between current timestamp - 5 minutes and current timestamp

That returns all the rows from the last 5 minutes; the 5 can be a variable. Then apply count(), sum() or avg() to those rows.
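The filter above covers only the window that ends at the current moment. To get a count for every row, as in the expected output of the question, one possible approach is a self-join in which each row is matched with all rows from the 5 minutes up to and including its own starttime, followed by an ordinary (non-OLAP) count(distinct ...). This is a sketch under assumptions: the table is called mytable, starttime is a TIMESTAMP column that is unique per row as in the sample, and Db2 labeled durations (- 5 MINUTES) are accepted by the Big SQL version in use.

```sql
-- Sketch only: per-row distinct count over a rolling 5-minute window.
-- "mytable", the TIMESTAMP type of starttime, and its uniqueness are assumptions.
SELECT a.starttime,
       a.orderNumber,
       a.catalogNumb,
       -- plain aggregate, so DISTINCT is allowed here
       COUNT(DISTINCT b.catalogNumb) AS countCatalog
FROM mytable a
JOIN mytable b
  -- b ranges over every row inside the window that ends at a's starttime
  ON b.starttime BETWEEN a.starttime - 5 MINUTES AND a.starttime
GROUP BY a.starttime, a.orderNumber, a.catalogNumb
ORDER BY a.starttime;
```

With the catalogNumb values shown in the expected-output table of the question, this yields the counts 1, 2, 3, 4, 4, 4, 4, 3. The same join can feed other aggregates, such as sum or avg over columns of b. The cost grows with the number of rows that fall into each window, so it may be slow on large tables.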
