Group by hourly interval

I'm new to SQL and am having trouble building an hourly report on a database that supports HiveSQL.

Here's my dataset:

|NAME| CHECKIN_HOUR |CHECKOUT_HOUR|
|----|--------------|-------------|
| A  |       00     |      00     | 
| B  |       00     |      01     | 
| C  |       00     |      02     |
| D  |       00     |      null   |
| E  |       01     |      02     |
| F  |       01     |      null   |

And I would like to get an hourly summary report that looks like this:

|TIME| CHECKIN_NUMBER |CHECKOUT_NUMBER|STAY_NUMBER|
|----|----------------|---------------|-----------|
| 00 |        4       |       1       |     3     |
| 01 |        2       |       1       |     4     | 
| 02 |        0       |       2       |     2     |

stay_number is the number of people who haven't checked out by the end of that hour. For example, the 2 in the last row means that by the end of hour 02, two people (D and F) still haven't checked out. So basically I'm trying to get a summary of check-ins, check-outs, and stays for each hour.

I have no idea how to build an hourly interval table, since simply grouping by the check-in or check-out hour doesn't give the expected result. All the date fields are originally stored as Unix timestamps, so feel free to use date functions on them.

Any instructions and help would be greatly appreciated, thanks!

Here is one method that unpivots the data and uses cumulative sums:

select hh, 
       sum(ins) as checkins, sum(outs) as checkouts,
       sum(sum(ins)) over (order by hh) - sum(sum(outs)) over (order by hh) as stays
from ((select checkin_hour as hh, count(*) as ins, 0 as outs
       from t
       group by checkin_hour
      ) union all
      (select checkout_hour, 0 as ins, count(*) as outs
       from t
       where checkout_hour is not null
       group by checkout_hour
      )
     ) c
group by hh
order by hh;

The idea is to count the number of check-ins and check-outs in each hour and then accumulate those totals across the hours. The difference between the two running totals is the number of stays.
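To check the arithmetic by hand, the same per-hour counts and running difference can be sketched outside the database. This is only an illustration in plain Python using the six sample rows from the question; the `hourly_report` function and the `rows` list are hypothetical names, not part of the query above.

```python
from collections import Counter

# Sample rows from the question: (name, checkin_hour, checkout_hour).
# None stands for a NULL checkout_hour (the person has not checked out).
rows = [
    ("A", "00", "00"),
    ("B", "00", "01"),
    ("C", "00", "02"),
    ("D", "00", None),
    ("E", "01", "02"),
    ("F", "01", None),
]

def hourly_report(rows):
    """Return (hour, checkins, checkouts, stays) tuples, ordered by hour."""
    # Step 1: count check-ins and check-outs per hour (the two GROUP BY branches).
    ins = Counter(checkin for _, checkin, _ in rows)
    outs = Counter(checkout for _, _, checkout in rows if checkout is not None)

    # Step 2: walk the hours in order and keep a running total of ins minus outs
    # (the difference of the two cumulative sums).
    report = []
    staying = 0
    for hh in sorted(set(ins) | set(outs)):
        staying += ins[hh] - outs[hh]
        report.append((hh, ins[hh], outs[hh], staying))
    return report

for line in hourly_report(rows):
    print(line)
```

Like the query, this only produces rows for hours in which at least one check-in or check-out occurred.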
