
Hourly report for data warehouse

I have the following tables in my PostgreSQL 9.1 database:

 SELECT * from hour_dimension limit 10;
 id |    date    | hour
----+------------+------
 1  | 2013-01-01 |    5
 2  | 2013-01-01 |    6
 3  | 2013-01-01 |    7
 4  | 2013-01-01 |    8
 5  | 2013-01-01 |    9
 6  | 2013-01-01 |   10
 7  | 2013-01-01 |   11
 8  | 2013-01-01 |   12
 9  | 2013-01-01 |   13
10  | 2013-01-01 |   14



SELECT 

shop_id,
trans_date_time::date as date,
extract(hour from trans_date_time) as hour,
round(amount_in_cents/100.0,2) as amount

FROM transaction 
LIMIT 10;

shop_id |    date    | hour | amount
--------+------------+------+--------
 2877   | 2013-01-02 |    9 |   3.50
 2877   | 2013-01-02 |   10 |   4.00
 2877   | 2013-01-02 |   14 |   4.00
 2877   | 2013-01-03 |   11 |   1.40
 2877   | 2013-01-03 |   11 |   4.50
 2877   | 2013-01-03 |   12 |   3.00
 2877   | 2013-01-03 |   13 |   2.00
 2877   | 2013-01-03 |   13 |   2.00
 2877   | 2013-01-03 |   14 |   1.00
 2877   | 2013-01-04 |    9 |   4.00


 SELECT id  from shop limit 3;
 id
 ------
 2877
 2878
 2879

I am trying to write a data-warehouse type query so I can generate (and store) a daily report describing how each shop has performed on an hourly basis, similar to the following:

   date    | hour |  shop_id | amount
-----------+------+----------+--------
2013-01-01 |    5 |     2877 |   0.00
2013-01-01 |    6 |     2877 |   0.00
2013-01-01 |    7 |     2877 |   0.00
2013-01-01 |    8 |     2877 |   0.00
2013-01-01 |    9 |     2877 |   3.50
2013-01-01 |   10 |     2877 |   4.00
2013-01-01 |   11 |     2877 |   5.90
2013-01-01 |   12 |     2877 |   3.00
2013-01-01 |   13 |     2877 |   4.00
2013-01-01 |   14 |     2877 |   1.00

SAMPLE QUERY:

SELECT hd.date as date, hd.hour as hour, 

shop_id,

round(sum(case when amount is null then 0 else amount end),2) as amount 

FROM (

    SELECT 

    shop_id,
    trans_date_time::date as date,
    extract(hour from trans_date_time) as hour,
    amount_in_cents/100.0 as amount
    FROM
    transaction

) x

RIGHT JOIN hour_dimension hd ON (hd.date = x.date AND hd.hour = x.hour)

AND shop_id = 2877
where hd.date = '2013-01-10'

GROUP BY hd.date, hd.hour, shop_id
ORDER by hd.date, hd.hour
LIMIT 10;

select
    shop_id,
    trans_date_time::date as date,
    extract(hour from trans_date_time) as hour,
    round(sum(coalesce(amount_in_cents, 0))/100.0, 2) as amount
from transaction
group by 1, 2, 3
order by 1, 2, 3

You'll probably get better performance if you can select shop id numbers from a table of shops. I just used a SELECT DISTINCT subquery. The cross join gives you every combination of date, hour, and shop_id.

with shop_hours as (
  select hd."date", hd."hour", tr.shop_id
  from hour_dimension hd
  cross join (select distinct shop_id from transaction) tr
)
select sh."date"::date, sh."hour", sh.shop_id,
       coalesce(sum(tr.amount_in_cents), 0) / 100.0 as amount
from shop_hours sh
left join transaction tr
       on tr.trans_date_time::date = sh."date"
      and extract(hour from tr.trans_date_time) = sh."hour"
      and tr.shop_id = sh.shop_id
group by sh."date", sh."hour", sh.shop_id
order by sh.shop_id, sh."date", sh."hour"
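
As a rough sketch of the "table of shops" suggestion, the same report can take its shop ids from the shop table shown in the question (its id column) instead of a SELECT DISTINCT over transaction. The column names and the integer type of amount_in_cents are assumed from the question's sample output; this is untested against the asker's real schema.

```sql
-- Sketch only: assumes shop(id) and transaction(shop_id, trans_date_time,
-- amount_in_cents) exactly as they appear in the question.
SELECT hd."date",
       hd."hour",
       s.id AS shop_id,
       -- Sum the cents first, then convert and round once.
       round(coalesce(sum(t.amount_in_cents), 0) / 100.0, 2) AS amount
FROM hour_dimension hd
CROSS JOIN shop s                      -- every (date, hour, shop) combination
LEFT JOIN transaction t
       ON t.trans_date_time::date = hd."date"
      AND extract(hour FROM t.trans_date_time) = hd."hour"
      AND t.shop_id = s.id
GROUP BY hd."date", hd."hour", s.id
ORDER BY s.id, hd."date", hd."hour";
```

Unlike the SELECT DISTINCT version, this also returns zero rows for shops that have no transactions at all, which may or may not be what the report needs.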

Please try the following query:

SELECT hd."date", hd.hour,
       s.shop_id,
       sum(coalesce(round(t.amount_in_cents/100.0,2),0)) amount
  FROM hour_dimension hd
  CROSS JOIN (SELECT DISTINCT shop_id FROM transaction) s
  LEFT JOIN transaction t
    ON hd."date"=t.trans_date_time::date
   AND hd.hour=extract(hour from t.trans_date_time)
   AND s.shop_id=t.shop_id
 GROUP BY 1,2,3
 ORDER BY 1,2,3;

Also on SQL Fiddle.

Note that using date as a column name or alias is not a good idea, because it is a reserved keyword. You should always double-quote it, or better, avoid it as a column name.

hour is not reserved in PostgreSQL, although the SQL standard reserves it.
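
One way to follow this advice when storing the report, as the question intends, is to give the target table column names that need no quoting. The sketch below is only an illustration: the table hourly_shop_report, its columns report_date and report_hour, and the column types are hypothetical and do not come from the question.

```sql
-- Hypothetical target table; the names and types are assumptions.
CREATE TABLE hourly_shop_report (
    report_date date          NOT NULL,
    report_hour integer       NOT NULL,
    shop_id     integer       NOT NULL,
    amount      numeric(12,2) NOT NULL,
    PRIMARY KEY (report_date, report_hour, shop_id)
);

-- Load a single day; the date literal is the one used in the question.
INSERT INTO hourly_shop_report (report_date, report_hour, shop_id, amount)
SELECT hd."date",
       hd."hour",
       s.shop_id,
       round(coalesce(sum(t.amount_in_cents), 0) / 100.0, 2)
FROM hour_dimension hd
CROSS JOIN (SELECT DISTINCT shop_id FROM transaction) s
LEFT JOIN transaction t
       ON t.trans_date_time::date = hd."date"
      AND extract(hour FROM t.trans_date_time) = hd."hour"
      AND t.shop_id = s.shop_id
WHERE hd."date" = DATE '2013-01-10'
GROUP BY hd."date", hd."hour", s.shop_id;
```

The primary key makes an accidental second load of the same day fail instead of silently duplicating rows.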

