
Postgres - group each n days

Can I group by each n days in Postgres?

If we have:

date       |  price
2018-01-01 |  10
2018-01-02 |  11
2018-01-03 |  10.5
.....

Something like grouping every 10 days and getting the avg of the price column.

How about this? It buckets consecutive 10-day periods regardless of gaps.

CREATE TABLE date (
  date  DATE             NOT NULL,
  price DOUBLE PRECISION NOT NULL
);

INSERT INTO date (date, price)
SELECT (now()::DATE) + s.i,
  s.i :: DOUBLE PRECISION
FROM generate_series(0, 1000) AS s(i);

SELECT ((extract(EPOCH FROM date) / (60 * 60 * 24)) :: BIGINT) / 10
    , avg(price) AS average_price
FROM date
GROUP BY 1
ORDER BY 1;
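
If the bare bucket number is hard to read, the same epoch-aligned bucket can be labelled with its starting date instead. This is only a sketch against the same table, using the fact that subtracting two DATE values yields an integer number of days, so the same integer-division trick works directly on dates:

-- label each 10-day bucket (aligned to 1970-01-01) with its first day
SELECT DATE '1970-01-01' + ((date - DATE '1970-01-01') / 10) * 10 AS bucket_start
     , avg(price) AS average_price
FROM date
GROUP BY 1
ORDER BY 1;
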
Example data:
ugautil=> select id,date, price from sales order by 1 limit 30;
id  |    date    | price  
-----+------------+--------
569 | 2018-01-01 | 296.01
570 | 2018-01-02 | 409.50
571 | 2018-01-03 |  46.73
572 | 2018-01-04 | 302.13
573 | 2018-01-05 | 313.83
574 | 2018-01-06 | 302.68
575 | 2018-01-07 | 359.53
576 | 2018-01-08 | 348.60
577 | 2018-01-09 | 376.09
578 | 2018-01-10 |  23.71
579 | 2018-01-11 | 470.93
580 | 2018-01-12 | 409.37
581 | 2018-01-13 | 160.95
582 | 2018-01-14 |  22.04
583 | 2018-01-15 | 295.15
584 | 2018-01-16 | 475.42
585 | 2018-01-17 | 399.37
586 | 2018-01-18 | 394.43
587 | 2018-01-19 |  91.97
588 | 2018-01-20 |  27.38
589 | 2018-01-21 | 286.23
590 | 2018-01-22 |  57.81
591 | 2018-01-23 | 486.14
592 | 2018-01-24 |  10.30
593 | 2018-01-25 | 423.67
594 | 2018-01-26 | 169.94
595 | 2018-01-27 | 152.08
596 | 2018-01-28 | 344.42
597 | 2018-01-29 | 448.63
598 | 2018-01-30 | 360.33
(30 rows)

Picking Jan 1, 2018 as the start date, every 10 days gives us an index number.
Only looking at the first 3 groups in January:
ugautil=> select floor((extract(epoch from date) - extract(epoch from date('2018-01-01')))/86400/10) as "ten_day_index", round(avg(price),2) from sales group by 1 order by 1 limit 3;
ten_day_index | round  
---------------+--------
           0 | 277.88
           1 | 274.70
           2 | 273.96
(3 rows)
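
The anchored index can also be computed with plain date arithmetic instead of going through epoch. A sketch against the same sales table, assuming the date column is of type DATE (date minus date is an integer number of days, and integer division by 10 gives the same index):

select (date - date '2018-01-01') / 10 as ten_day_index
     , round(avg(price), 2)            as avg_price
from sales
group by 1
order by 1
limit 3;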

ugautil=> delete from sales where id >= 569 and id <= 576;
DELETE 8
ugautil=> select id,date, price from sales order by 1 limit 30;
id  |    date    | price  
-----+------------+--------
577 | 2018-01-09 | 376.09
578 | 2018-01-10 |  23.71
579 | 2018-01-11 | 470.93
580 | 2018-01-12 | 409.37
581 | 2018-01-13 | 160.95
582 | 2018-01-14 |  22.04
583 | 2018-01-15 | 295.15
584 | 2018-01-16 | 475.42
585 | 2018-01-17 | 399.37
586 | 2018-01-18 | 394.43
587 | 2018-01-19 |  91.97
588 | 2018-01-20 |  27.38
589 | 2018-01-21 | 286.23
590 | 2018-01-22 |  57.81
591 | 2018-01-23 | 486.14
592 | 2018-01-24 |  10.30
593 | 2018-01-25 | 423.67
594 | 2018-01-26 | 169.94
595 | 2018-01-27 | 152.08
596 | 2018-01-28 | 344.42
597 | 2018-01-29 | 448.63
598 | 2018-01-30 | 360.33
599 | 2018-01-31 | 120.00
600 | 2018-02-01 | 328.08
601 | 2018-02-02 | 393.58
602 | 2018-02-03 |  52.04
603 | 2018-02-04 | 206.91
604 | 2018-02-05 | 194.20
605 | 2018-02-06 | 102.89
606 | 2018-02-07 | 146.78
(30 rows)

ugautil=> select floor((extract(epoch from date) - extract(epoch from date('2018-01-01')))/86400/10) as "ten_day_index", round(avg(price),2) from sales group by 1 order by 1 limit 3;
ten_day_index | round  
---------------+--------
           0 | 199.90
           1 | 274.70
           2 | 273.96
(3 rows)

Only the Jan 9 and Jan 10 entries now contribute to the average in group 0.
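
To make gaps like this visible directly in the output, a row count per bucket can be added alongside the average (again just a sketch against the same sales table):

select floor((extract(epoch from date) - extract(epoch from date '2018-01-01')) / 86400 / 10) as ten_day_index
     , round(avg(price), 2) as avg_price
     , count(*)             as rows_in_bucket
from sales
group by 1
order by 1
limit 3;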

This is pretty brute force, and I'll bet a dollar it's wickedly inefficient due to the >= and < range conditions in the join, but it notionally sounds like what you want to do:

with intervals as (
  select start_date, start_date + interval '10 days' as end_date
  from generate_series (
    (select min (date) from price_data),
    (select max (date) from price_data),
    interval '10 days') gs (start_date)
)
select
  i.start_date, sum (p.price) / 10 as average
from
  price_data p
  join intervals i on
    p.date >= i.start_date and
    p.date <  i.end_date
group by
  i.start_date
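
One possible variant (just a sketch): with the inner join above, a 10-day interval that contains no rows at all disappears from the result entirely. A left join from intervals keeps those empty buckets, with coalesce turning the missing sum into 0:

with intervals as (
  select start_date, start_date + interval '10 days' as end_date
  from generate_series (
    (select min (date) from price_data),
    (select max (date) from price_data),
    interval '10 days') gs (start_date)
)
select
  i.start_date, coalesce (sum (p.price), 0) / 10 as average
from
  intervals i
  left join price_data p on
    p.date >= i.start_date and
    p.date <  i.end_date
group by
  i.start_date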

The next version looks a whole lot uglier, but I suspect that on large datasets it would run a lot faster, since the join becomes an equality on snapshot_date rather than a range comparison:

with intervals as (
  select
    start_date::date as start_date,
    (start_date + interval '10 days')::date as end_date
  from generate_series (
    (select min (date) from price_data),
    (select max (date) from price_data),
    interval '10 days') gs (start_date)
),
exploded_intervals as (
  select 
    start_date + i as snapshot_date, start_date, end_date
  from
    intervals i
    cross join generate_series (0, 9) gs (i)
)
select
  i.start_date, sum (p.price) / 10 as average
from
  price_data p
  join exploded_intervals i on
    p.date = i.snapshot_date
group by
  i.start_date

I don't promise these are the best way to do it, but they are ways to do it.

In a nutshell, I am taking the minimum and maximum dates in your dataset and chunking them into 10-day intervals. This does presuppose that the clock on "every 10 days" starts on your first date.

From there, I am grouping the actual data into each of these date buckets, summing the prices and dividing by 10. If you have missing dates, it should account for that. If you have duplicates on the same day... well, then it's going to artificially inflate your "average." If you define a rule on how to deal with dupes, that is easy enough to manage.
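
For example, one simple rule would be to collapse duplicates to a single value per day first and only then bucket the daily values. A sketch of that, reusing the same price_data table:

with daily as (
  -- one row per calendar day; duplicate rows for the same day are averaged away
  select date, avg (price) as price
  from price_data
  group by date
),
intervals as (
  select start_date, start_date + interval '10 days' as end_date
  from generate_series (
    (select min (date) from daily),
    (select max (date) from daily),
    interval '10 days') gs (start_date)
)
select
  i.start_date, sum (d.price) / 10 as average
from
  daily d
  join intervals i on
    d.date >= i.start_date and
    d.date <  i.end_date
group by
  i.start_date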
