I'm starting with a table like so:
ENTITY_ID | META_ATTRIB_1 | META_ATTRIB_2 | START_DATE | END_DATE |
---|---|---|---|---|
1 | FOO | BAR | 2020-01-01 | 2020-12-01 |
I would like to end up with a count of entities per day for each combination of meta-attributes:
DAY | META_ATTRIB_1 | META_ATTRIB_2 | COUNT |
---|---|---|---|
2020-01-01 | FOO | BAR | 1 |
2020-01-02 | FOO | BAR | 1 |
2020-01-03 | FOO | BAR | 1 |
Right now I'm doing this by generating a sequence of dates from `DUAL`, joining the target table in via `DAY BETWEEN START_DATE AND END_DATE`, and grouping by `DAY, META_ATTRIB_1, META_ATTRIB_2`.
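The sketch below shows the calendar-join approach described above in a runnable form. It uses Python's built-in `sqlite3` module rather than Oracle, so several pieces are assumptions: the table name `mytable`, the sample rows, ISO-8601 text dates (SQLite has no DATE type), and a recursive `days` CTE standing in for the `DUAL`-based date generator, whose exact form isn't shown. The function name `daily_counts_calendar_join` is hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (entity_id, meta_attrib_1, meta_attrib_2, start_date, end_date).
# Dates are ISO-8601 strings, which compare correctly as text in SQLite.
ROWS = [
    (1, "FOO", "BAR", "2020-01-01", "2020-01-03"),
    (2, "FOO", "BAR", "2020-01-02", "2020-01-04"),
    (3, "FOO", "BAZ", "2020-01-03", "2020-01-03"),
]

CALENDAR_JOIN_SQL = """
WITH RECURSIVE
bounds(lo, hi) AS (
    SELECT MIN(start_date), MAX(end_date) FROM mytable
),
days(day, hi) AS (
    -- Stand-in for the Oracle date sequence: one row per day in [lo, hi].
    SELECT lo, hi FROM bounds
    UNION ALL
    SELECT date(day, '+1 day'), hi FROM days WHERE day < hi
)
SELECT d.day, t.meta_attrib_1, t.meta_attrib_2, COUNT(*) AS cnt
FROM days AS d
JOIN mytable AS t
  ON d.day BETWEEN t.start_date AND t.end_date  -- the non-equi-join
GROUP BY d.day, t.meta_attrib_1, t.meta_attrib_2
ORDER BY d.day, t.meta_attrib_1, t.meta_attrib_2
"""


def daily_counts_calendar_join(rows):
    """Load rows into an in-memory table and run the calendar-join query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (entity_id INTEGER, meta_attrib_1 TEXT, "
        "meta_attrib_2 TEXT, start_date TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(CALENDAR_JOIN_SQL).fetchall()
    conn.close()
    return result


for row in daily_counts_calendar_join(ROWS):
    print(row)
```

The join condition is a range predicate, so the database cannot use a plain equality join here; every calendar day is compared against the ranges it might fall in.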
This method is running into performance problems. Is there a better way to split each entity row across the desired sequence of days and then aggregate the result back into a per-day count?
A typical approach uses a recursive CTE to generate one row per day in each range, then aggregates:
```sql
with cte (meta_attrib_1, meta_attrib_2, dt, end_date) as (
    -- anchor: one row per entity, positioned on its start date
    select meta_attrib_1, meta_attrib_2, start_date, end_date from mytable
    union all
    -- recursive step: add one day until the row reaches its end date
    select meta_attrib_1, meta_attrib_2, dt + 1, end_date from cte where dt < end_date
)
select dt, meta_attrib_1, meta_attrib_2, count(*) as cnt
from cte
group by dt, meta_attrib_1, meta_attrib_2;
```
This closely follows the logic you described. Since you did not show your actual query, it is hard to tell whether it will perform better than what you are doing now.
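The recursive-CTE query above can be tried without an Oracle instance. This minimal sketch runs the same query shape on SQLite through Python's `sqlite3` module; because SQLite has no DATE type, `date(dt, '+1 day')` stands in for Oracle's `dt + 1`. The table name `mytable`, the sample rows, and the function name `daily_counts_recursive_cte` are hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (entity_id, meta_attrib_1, meta_attrib_2, start_date, end_date).
ROWS = [
    (1, "FOO", "BAR", "2020-01-01", "2020-01-03"),
    (2, "FOO", "BAR", "2020-01-02", "2020-01-04"),
    (3, "FOO", "BAZ", "2020-01-03", "2020-01-03"),
]

RECURSIVE_CTE_SQL = """
WITH RECURSIVE cte(meta_attrib_1, meta_attrib_2, dt, end_date) AS (
    -- Anchor: one row per entity, positioned on its start date.
    SELECT meta_attrib_1, meta_attrib_2, start_date, end_date FROM mytable
    UNION ALL
    -- Recursive step: advance each row by one day until it reaches end_date.
    SELECT meta_attrib_1, meta_attrib_2, date(dt, '+1 day'), end_date
    FROM cte
    WHERE dt < end_date
)
SELECT dt, meta_attrib_1, meta_attrib_2, COUNT(*) AS cnt
FROM cte
GROUP BY dt, meta_attrib_1, meta_attrib_2
ORDER BY dt, meta_attrib_1, meta_attrib_2
"""


def daily_counts_recursive_cte(rows):
    """Load rows into an in-memory table and run the recursive-CTE query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (entity_id INTEGER, meta_attrib_1 TEXT, "
        "meta_attrib_2 TEXT, start_date TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(RECURSIVE_CTE_SQL).fetchall()
    conn.close()
    return result


for row in daily_counts_recursive_cte(ROWS):
    print(row)
```

Each anchor row produces its own chain of days, so no `entity_id` is needed in the CTE: two entities with the same attributes and dates simply yield two chains and are counted twice.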
You might find that a recursive CTE is faster than joining against a generated sequence of dates:
```sql
with cte (day, meta_attrib_1, meta_attrib_2, end_date) as (
      -- anchor: each row starts on its start_date
      select start_date, meta_attrib_1, meta_attrib_2, end_date
      from t
      union all
      -- recursive step: advance the CTE's own day column by one day
      select day + interval '1' day, meta_attrib_1, meta_attrib_2, end_date
      from cte
      where day < end_date
     )
select day, meta_attrib_1, meta_attrib_2, count(*)
from cte
group by day, meta_attrib_1, meta_attrib_2;
```
The advantage of a recursive CTE is that it "localizes" the expansion of the dates: instead of relying on a non-equi-join against a calendar, each row simply generates its own additional dates.
However, this still produces a separate row for each day of each original row, so the aggregation could become the bottleneck.
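If that aggregation is the bottleneck, one alternative that neither answer uses is a sweep-line (delta) encoding: each entity contributes +1 on its start date and -1 on the day after its end date, and a running `SUM(...) OVER (PARTITION BY ... ORDER BY day)` gives the active count for each meta-attribute combination. Only the resulting constant-count segments are expanded into days, so the expansion is roughly proportional to the number of output rows (combinations × days) rather than entities × days. This is a minimal sketch in SQLite via Python's `sqlite3` (window functions need SQLite 3.25 or later). Oracle also has analytic `SUM` and `LEAD`, but an Oracle translation and its performance on your data are untested assumptions. The table, CTE, and function names and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (entity_id, meta_attrib_1, meta_attrib_2, start_date, end_date).
ROWS = [
    (1, "FOO", "BAR", "2020-01-01", "2020-01-03"),
    (2, "FOO", "BAR", "2020-01-02", "2020-01-04"),
    (3, "FOO", "BAZ", "2020-01-03", "2020-01-03"),
]

SWEEP_SQL = """
WITH RECURSIVE
events(meta_attrib_1, meta_attrib_2, day, delta) AS (
    -- +1 when an entity becomes active, -1 on the day after it ends.
    SELECT meta_attrib_1, meta_attrib_2, start_date, 1 FROM mytable
    UNION ALL
    SELECT meta_attrib_1, meta_attrib_2, date(end_date, '+1 day'), -1 FROM mytable
),
net AS (
    -- Collapse events that land on the same day for the same combination.
    SELECT meta_attrib_1, meta_attrib_2, day, SUM(delta) AS delta
    FROM events
    GROUP BY meta_attrib_1, meta_attrib_2, day
),
segments AS (
    -- Running total = active count from this day until the next event day.
    SELECT meta_attrib_1, meta_attrib_2, day,
           SUM(delta) OVER (PARTITION BY meta_attrib_1, meta_attrib_2 ORDER BY day) AS cnt,
           LEAD(day) OVER (PARTITION BY meta_attrib_1, meta_attrib_2 ORDER BY day) AS next_day
    FROM net
),
expanded(day, meta_attrib_1, meta_attrib_2, cnt, next_day) AS (
    -- Expand each non-empty segment [day, next_day) into one row per day.
    SELECT day, meta_attrib_1, meta_attrib_2, cnt, next_day FROM segments WHERE cnt > 0
    UNION ALL
    SELECT date(day, '+1 day'), meta_attrib_1, meta_attrib_2, cnt, next_day
    FROM expanded
    WHERE date(day, '+1 day') < next_day
)
SELECT day, meta_attrib_1, meta_attrib_2, cnt
FROM expanded
ORDER BY day, meta_attrib_1, meta_attrib_2
"""


def daily_counts_sweep(rows):
    """Load rows into an in-memory table and run the sweep-line query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (entity_id INTEGER, meta_attrib_1 TEXT, "
        "meta_attrib_2 TEXT, start_date TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(SWEEP_SQL).fetchall()
    conn.close()
    return result


for row in daily_counts_sweep(ROWS):
    print(row)
```

On the sample rows this returns the same per-day counts as the recursive-CTE query; whether it is actually faster depends on how many entities share each combination and on the optimizer, so it is worth measuring on real data.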