
SQL - Can I make use of a partition when checking the partitioned field against value from another table?

I'm querying in Athena SQL for the following use case:

I have a table A, partitioned on Date, with the columns: Date | Number of Purchases | Category

Another table, B, holds 500 events, each of which happened on a particular date, with the columns: EventID | Event_Date | 7_Days_Before_Event_Date | Category. For each of these events I want aggregated data from A covering the week before it.

I would like to end up with, for each event, the sum of purchases for the 7 days before the date the event occurred.

However, when I use a WHERE clause for this, e.g. A.Date between B.7_Days_Before_Event_Date and B.Event_Date, the partition on A is no longer used and all of the data is scanned, which greatly reduces performance.

How might I get the data for the week before each event while using the partition and therefore keeping performance high?

SQL Query:

```sql
select b.event_id, sum(a.number_of_purchases)
from dbo.tableA a
inner join dbo.tableB b on a.category = b.category
-- identifiers that start with a digit must be double-quoted in Athena/Presto
where a.date between b."7_days_before_event_date" and b.event_date
group by b.event_id
```

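Athena is based on Presto. In Presto, partitions are pruned at planning time, which only works when the partition column is compared against values that are known then, such as constants. In your query, the bounds of between b.7_days_before_event_date and b.event_date come from the rows of table B, so they are not known until the query actually runs. As a result, the query ends up scanning all partitions of A.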

The Presto community is already working on a feature called dynamic filtering, which will help with this kind of performance issue.

You can also refer to the linked discussion, which covers this issue in more detail and describes possible workarounds.
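One common workaround (a sketch, not necessarily the one described in the linked discussion) is a two-step approach. First, run a cheap query against table B to fetch each event's date range. Then generate the main query with an extra predicate that compares the partition column with literal dates, which the planner can use for pruning. The helper names `partition_dates` and `build_pruned_query` are hypothetical. The sketch also assumes that the partition column `date` in tableA has type DATE. If it is a string partition such as 'YYYY-MM-DD', drop the `DATE` keyword from each literal.

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple

# (start, end) pairs assumed to come from a first, cheap query such as:
#   select "7_days_before_event_date", event_date from dbo.tableB
DateRange = Tuple[date, date]


def partition_dates(ranges: Iterable[DateRange]) -> List[date]:
    """Return every distinct date covered by the inclusive (start, end) ranges."""
    needed = set()
    for start, end in ranges:
        day = start
        while day <= end:  # inclusive on both ends, matching SQL BETWEEN
            needed.add(day)
            day += timedelta(days=1)
    return sorted(needed)


def build_pruned_query(ranges: Iterable[DateRange]) -> str:
    """Return the original join plus a literal IN list on the partition column."""
    dates = partition_dates(ranges)
    if not dates:
        raise ValueError("no event date ranges given; 'in ()' would be invalid SQL")
    literals = ", ".join(f"DATE '{d.isoformat()}'" for d in dates)
    return (
        "select b.event_id, sum(a.number_of_purchases) as purchases\n"
        "from dbo.tableA a\n"
        "inner join dbo.tableB b on a.category = b.category\n"
        'where a.date between b."7_days_before_event_date" and b.event_date\n'
        f"  and a.date in ({literals})  -- constants usable for partition pruning\n"
        "group by b.event_id"
    )


if __name__ == "__main__":
    # Hypothetical example ranges; overlapping days are only listed once.
    example = [(date(2020, 1, 10), date(2020, 1, 12)),
               (date(2020, 1, 11), date(2020, 1, 13))]
    print(build_pruned_query(example))
```

The join condition is kept, so the results are the same as the original query. The IN list only restricts which partitions of A are read. With 500 events the list has at most a few thousand distinct dates, and overlapping weeks are de-duplicated. If the list grows too long, one coarser alternative is a single literal range from the earliest start date to the latest end date.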
