We run the query below daily, and it takes about 3 hours owing to the sheer volume of data in the transaction table. Is there any way to tune this query or reduce its execution time?
CREATE TEMPORARY TABLE t1 AS
SELECT DISTINCT EVENT_DATE FROM (
SELECT DISTINCT EVENT_DATE FROM mstr_wrk.cust_transation
WHERE load_date BETWEEN CAST(CAST('2019-03-05 04:00:31.0' AS TIMESTAMP) AS DATE) AND CURRENT_DATE() AND event_title = 'SETUP'
AND state != 'INACTIVE' AND mode != 'DORMANT') T
I tried reducing the number of reducers to speed it up, and also tried enabling vectorization, but without much luck. We are running on Tez.
Redesign the table and use indexes.
For example, I would use a numeric or enumerated 'state' column, and likewise a numeric or enumerated 'event' column. These make for more efficient indexes than varchar or text types.
Indexes dramatically speed up queries, provided the queries actually use them.
Anyway, without knowing the table structure or the number of records involved, I am just guessing...
If mstr_wrk.cust_transation is partitioned by load_date, partition pruning will not work because you are applying functions in the load_date filter. This will cause a full table scan. Calculate the dates in a shell script and pass them in as parameters. Before parametrizing your script, check how this version with literal dates performs:
CREATE TEMPORARY TABLE t1 AS
SELECT DISTINCT EVENT_DATE FROM mstr_wrk.cust_transation
WHERE load_date >= '2019-03-05' AND load_date <= '2019-03-07'
AND event_title = 'SETUP'
AND state != 'INACTIVE' AND mode != 'DORMANT'
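Once the literal-date version above proves faster, the "calculate dates in the shell" step could look roughly like the sketch below. It assumes the query is saved in a file (the name load_t1.hql is made up here) with the two literals replaced by '${hivevar:start_date}' and '${hivevar:end_date}', and that you launch it with the hive CLI and its --hivevar option. The helper name build_hive_cmd is also just for illustration. As written, the script only prints the command so you can check it first.

```shell
#!/bin/sh
# Sketch only: load_t1.hql and build_hive_cmd are hypothetical names.
# The .hql file is assumed to contain the query with the filter written as:
#   WHERE load_date >= '${hivevar:start_date}'
#     AND load_date <= '${hivevar:end_date}'

# Dates are computed here, in the shell, so Hive only ever sees plain
# string literals in the partition filter.
START_DATE='2019-03-05'
END_DATE=$(date +%Y-%m-%d)   # today, formatted as yyyy-MM-dd

# Build the hive command line for a given start date, end date and script.
build_hive_cmd() {
    printf 'hive --hivevar start_date=%s --hivevar end_date=%s -f %s\n' \
        "$1" "$2" "$3"
}

# Dry run: print the command. Pipe the output to sh to actually execute it.
build_hive_cmd "$START_DATE" "$END_DATE" load_t1.hql
```

Because the substitution happens before the query is compiled, the planner sees constant strings for load_date, which is what the literal-date query above relies on.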