Say I have two tables
External table - etable
Internal table - itable
and my etable is partitioned based on date.
To populate itable every day from etable's data, I have a workflow and coordinator in Hue with the Hive query shown below:
ALTER TABLE etable ADD IF NOT EXISTS PARTITION (date = '${date}') LOCATION 'path/date=${date}';
INSERT OVERWRITE TABLE itable partition(date = '${date}') SELECT * FROM etable WHERE date = '${date}';
Now suppose that every day I want to update my data for the past n days. How do I do that?
For example, take n = 2. If the coordinator is scheduled to run today, i.e. 2018-01-20 (yyyy-MM-dd), then it should update the data for the past 2 days, so the query should update the data for 2018-01-20 and 2018-01-19. Basically, I need to run the above query twice with different dates.
Is there any way to loop this query n times and use the loop variable? Then I could use date_sub() to get a different date in each iteration of the loop. Or is there a better way?
Thank you.
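You should be able to do this with a single dynamic-partition insert (the two SET lines are only needed if dynamic partitioning is not already enabled in nonstrict mode):
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE itable partition(`date`)
SELECT * FROM etable
WHERE `date` BETWEEN date_sub('${date}', ${n} - 1) AND '${date}';
Note that the function is date_sub(), and that BETWEEN is inclusive on both ends, so subtracting ${n} - 1 gives exactly n days (2018-01-19 and 2018-01-20 for n = 2). This assumes the etable partitions for those dates have already been added.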
That said, Hive itself has no loops, and Hue and Oozie can't do this either, since you're trying to build queries dynamically.
To run your original query once per date, you would need a bash loop that calls beeline -u jdbc:hive2://server:10000 --hivevar date="value" -f script.sql for each date.
Alternatively, you can write the loop in Python, Java, or whatever you're comfortable with, as long as it can communicate with Hive.
You can then schedule that script or code with Oozie.
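As a rough illustration of the loop approach, here is a minimal Python sketch. It assumes that script.sql holds the ALTER TABLE / INSERT OVERWRITE pair from the question and refers to the date as ${date}, that beeline is on the PATH, and that jdbc:hive2://server:10000 is replaced with your real HiveServer2 URL. The function names past_dates and beeline_command are made up for this example.

```python
import subprocess
import sys
from datetime import date, timedelta

JDBC_URL = "jdbc:hive2://server:10000"  # placeholder URL, replace with your own
SCRIPT = "script.sql"  # assumed to contain the per-date queries from the question


def past_dates(run_date, n):
    """Return n dates as yyyy-MM-dd strings, starting at run_date and going back."""
    start = date.fromisoformat(run_date)
    return [(start - timedelta(days=i)).isoformat() for i in range(n)]


def beeline_command(day):
    """Build the beeline invocation that runs SCRIPT for a single date."""
    return ["beeline", "-u", JDBC_URL, "--hivevar", f"date={day}", "-f", SCRIPT]


if __name__ == "__main__":
    # Usage: python reload_partitions.py 2018-01-20 2
    run_date, n = sys.argv[1], int(sys.argv[2])
    for day in past_dates(run_date, n):
        # check=True stops the loop if beeline exits with a non-zero status
        subprocess.run(beeline_command(day), check=True)
```

Running it with 2018-01-20 and 2 would call beeline once for 2018-01-20 and once for 2018-01-19. The script name reload_partitions.py is also just an example; Oozie could run it through a shell action.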