
How to loop through Hive Query and use loop variable

Say I have two tables

External table - etable

Internal table - itable

and my etable is partitioned based on date.

To populate itable from etable's data every day, I have a workflow and a coordinator in Hue with the Hive query shown below:

ALTER TABLE etable ADD IF NOT EXISTS PARTITION (date = '${date}') LOCATION 'path/date=${date}';

INSERT OVERWRITE TABLE itable partition(date = '${date}') SELECT * FROM etable WHERE date = '${date}';

Now suppose that every day I want to update the data for the past n days. How do I do that?

For example:

Take n = 2. If the coordinator is scheduled to run today, i.e. 2018-01-20 (yyyy-MM-dd), then it should update the data for the past 2 days, that is, for 2018-01-20 and 2018-01-19. So basically I need to run the above query twice, each time with a different date.

Is there any way to loop this query n times and use the loop variable? Then I could use date_sub() to get a different date in each iteration of the loop. Or is there a better way?

Thank you.

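You should be able to do this with a single dynamic-partition insert, assuming the etable partitions for those dates have already been added:

INSERT OVERWRITE TABLE itable PARTITION (`date`)
SELECT * FROM etable
WHERE `date` BETWEEN date_sub('${date}', ${n} - 1) AND '${date}';

BETWEEN is inclusive on both ends, so subtracting ${n} - 1 covers exactly n days (2018-01-19 and 2018-01-20 for n = 2). Because no static partition value is given, you may need to set hive.exec.dynamic.partition.mode=nonstrict first.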

Otherwise, Hive has no loops. Hue and Oozie won't be able to do that either, since you're trying to build queries dynamically.

To run the original statements once per date, you would need a bash loop that calls beeline -u jdbc:hive2://server:10000 --hivevar date="value" -f script.sql on each iteration, passing a different date value each time.

Or you can use Python, Java, or whatever language you're comfortable with to write the loop, as long as it can communicate with Hive.

Then, you can schedule that script/code with Oozie.
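
As a rough illustration of the external loop described above, here is a minimal Python sketch. It assumes beeline is on the PATH and that script.sql contains the two statements from the question, parameterized by ${date}. The JDBC URL and file name are the placeholders from the command above, and the function names (past_dates, beeline_command, refresh) are made up for this example.

```python
import subprocess
from datetime import date, timedelta

# Placeholder connection string and script name, copied from the beeline command above.
JDBC_URL = "jdbc:hive2://server:10000"
SCRIPT = "script.sql"


def past_dates(end, n):
    """Return the n dates ending at `end` (inclusive), newest first, as yyyy-MM-dd strings."""
    last = date.fromisoformat(end)
    return [(last - timedelta(days=i)).isoformat() for i in range(n)]


def beeline_command(day):
    """Build the beeline invocation that runs SCRIPT with the hivevar `date` set to `day`."""
    return ["beeline", "-u", JDBC_URL, "--hivevar", f"date={day}", "-f", SCRIPT]


def refresh(end, n, run=subprocess.run):
    """Run the script once per date; `run` can be swapped out to dry-run the loop."""
    for day in past_dates(end, n):
        # check=True makes a failed beeline call raise instead of silently continuing.
        run(beeline_command(day), check=True)
```

Calling refresh("2018-01-20", 2) would invoke beeline twice, first with date=2018-01-20 and then with date=2018-01-19. Passing run=print instead shows the commands without executing them, which can be handy before wiring the script into an Oozie shell action.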
