How to loop through Hive Query and use loop variable

Question

Say I have two tables

External table - etable

Internal table - itable

and my etable is partitioned based on date.

To populate itable every day from etable's data, I have a workflow and coordinator in Hue with the Hive query shown below:

ALTER TABLE etable ADD IF NOT EXISTS PARTITION (date = '${date}') LOCATION 'path/date=${date}';

INSERT OVERWRITE TABLE itable partition(date = '${date}') SELECT * FROM etable WHERE date = '${date}';

Now suppose that every day I want to update my data for the past n days. How do I do that?

E.g.

Let's take n = 2. If the coordinator is scheduled to run today, i.e. 2018-01-20 (yyyy-MM-dd), then it should update data for the past 2 days, so the query should update the data for 2018-01-20 and 2018-01-19. Basically, I need to run the above query twice with a different date each time.

Is there any way to loop this query n times and use the loop variable? Then I could use date_sub() to get a different date in each iteration of the loop. Or is there a better way?

Thank you.

Answer 1

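You should be able to do this in a single statement with a dynamic partition insert:

INSERT OVERWRITE TABLE itable PARTITION (`date`)
SELECT * FROM etable
WHERE `date` BETWEEN date_sub('${date}', ${n}) AND '${date}'

Note that BETWEEN is inclusive on both ends, so this covers n + 1 days; subtract ${n} - 1 instead if you want exactly n. Because no partition value is given statically, you may also need to SET hive.exec.dynamic.partition.mode=nonstrict; first, and the etable partitions for those dates must already have been added.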

Anyway, Hive itself has no loop construct. Hue and Oozie won't be able to do that either, since you're trying to build the queries dynamically.

Doing it as an actual loop would require a bash loop that calls beeline -u jdbc:hive2://server:10000 --hivevar date="value" -f script.sql once per date.

Or you can use Python, Java, or whatever you're comfortable with to write the loop, as long as it can communicate with Hive.

Then you can schedule that script/code with Oozie.
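A minimal sketch of such a bash loop, under a few assumptions: GNU `date` is available (the `-d` option is not portable to BSD/macOS), `script.sql` holds the two statements from the question and refers to the date as `${date}`, and the JDBC URL is just the placeholder used above. The function names `past_dates` and `refresh_partitions` and the `DRY_RUN` switch are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: re-run script.sql once for each of the last N days.
# Assumes GNU date; the JDBC URL and script name are placeholders.

# Print the last n dates ending at end_date (inclusive), newest first.
past_dates() {
  local end_date="$1" n="$2" i
  for ((i = 0; i < n; i++)); do
    # -u keeps the arithmetic in UTC so DST changes cannot shift the day
    date -u -d "${end_date} ${i} days ago" +%Y-%m-%d
  done
}

# Run one beeline call per date, or only print the calls when DRY_RUN=1.
refresh_partitions() {
  local end_date="$1" n="$2" d
  for d in $(past_dates "$end_date" "$n"); do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "beeline -u jdbc:hive2://server:10000 --hivevar date=${d} -f script.sql"
    else
      beeline -u jdbc:hive2://server:10000 --hivevar "date=${d}" -f script.sql
    fi
  done
}

# Example (prints the two commands without contacting Hive):
#   DRY_RUN=1 refresh_partitions 2018-01-20 2
```

With n = 2 and an end date of 2018-01-20, the loop issues one call for 2018-01-20 and one for 2018-01-19, which matches the example in the question. Running with `DRY_RUN=1` first is a cheap way to check the dates before anything is overwritten.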
