Load the latest partition from a hive table in pig

Question

I want to know how I can load the latest partition from a Hive table in a Pig script. Obviously, I can load the whole table and then use the FILTER command to keep only the rows of the partition I want.

However, if we don't know the latest date partition of the Hive table, how can we find that date and load the partition for it?

Answer 1

As far as I know, we can't do it directly. Here is one way to do it with shell scripting. I am assuming your partition column is in datehour format or some other numerically increasing order.

hive -e 'select max(datehour) from tweets1' > datehour.txt

   # Store the output of the query above in a temp file, datehour.txt.

datehour=$(cat datehour.txt)

   # Read the value back from that file.

hive -e "describe formatted tweets1 partition (datehour=$datehour)" > partitionloc.txt

   # Store the output of the describe command in a second temp file.

partitionLocation=$(awk '/Location:/ { print $2 }' partitionloc.txt)

   # Pick out the line matching 'Location:'; its second field is the partition location.

   # Pass the location to the Pig script as a parameter to load the data from.

pig -param location="$partitionLocation" -f pigfile.pig
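The steps above hand `location` to pigfile.pig but do not show the script itself. A minimal sketch of what it might contain, written as a shell heredoc so it can be created next to the other commands. The relation names `tweets` and `first_rows`, the LIMIT/DUMP lines, and the Ctrl-A delimiter (Hive's default for plain text tables) are assumptions for illustration; adjust the loader to whatever format the table really uses.

```shell
# Write a hypothetical pigfile.pig. The quoted 'EOF' keeps $location literal,
# so Pig (not the shell) substitutes it when the script is run.
cat > pigfile.pig <<'EOF'
-- The location parameter is supplied on the command line with -param.
-- Assumption: the partition holds text files with Hive's default Ctrl-A delimiter.
tweets = LOAD '$location' USING PigStorage('\u0001');
first_rows = LIMIT tweets 10;
DUMP first_rows;
EOF
```

Because only the partition directory is loaded, no FILTER on the partition column is needed inside the Pig script.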

Let me know if this doesn't work.
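A variant of the same idea without temp files could look like the sketch below. It swaps `select max(datehour)` for `show partitions`, which lists partition names from the metastore instead of querying the data. The function names `latest_partition_value` and `partition_location` are hypothetical, and the sketch assumes a single partition column whose values are fixed width, so that a plain text sort puts the latest one last.

```shell
# Hypothetical helpers; both read Hive CLI output on stdin.

# Reads `show partitions` output (lines like "datehour=2015050707")
# and prints the largest value of the partition column named in $1.
latest_partition_value() {
    awk -F= -v col="$1" '$1 == col { print $2 }' | sort | tail -n 1
}

# Reads `describe formatted ... partition (...)` output and prints
# the path that follows the first "Location:" label.
partition_location() {
    awk '/Location:/ { print $2; exit }'
}

# Usage sketch (needs the hive and pig CLIs, so it is left commented out):
#   datehour=$(hive -e 'show partitions tweets1' | latest_partition_value datehour)
#   location=$(hive -e "describe formatted tweets1 partition (datehour=$datehour)" | partition_location)
#   pig -param location="$location" -f pigfile.pig
```

Keeping the parsing in small functions means it can be checked against sample text before it is pointed at a real cluster.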

solution1
0 2015-05-07 07:50:03
