
Load the latest partition from a hive table in pig

I want to know how I can load the latest partition from a hive table in a pig script. Obviously, I can load the whole data and then use the FILTER command to filter the corresponding partition.

However, if we don't know what the latest date partition of the Hive table is, how can we find that date and load only the partition for it?

As far as I know, we can't do this directly in Pig. Here is one way to do it with shell scripting. It assumes your partition column is in datehour format or otherwise increases numerically, so that max() returns the latest partition.

hive -e 'select max(datehour) from tweets1' > datehour.txt;

   # Store the output of the query above in a temporary file, datehour.txt.

datehour=$(cat datehour.txt)

   # Read the value back from that file into a shell variable.

 hive -e "describe formatted tweets1 partition (datehour=$datehour)" > partitionloc.txt;

   # Store the output of the describe command in another temporary file.

 partionLocation=$(awk '/Location:/ { print $2 }' partitionloc.txt)

  # Read that file and pick out the line matching 'Location:'; its second field is the partition's location.
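
The lookup steps above can also be chained without temporary files. This is only a sketch: the function names extract_location and latest_partition_location are made up for illustration, and it assumes the partition column is numeric (a string column would need its value quoted in the partition spec).

```shell
# Sketch only: both function names are hypothetical helpers, not part of Hive.

# Print the second field of the first "Location:" line read from stdin.
extract_location() {
    awk '/Location:/ { print $2; exit }'
}

# Usage: latest_partition_location TABLE PARTITION_COLUMN
# Asks Hive for the largest partition value, then for that partition's location.
latest_partition_location() {
    local table=$1 column=$2 latest
    # -S (silent) keeps Hive's progress messages out of the captured output.
    latest=$(hive -S -e "select max($column) from $table")
    hive -S -e "describe formatted $table partition ($column=$latest)" | extract_location
}

# Example call, using the table and column names from this answer:
#   partionLocation=$(latest_partition_location tweets1 datehour)
```

Keeping the awk step in its own function makes it possible to check the parsing against saved describe output without a running Hive.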

  # Pass the location to the Pig script as a parameter, so the script loads its data from there.

 pig -param location="$partionLocation" -f pigfile.pig
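
For completeness, the Pig side might look like the sketch below. The answer does not show pigfile.pig, so its contents here are an assumption: it supposes the partition is stored as plain text with Hive's default Ctrl-A field delimiter, and the alias names are hypothetical. Pig replaces $location with the value passed on the command line.

```shell
# Write a minimal pigfile.pig. The quoted 'EOF' stops the shell from expanding
# $location, so it is left in the file for Pig's parameter substitution.
cat > pigfile.pig <<'EOF'
-- $location is filled in from the location parameter given on the pig command line
tweets = LOAD '$location' USING PigStorage('\u0001');
first_rows = LIMIT tweets 10;
DUMP first_rows;
EOF
```

If the table uses a different storage format or delimiter, the LOAD line would need a matching loader instead of PigStorage.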

Let me know if it doesn't work.


 