I want to know how to load the latest partition of a Hive table in a Pig script. Obviously, I can load all the data and then use the FILTER command to select the partition I need.
However, if we don't know which date partition of the Hive table is the latest, how can we find that date and load only the corresponding partition?
As far as I know, we can't do this directly. Here is one way to do it with shell scripting. I am assuming your partition column is in datehour format or some other numerically increasing order, so that max() returns the latest partition.
hive -e 'select max(datehour) from tweets1' > datehour.txt;
# Store the output of the query above in a temp file, datehour.txt.
datehour=$(cat datehour.txt)
# Read the value back from the same file written above.
hive -e 'describe formatted tweets1 partition (datehour='$datehour')' > partitionloc.txt;
# Store the output of the describe command in a second temp file, partitionloc.txt.
partionLocation=$(awk '/Location:/ { print $2 }' partitionloc.txt)
# Read the temp file and match the 'Location:' line; its second field is the partition location.
# Pass that location to the Pig script as a parameter to load the data from.
pig -f pigfile.pig --param location="$partionLocation"
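The same steps can also be written without temp files. This is only a rough sketch, not tested against a real cluster: the function names extract_location and latest_partition_location are made up for illustration, and it assumes the partition value is numeric (as with datehour) so it needs no quoting inside the Hive query. The -S flag just runs the Hive CLI in silent mode.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; tweets1, datehour and
# pigfile.pig come from the example above.

# Read `describe formatted` output on stdin and print the value
# of the first 'Location:' line.
extract_location() {
    awk '/Location:/ { print $2; exit }'
}

# Ask Hive for the largest partition value, then for that partition's location.
# Assumes a numeric partition value that needs no quoting in the query.
latest_partition_location() {
    table=$1
    col=$2
    latest=$(hive -S -e "select max($col) from $table") || return 1
    [ -n "$latest" ] || return 1
    hive -S -e "describe formatted $table partition ($col=$latest)" | extract_location
}

# Example usage (commented out so that sourcing this file runs nothing):
# location=$(latest_partition_location tweets1 datehour) &&
#     pig -f pigfile.pig --param location="$location"
```

Inside pigfile.pig the value is then available as $location, the same as with the script above.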
Let me know if this doesn't work.