Load the latest partition of a Hive table in Pig

Question

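I'd like to know how to load the latest partition of a Hive table from a Pig script. Obviously, I could load the whole table and then use the FILTER command to keep only the partition I want.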

But if we don't know which date is the latest partition of the Hive table, how do we find that date and load the partition for it?

Answer 1

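As far as I know, we can't do this directly, but I worked out a way with a shell script. It assumes your partition column is in datehour format or increases numerically, so that max() returns the newest partition.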
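The column format matters because "latest" is decided by ordering the partition values. A fixed-width value such as `2015050707` (assumed here to be yyyyMMddHH) sorts chronologically as a number. The minimal sketch below assumes input lines of the form `datehour=<value>`, the shape Hive's `SHOW PARTITIONS` prints for a single partition column, and picks the newest value. The helper name `latest_partition` is hypothetical.

```shell
#!/usr/bin/env bash
# latest_partition (hypothetical helper): reads lines such as
# "datehour=2015050707" on stdin and prints the largest value.
# Assumes one partition column with fixed-width numeric values,
# so a numeric sort is also a chronological sort.
latest_partition() {
  sed -n 's/^datehour=//p' | sort -n | tail -n 1
}

# Example with made-up partition values:
printf 'datehour=2015050623\ndatehour=2015050707\ndatehour=2015050700\n' | latest_partition
```

Against a live table this could be fed with `hive -S -e 'show partitions tweets1' | latest_partition`, which avoids a full `select max(...)` query.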

hive -S -e 'select max(datehour) from tweets1' > datehour.txt

   # store the output of the query above in a temp file, datehour.txt

datehour=$(cat datehour.txt)

   # read the value back from that file

hive -S -e "describe formatted tweets1 partition (datehour='$datehour')" > partitionloc.txt

   # store the output of describe formatted in a second temp file

partitionLocation=$(awk '/Location:/ { print $2 }' partitionloc.txt)

   # take the path from the line matching 'Location:'; this is the partition's location

   # pass the location to the Pig script as a parameter to load the data from

pig -param location="$partitionLocation" -f pigfile.pig
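The same steps can be wrapped in a small script that skips the temp files and fails loudly when nothing is found. This is a sketch, not the answer's exact method. It assumes a single partition column named `datehour`, the `hive` and `pig` CLIs on PATH, and a `pigfile.pig` that loads from `'$location'`. The function names `partition_location` and `load_latest_partition` are hypothetical, and the `describe formatted` sample output is made up.

```shell
#!/usr/bin/env bash
# Sketch: find the newest partition of a Hive table and hand its location to Pig.
# Assumptions (not from the original answer): one partition column, datehour,
# with fixed-width numeric values; hive and pig on PATH; pigfile.pig uses $location.

# Print the path after the first "Location:" line of `describe formatted`
# output read from stdin. Assumes that first match is the partition's location.
partition_location() {
  awk '$1 == "Location:" { print $2; exit }'
}

# Hypothetical wrapper around the answer's steps, without temp files.
load_latest_partition() {
  local table="$1"
  local datehour location

  # -S (silent mode) keeps Hive's log lines out of the captured result.
  datehour=$(hive -S -e "select max(datehour) from ${table}")
  if [[ -z "$datehour" || "$datehour" == "NULL" ]]; then
    echo "no partitions found in ${table}" >&2
    return 1
  fi

  location=$(hive -S -e "describe formatted ${table} partition (datehour='${datehour}')" \
    | partition_location)
  if [[ -z "$location" ]]; then
    echo "no location found for datehour=${datehour}" >&2
    return 1
  fi

  pig -param "location=${location}" -f pigfile.pig
}

# Parsing step on made-up `describe formatted` output:
printf 'Database:\tdefault\nLocation:\thdfs://namenode/warehouse/tweets1/datehour=2015050707\n' \
  | partition_location
```

Calling `load_latest_partition tweets1` would then run both Hive queries and start Pig. Inside `pigfile.pig` the parameter is referenced as `$location`, for example `data = LOAD '$location' USING PigStorage('\u0001');`, which assumes the table is stored as plain text with Hive's default Ctrl-A field delimiter.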

Let me know if it works.


Solution 1
0 2015-05-07 07:50:03
