Show Hive partitions in a nested sub query

I have a Hive table that is partitioned by day (e.g. 20151001, 20151002, ...). Is there a Hive query that lists these partitions in a form that can be used in a nested sub query?

That is, can I do something along the lines of:

SELECT * FROM (SHOW PARTITIONS test) a where ...

The query

SELECT ptn FROM test 

returns as many rows as there are rows in the test table. I want it to return only as many rows as there are partitions (without using DISTINCT).

A potential solution is to read the partitions from the HDFS location of the table of interest, using either a shell script or Python.

The data that corresponds to the Hive table is stored in HDFS, e.g.

/hive/database/table/partition/datafiles

In your case, /hive/database/table/20151001/datafiles (with Hive's default layout the partition folder is named column=value, e.g. ptn=20151001).
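To make the layout concrete, here is a minimal Python sketch that picks the partition component out of a data file path. The function name partition_of and the file name part-00000 are illustrative assumptions, not something from the original answer; the table root is the example path above. It accepts both a bare folder name (20151001) and Hive's default column=value naming.

```python
def partition_of(table_root, file_path):
    """Return the partition value encoded in file_path, or None if the
    path does not point below table_root. (Hypothetical helper.)"""
    root = table_root.rstrip("/") + "/"
    if not file_path.startswith(root):
        return None
    # The first path component below the table root is the partition folder.
    first = file_path[len(root):].split("/", 1)[0]
    # Hive's default layout names the folder column=value; keep only the value.
    return first.split("=", 1)[-1] or None


# Example call; "part-00000" is an assumed data file name.
value = partition_of("/hive/database/table",
                     "/hive/database/table/20151001/part-00000")
```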

If the table is bucketed, each partition folder typically holds as many individual data files as the table has buckets (the number given in its CLUSTERED BY ... INTO n BUCKETS clause).

Once you have the above path, create a shell script that loops through the partition folders (in this case 20151001 etc.), captures their names in a shell variable, and passes that variable as a parameter to the Hive query.
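The same steps can be sketched in Python instead of shell. This is only a rough outline under stated assumptions: the hdfs and hive command-line tools are on the PATH, the table folder contains one sub-folder per partition, and the helper names (partition_dirs, hive_command, run) and the variable name partitions are made up for illustration.

```python
import subprocess


def partition_dirs(ls_output):
    """Extract partition values from the text printed by `hdfs dfs -ls`."""
    values = []
    for line in ls_output.splitlines():
        fields = line.split(None, 7)
        # Directory entries have 8 fields and permissions starting with 'd';
        # this also skips the "Found N items" header and plain files.
        if len(fields) != 8 or not fields[0].startswith("d"):
            continue
        name = fields[7].rstrip("/").rsplit("/", 1)[-1]
        # Skip hidden or temporary folders such as _temporary.
        if name.startswith(("_", ".")):
            continue
        # Accept both "20151001" and "ptn=20151001" folder names.
        values.append(name.split("=", 1)[-1])
    return sorted(values)


def hive_command(partitions, query_file):
    """Build the argv for a Hive CLI call that gets the list as a variable."""
    joined = ",".join("'%s'" % p for p in partitions)
    return ["hive", "--hivevar", "partitions=" + joined, "-f", query_file]


def run(table_path, query_file):
    """List the table folder on HDFS, then run the query file with Hive."""
    listing = subprocess.run(["hdfs", "dfs", "-ls", table_path],
                             check=True, capture_output=True, text=True).stdout
    return subprocess.run(hive_command(partition_dirs(listing), query_file),
                          check=True)
```

Inside the query file the list would then be referenced through variable substitution, for example WHERE ptn IN (${hivevar:partitions}). The values are passed as quoted strings here; adjust that if the partition column is numeric.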
