How to tune Hive to query metadata?

If I run the Hive query below on a table with a partitioned column, I want to make sure Hive does not do a full table scan and instead works out the result from the metadata alone. Is there any way to enable this?

SELECT MAX(partitioned_col) FROM hive_table;

Right now, when I run this query, it launches MapReduce tasks, and I am sure it is scanning the data, although it could very well get the value from the metadata itself.

Compute table statistics every time you change the data.

ANALYZE TABLE hive_table PARTITION(partitioned_col) COMPUTE STATISTICS FOR COLUMNS;

Enable CBO and automatic statistics gathering:

set hive.cbo.enable=true;
set hive.stats.autogather=true;

Use these settings so that Hive can use the statistics to answer the query:

set hive.compute.query.using.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.stats.fetch.column.stats=true;

If none of this helps, I'd recommend a fast way to find the last partition: parse the max partition key from the table location with a shell script. The command below lists all folder paths under the table, sorts them in reverse order, takes the first (latest) one, extracts the partition subfolder name, and parses the value out of it. All you need to do is initialize the TABLE_DIR variable and fill in the position of the partition subfolder in the path:

last_partition=$(hadoop fs -ls $TABLE_DIR/* | awk '{ print $8 }' | sort -r | head -n1 | cut -d / -f [number of partition subfolder in the path here] | cut -d = -f 2)

Then pass the $last_partition variable to your script:

  hive -hiveconf last_partition="$last_partition" -f your_script.hql
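
The path-parsing step above depends on knowing the position of the partition subfolder in the path. As a rough sketch of an alternative, the helper below matches the partition folder by column name instead. The function name max_partition_value is hypothetical (it is not part of Hadoop or Hive), and it assumes Hive-style folder names (col=value) and values that sort correctly as plain strings, such as ISO dates or zero-padded numbers.

```shell
# Hypothetical helper (not part of Hadoop or Hive): reads a directory listing
# on stdin and prints the largest value found for the given partition column.
# Assumes Hive-style folder names (col=value) and string-sortable values.
max_partition_value() {
  col="$1"
  # Keep only the "/col=value" path component, strip everything up to "=",
  # then sort byte-wise and take the last (largest) value.
  grep -o "/${col}=[^/]*" | cut -d = -f 2- | LC_ALL=C sort -u | tail -n 1
}

# Example usage (requires a Hadoop client; TABLE_DIR must be set by you):
# last_partition=$(hadoop fs -ls "$TABLE_DIR" | max_partition_value partitioned_col)
```

Inside your_script.hql, the value passed with -hiveconf can then be referenced as ${hiveconf:last_partition}, for example in a WHERE clause on the partition column.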
