
Hive action execution parameters setting in Oozie workflow

I am using the Hive action to execute queries through Oozie, with Tez and MR set as the execution engines for the queries. How can I let a query run on the maximum possible number of reducers? Currently I am using mapred.reduce.tasks, but it takes a static number.

The real problem is that when I execute the same queries from the Hive CLI, Hive chooses a sensible number of reducers rather than 1. So what setting is my Oozie job missing that makes it choose 1 reducer for every query?

The usual way to control the number of reducers for a Hive query is the hive.exec.reducers.bytes.per.reducer property.

The default value is 1 GB in older Hive releases (Hive 0.14.0 and later default to 256 MB). At 1 GB, one reducer is dispatched for every 1 GB of input, so a 10 GB input gets roughly 10 reducers.

Try lowering this value in proportion to the maximum number of reducers you expect. That way you no longer need to set a static reducer count with the mapred.reduce.tasks property.
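As a rough sketch, the property could be passed through the Hive action's configuration block in workflow.xml. The value 256000000 (about 256 MB per reducer) is only an illustrative assumption, not a recommendation; choose it from your own input sizes.

```xml
<!-- Goes inside the <configuration> element of the Hive action. -->
<!-- 256000000 bytes is an assumed example value; tune it to your data. -->
<property>
    <name>hive.exec.reducers.bytes.per.reducer</name>
    <value>256000000</value>
</property>
```

A smaller value means more reducers for the same input, so halving it should roughly double the reducer count Hive estimates.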

When running a Hive action in Oozie, you should always set the configuration property mapred.reduce.tasks = -1. With -1, Hive works out the number of reducers itself, based on your system and available resources, instead of using a fixed count.
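A minimal sketch of such a Hive action might look like the following. The action name hive-node, the script name script.q, the ${jobTracker} and ${nameNode} parameters, the end and fail transition targets, and the schema version are all placeholders assumed for illustration; substitute the ones your workflow already uses.

```xml
<action name="hive-node">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- -1 asks Hive to estimate the reducer count itself -->
            <property>
                <name>mapred.reduce.tasks</name>
                <value>-1</value>
            </property>
        </configuration>
        <!-- script.q is a hypothetical file holding the Hive queries -->
        <script>script.q</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

If the job still runs with a single reducer, it may be worth checking whether another configuration source (for example a job-xml file or a set statement in the script) overrides the value.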


 