
Hive action execution parameter settings in an Oozie workflow

Original post: 2015-09-08 06:00:28 | Tags: hadoop, hive, oozie

I am using a Hive action to execute queries through Oozie, with TEZ and MR as the execution engines. How can I let a query run on the maximum possible number of reducers? Currently I am using mapred.reduce.tasks, but it takes a static number.

The real problem is that when I execute the same queries from the Hive CLI, Hive chooses a sensible number of reducers rather than 1. So what setting is my Oozie job missing that makes it choose 1 reducer for every query?

2 answers

Usually the ideal way to control the number of reducers of a Hive query is to use the hive.exec.reducers.bytes.per.reducer property.

The default value is 1 GB in older Hive releases (Hive 0.14 and later default to 256 MB). With a 1 GB setting, one reducer is dispatched for every 1 GB of input data.

Try lowering this value in proportion to the maximum number of reducers you expect. That way you can avoid setting a static number of reducers with the mapred.reduce.tasks property.
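
As a rough worked illustration of how this property translates into a reducer count, the estimate Hive makes when no fixed count is given can be sketched as below. Here S is the total input size and B is hive.exec.reducers.bytes.per.reducer. R_max stands for hive.exec.reducers.max, a cap that is not mentioned in the answer above and is included here as an assumption about the setup. Treat this as an approximation of the behaviour, not an exact description of every Hive version or execution engine.

```latex
R = \max\left(1,\ \min\left(R_{\max},\ \left\lceil \frac{S}{B} \right\rceil\right)\right)

% Example with S = 10 GB of input and a cap that is not reached:
%   B = 1 GB     gives  R = \lceil 10 / 1 \rceil    = 10 reducers
%   B = 0.25 GB  gives  R = \lceil 10 / 0.25 \rceil = 40 reducers
```

So cutting the bytes-per-reducer value to a quarter roughly quadruples the number of reducers, up to the configured cap.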

When running a Hive action in Oozie, you should always set the configuration property mapred.reduce.tasks = -1. With -1, Hive works out the number of reducers itself from the input data instead of using a fixed count. Hadoop's own default for this property is 1, which would explain the single reducer if the Oozie job is not picking up Hive's default of -1.
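
A minimal sketch of how both suggestions might look inside an Oozie workflow, assuming the Hive action schema version 0.2. The action name hive-query, the script name query.hql, the transition targets end and fail, and the 256 MB value are all hypothetical placeholders, not taken from the original post.

```xml
<action name="hive-query">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- -1 lets Hive estimate the reducer count instead of using a fixed number -->
            <property>
                <name>mapred.reduce.tasks</name>
                <value>-1</value>
            </property>
            <!-- Illustrative value (256 MB): a smaller value yields more reducers -->
            <property>
                <name>hive.exec.reducers.bytes.per.reducer</name>
                <value>268435456</value>
            </property>
        </configuration>
        <!-- Hypothetical script file containing the queries -->
        <script>query.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

If the properties do not seem to take effect from the action configuration, the same values can also be set at the top of the Hive script itself with SET statements, which is worth trying before changing anything else.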