Hive action execution parameters setting in Oozie workflow
I am using the Hive action to execute queries through Oozie, with either Tez or MR set as the execution engine. How can I let a query run on the maximum possible number of reducers? Currently I am using mapred.reduce.tasks, but it takes a static number.

The real problem is that when I execute the same queries in the Hive CLI, Hive chooses an optimal number of reducers rather than 1. What setting is my Oozie job missing that makes it choose 1 reducer for all the queries?

Usually the ideal way to control the number of reducers of a Hive query is the hive.exec.reducers.bytes.per.reducer property. The default value is 1 GB in older Hive releases (256 MB from Hive 0.14 onward), meaning one reducer is dispatched for each such chunk of input data. Lower this value in proportion to the maximum number of reducers you expect. This way you can avoid setting a static number of reducers with the mapred.reduce.tasks property.
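
As a rough illustration of how hive.exec.reducers.bytes.per.reducer drives the reducer count, the estimate can be sketched as below. This assumes the MapReduce engine with mapred.reduce.tasks left at -1; Tez may adjust the count further at runtime. The symbols are introduced here for illustration: S is the total input size, B is hive.exec.reducers.bytes.per.reducer, and R_max is hive.exec.reducers.max.

```latex
\[
  R \approx \max\!\left(1,\ \min\!\left(R_{\max},\ \left\lceil \frac{S}{B} \right\rceil\right)\right)
\]
% Example with S = 10240 MB of input and R_max large enough not to matter:
%   B = 1024 MB  gives  R = ceil(10240 / 1024) = 10
%   B =  256 MB  gives  R = ceil(10240 /  256) = 40
```

So cutting the bytes-per-reducer value to a quarter roughly quadruples the number of reducers, up to the hive.exec.reducers.max cap.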

When running a Hive action in Oozie, you should always set the configuration property mapred.reduce.tasks = -1. With -1, Hive works out the number of reducers itself for each query instead of using a fixed count.
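
A minimal sketch of where such a property could go in the workflow definition, assuming a standard Oozie Hive action. The action name, the ${jobTracker} and ${nameNode} parameters, the script name query.hql, and the transition targets are placeholders, not taken from the original post. The second property is optional, and its value here is only an example.

```xml
<!-- Sketch only: names, parameters and transitions are placeholders. -->
<action name="hive-query">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- -1 lets Hive pick the reducer count instead of a fixed number -->
            <property>
                <name>mapred.reduce.tasks</name>
                <value>-1</value>
            </property>
            <!-- Optional: input bytes per reducer (256 MB here, example value);
                 a smaller value yields more reducers -->
            <property>
                <name>hive.exec.reducers.bytes.per.reducer</name>
                <value>268435456</value>
            </property>
        </configuration>
        <script>query.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

If a job.properties file or a global configuration section already sets mapred.reduce.tasks to a fixed number, that setting may be what forces a single reducer, so it is worth checking there as well.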