Custom partitioner not working in Oozie MapReduce action

Question

I have implemented a secondary sort in MapReduce and am trying to run it through Oozie (from Hue).

Although I have set the partitioner class in the workflow properties, the partitioner is never invoked, so I am not getting the expected output.

The same code runs fine when launched with the hadoop command.

Here is my workflow.xml:

<workflow-app name="MyTriplets" xmlns="uri:oozie:workflow:0.5">
<start to="mapreduce-598d"/>
<kill name="Kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="mapreduce-598d">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.output.dir</name>
                <value>/test_1109_3</value>
            </property>
            <property>
                <name>mapred.input.dir</name>
                <value>/apps/hive/warehouse/7360_0609_rx/day=06-09-2017/hour=13/quarter=2/,/apps/hive/warehouse/7360_0609_tx/day=06-09-2017/hour=13/quarter=2/,/apps/hive/warehouse/7360_0509_util/day=05-09-2017/hour=16/quarter=1/</value>
            </property>
            <property>
                <name>mapred.input.format.class</name>
                <value>org.apache.hadoop.hive.ql.io.RCFileInputFormat</value>
            </property>
            <property>
                <name>mapred.mapper.class</name>
                <value>PonRankMapper</value>
            </property>
            <property>
                <name>mapred.reducer.class</name>
                <value>PonRankReducer</value>
            </property>
            <property>
                <name>mapred.output.value.comparator.class</name>
                <value>PonRankGroupingComparator</value>
            </property>
            <property>
                <name>mapred.mapoutput.key.class</name>
                <value>PonRankPair</value>
            </property>
            <property>
                <name>mapred.mapoutput.value.class</name>
                <value>org.apache.hadoop.io.Text</value>
            </property>
            <property>
                <name>mapred.reduce.output.key.class</name>
                <value>org.apache.hadoop.io.NullWritable</value>
            </property>
            <property>
                <name>mapred.reduce.output.value.class</name>
                <value>org.apache.hadoop.io.Text</value>
            </property>
            <property>
                <name>mapred.reduce.tasks</name>
                <value>1</value>
            </property>
            <property>
                <name>mapred.partitioner.class</name>
                <value>PonRankPartitioner</value>
            </property>
            <property>
                <name>mapred.mapper.new-api</name>
                <value>False</value>
            </property>
        </configuration>
    </map-reduce>
    <ok to="End"/>
    <error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>

When running with the hadoop jar command, I set the partitioner class through the JobConf.setPartitionerClass API.

I am not sure why my partitioner is not executed when running under Oozie, in spite of adding:

            <property>
                <name>mapred.partitioner.class</name>
                <value>PonRankPartitioner</value>
            </property>

What am I missing when running it from Oozie?

Answer 1

I solved this by rewriting the MapReduce job using the new API.

The property used in the Oozie workflow for the partitioner was mapreduce.partitioner.class.
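
For illustration, the relevant part of the workflow configuration after such a rewrite might look like the fragment below. This is only a sketch: the partitioner property name comes from the answer above, while the `mapred.mapper.new-api` / `mapred.reducer.new-api` switches and the `mapreduce.map.class` / `mapreduce.reduce.class` names are assumptions based on the convention Oozie documents for new-API jobs. It also assumes that PonRankMapper, PonRankReducer and PonRankPartitioner have been rewritten against the org.apache.hadoop.mapreduce classes. Newer Hadoop releases may prefer the `mapreduce.job.*` spellings of these keys.

```xml
<configuration>
    <!-- Assumption: ask Oozie to run the job through the new
         (org.apache.hadoop.mapreduce) API instead of the old mapred API -->
    <property>
        <name>mapred.mapper.new-api</name>
        <value>true</value>
    </property>
    <property>
        <name>mapred.reducer.new-api</name>
        <value>true</value>
    </property>
    <!-- Assumption: new-API names for the mapper and reducer classes -->
    <property>
        <name>mapreduce.map.class</name>
        <value>PonRankMapper</value>
    </property>
    <property>
        <name>mapreduce.reduce.class</name>
        <value>PonRankReducer</value>
    </property>
    <!-- From the answer: the partitioner property that took effect -->
    <property>
        <name>mapreduce.partitioner.class</name>
        <value>PonRankPartitioner</value>
    </property>
</configuration>
```

The remaining properties from the original workflow (input and output paths, key and value classes, comparators) would also need their new-API equivalents; they are omitted here because the answer does not list them.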
