
How to use hadoop.mapreduce.lib.output.MultipleOutputs to create a directory structure using an Oozie workflow?

I am running my MapReduce jobs with an Oozie workflow (`workflow:0.5`). My use case is to create a key-based directory structure for the output. This is my configuration file:

```xml
<configuration>
    <!-- These are important. -->
    <property>
        <name>mapred.mapper.new-api</name>
        <value>true</value>
    </property>
    <property>
        <name>mapred.reducer.new-api</name>
        <value>true</value>
    </property>
    <property>
        <name>mapred.job.queue.name</name>
        <value>${queue.name}</value>
    </property>
    <property>
        <name>mapreduce.map.class</name>
        <value>com.a.b.c.Amapper</value>
    </property>
    <property>
        <name>mapreduce.reduce.class</name>
        <value>com.a.b.c.Areducer</value>
    </property>
    <property>
        <name>mapred.output.key.class</name>
        <value>org.apache.hadoop.io.Text</value>
    </property>
    <property>
        <name>mapred.output.value.class</name>
        <value>org.apache.hadoop.io.Text</value>
    </property>
    <property>
        <name>mapreduce.outputformat.class</name>
        <value>org.apache.hadoop.mapreduce.lib.output.MultipleOutputs</value>
    </property>
    <property>
        <name>mapred.input.dir</name>
        <value>${inputDir}</value>
    </property>
    <property>
        <name>mapred.output.dir</name>
        <value>${outputDir}</value>
    </property>
</configuration>
```

In the reducer, I want to create a formatted directory structure using this code:

```java
package com.a.b.c;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class Areducer extends Reducer<Text, Text, Text, Text> {
    private Text aggregatorRecord = new Text();
    private MultipleOutputs<Text, Text> out;

    @Override
    public void setup(Context context) {
        // MultipleOutputs wraps the task context; it is a helper, not an OutputFormat
        out = new MultipleOutputs<Text, Text>(context);
    }

    @Override
    public void reduce(Text aggregatorRecordKey,
            Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        /*
         * Some business logic to do aggregation and set aggregatorRecord.
         */
        String plist = "Surname|Forename";
        Text t = new Text(plist);
        // Third argument: base output path, relative to the job output directory
        out.write(aggregatorRecordKey, aggregatorRecord, generateFileName(t));
    }

    @Override
    protected void cleanup(Context context) throws IOException,
            InterruptedException {
        // Close every record writer that MultipleOutputs opened
        out.close();
    }

    private String generateFileName(Text k) {
        String[] kStr = k.toString().split("\\|");

        String sName = kStr[0];
        String fName = kStr[1];

        // Example for k = Smith|John:
        // output written to /user/hadoop/path/to/output/Smith/John-r-00000
        // (etc.)
        return sName + "/" + fName;
    }
}
```
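The `generateFileName` helper turns a pipe-delimited key into a relative base output path, and `MultipleOutputs` then appends a task suffix such as `-r-00000` under the job output directory. The standalone sketch below isolates that mapping and adds a guard against malformed keys, which the original helper lacks (it would throw `ArrayIndexOutOfBoundsException` on a key without a `|`). The class name `KeyPathSketch` and the two-field `surname|forename` key format are assumptions for illustration only.

```java
final class KeyPathSketch {
    private KeyPathSketch() {}

    /**
     * Maps "Surname|Forename" to "Surname/Forename", suitable as the
     * baseOutputPath argument of MultipleOutputs.write(key, value, baseOutputPath).
     */
    static String baseOutputPath(String key) {
        // limit -1 keeps trailing empty fields, so "Smith|" is rejected below
        String[] parts = key.split("\\|", -1);
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException("Expected 'surname|forename', got: " + key);
        }
        return parts[0] + "/" + parts[1];
    }

    public static void main(String[] args) {
        // A reducer task would then write under <outputDir>/Smith/John-r-NNNNN
        System.out.println(baseOutputPath("Smith|John")); // prints Smith/John
    }
}
```

Validating the key before it becomes a path keeps one bad record from failing the whole reduce task with an opaque array index error.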

The Oozie workflow fails with this exception:

`java.lang.NoSuchMethodException: org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.<init>()`
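The `<init>()` in the message refers to a no-argument constructor. The framework knows the output format only by its class name from the configuration, so it instantiates it reflectively, and `MultipleOutputs` has no such constructor (it is built from a task context). The sketch below reproduces the same kind of failure with plain Java reflection. The nested class names are hypothetical stand-ins, and the `getDeclaredConstructor()` call approximates, rather than copies, what Hadoop does internally.

```java
import java.lang.reflect.Constructor;

final class NoArgConstructorDemo {

    /** Stand-in for MultipleOutputs: its only constructor needs an argument. */
    static final class NeedsContext {
        NeedsContext(Object context) {}
    }

    /** Stand-in for an OutputFormat such as TextOutputFormat: has a no-arg constructor. */
    static final class HasDefaultConstructor {
        HasDefaultConstructor() {}
    }

    /**
     * Creates an instance through the no-arg constructor, as a framework does when
     * it only has a class name from configuration.
     * Returns null on success, or the exception text on failure.
     */
    static String tryInstantiate(Class<?> clazz) {
        try {
            Constructor<?> ctor = clazz.getDeclaredConstructor(); // no parameter types
            ctor.setAccessible(true);
            ctor.newInstance();
            return null;
        } catch (ReflectiveOperationException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryInstantiate(HasDefaultConstructor.class)); // prints null
        // Prints java.lang.NoSuchMethodException: ...NoArgConstructorDemo$NeedsContext.<init>()
        System.out.println(tryInstantiate(NeedsContext.class));
    }
}
```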

Can someone suggest the correct way to use MultipleOutputs to create a directory structure from an Oozie workflow?

Your problem is that MultipleOutputs is not an OutputFormat, so you shouldn't set it as the output format for your job. I usually use a Java class to configure and submit my MultipleOutputs jobs, but looking at your code, I think what you need to do is set your output format to TextOutputFormat and leave the MultipleOutputs usage in your reducer as it is.
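A minimal sketch of that change, assuming everything else in the question's configuration stays the same: only the output format property is replaced. It keeps the `mapreduce.outputformat.class` key from the question, which the job evidently reads, since the exception names MultipleOutputs. TextOutputFormat fits the `Text`/`Text` key and value classes, and `MultipleOutputs.write(key, value, baseOutputPath)` obtains its record writers from the job's output format, so the reducer does not need to change.

```xml
<!-- Replaces the MultipleOutputs entry; MultipleOutputs stays in the reducer code only -->
<property>
    <name>mapreduce.outputformat.class</name>
    <value>org.apache.hadoop.mapreduce.lib.output.TextOutputFormat</value>
</property>
```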

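Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.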

© 2020-2024 STACKOOM.COM