
Map-Reduce job failing to deliver expected partitioned files

In a Map-Reduce job, I am reading five different input files. My dataset contains values under two categories, P and I. When I-specific values are found, I write them to an I-part-r-00000 file, and likewise for P. I am using the MultipleOutputs class in the reducer to achieve this.

My Mapper class contains:

public class parserMapper extends Mapper<LongWritable, Text, Text, Text> {
   public void map(LongWritable key, Text value, Context context)
   throws IOException, InterruptedException {

   String IPFLAG = "";
   String[] element_data = value.toString().split(",");

   // The condition that selects category P, and the code that builds theData,
   // are not shown in the original post.
   if (/* condition omitted in the original post */) {
        IPFLAG = "P";
   }
    else {
       IPFLAG = "I";
    }

   // Compare String contents with equals(), not ==
   if ("P".equals(IPFLAG)) {
     context.write(new Text(IPFLAG), new Text(theData));
   }
   else if ("I".equals(IPFLAG)) {
     context.write(new Text(IPFLAG), new Text(theData));
   }
   else {
     System.out.println("No category found");
   }
   }

  public void run(Context context) throws IOException, InterruptedException {
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
  }
}
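
The mapper branches on a category flag held in a String. As a minimal sketch of why such flags are usually compared with equals() rather than ==, the class below (FlagComparison and isP are hypothetical names, not part of the job) contrasts the two for a String that is built at run time, as a value parsed from a record would be:

```java
public class FlagComparison {
    // Returns true when the flag's content is "P", however the String was created.
    // Putting the literal first also makes the check safe for a null flag.
    static boolean isP(String flag) {
        return "P".equals(flag);
    }

    public static void main(String[] args) {
        String literal = "P";
        // Built at run time, so it is a different object from the literal.
        String parsed = new String(new char[] {'P'});

        System.out.println(literal == parsed); // false: == compares object identity
        System.out.println(isP(parsed));       // true: equals() compares content
    }
}
```

In the mapper as posted, the flag is only ever assigned from literals, so this may not be the cause of the unexpected output, but the equals() form stays correct if the flag is later read from the data.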


Reducer class includes:

public class parserReducer extends Reducer<Text, Text, Text, Text> {

    private MultipleOutputs multipleOutputs;

    protected void setup(Context context) throws IOException, InterruptedException {
        multipleOutputs = new MultipleOutputs(context);
    }

    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Body not shown in the post; MultipleOutputs is normally closed here so its files are flushed
        multipleOutputs.close();
    }

    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {

        Object c = null;

        try {
            if (!(key.toString().isEmpty())) {

                for (Text value : values) {
                    // c is null, so only the value is written; the key string is used as the base output path
                    multipleOutputs.write(c, value, key.toString());
                }
            }
        }
        catch(Exception e){ System.out.println("Caught Exception: " + e.getMessage());}
    }
}


and the Driver code includes:

public class parserDriver {

 public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("textinputformat.record.delimiter", "~" + "\n" + "ISA*");
    Job job = Job.getInstance(conf);
    // The mapper, reducer and key/value class settings are not shown in the original post.

    // job.setOutputFormatClass(TextOutputFormat.class);
    LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
    // job.setOutputFormatClass(LazyOutputFormat.class);

    // MultipleOutputs.addNamedOutput(job, "P", TextOutputFormat.class, Text.class, Text.class);
    // MultipleOutputs.addNamedOutput(job, "I", TextOutputFormat.class, Text.class, Text.class);
    // Pass as option -D mapred.reduce.tasks=<number>

    /* This line is to accept the input recursively */
    //FileInputFormat.setInputDirRecursive(job, true);

    // The input and output locations must be passed as Path objects, not Strings
    Path outputFilePath = new Path("/Users/Mohit/output");
    FileInputFormat.addInputPath(job, new Path("/Users/Mohit/input"));
    FileOutputFormat.setOutputPath(job, outputFilePath);

    /* Delete output file path if already exists */
    FileSystem fs = FileSystem.get(conf);

    if (fs.exists(outputFilePath)) {
        fs.delete(outputFilePath, true);
    }

    // main is void, so report the job status through the exit code
    System.exit(job.waitForCompletion(true) ? 0 : 1);
 }
}
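
The Driver sets a custom record delimiter, "~" + "\n" + "ISA*". As far as I know, Hadoop's text reader strips the delimiter from each record, so every record after the first would reach the mapper without its leading "ISA*". The sketch below only approximates that behaviour with String.split on made-up sample data; DelimiterSketch, records and the input string are hypothetical, and the real reader works on byte streams rather than whole Strings:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class DelimiterSketch {
    // Splits the input on the literal delimiter (quoted, because '*' is a regex metacharacter).
    static List<String> records(String input, String delimiter) {
        return Arrays.asList(input.split(Pattern.quote(delimiter)));
    }

    public static void main(String[] args) {
        // Hypothetical sample: three segments, each starting with "ISA*" and ending with "~".
        String input = "ISA*00*P~\nISA*00*I~\nISA*01*P~";
        String delimiter = "~" + "\n" + "ISA*";

        for (String record : records(input, delimiter)) {
            System.out.println("[" + record + "]");
        }
        // prints:
        // [ISA*00*P]
        // [00*I]
        // [01*P~]
    }
}
```

If the mapper relies on the "ISA*" prefix or the trailing "~", the first and last records of a file would need separate handling.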

Through all this, I am trying to get two partitions for each input file:

file1 -> P-part-r00000, I-part-r00001

file2 -> P-part-r00002, I-part-r00003

But I am getting only two partitions for all the files fed as input to this job:

file1, file2, file3, file4, file5 -> P-part-r00000, I-part-r00001

I am not sure what I am missing here. Can anybody help, please?

1) In your Driver, add these lines to register the named outputs used for file naming:

   MultipleOutputs.addNamedOutput(job, "I", TextOutputFormat.class,
          Text.class, Text.class);
   MultipleOutputs.addNamedOutput(job, "P", TextOutputFormat.class,
          Text.class, Text.class);

2) Change your reducer to send each value to the file with the matching name:

public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
    try {
        if (!(key.toString().isEmpty())) {

            for (Text value : values) {
                // The first argument is the named output ("I" or "P") registered in the Driver
                multipleOutputs.write(key.toString(), key, value);
            }
        }
    }
    catch(Exception e){ System.out.println("Caught Exception: " + e.getMessage());}
}

3) Change the number of reducers to 2 to get exactly 2 files.
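
To see why two reducers would give two files, here is a rough sketch of how the keys "P" and "I" might be spread across reducers and how the resulting files would be named. It assumes the default HashPartitioner formula, (hashCode & Integer.MAX_VALUE) % numReducers, and the usual named-output pattern <name>-r-<5-digit partition>. The textHash method is my own reimplementation of what I believe Text.hashCode() computes, so treat it as an assumption; ReducerFileNames and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class ReducerFileNames {
    // Assumed to mirror Hadoop's Text.hashCode(): a 31-based hash over the UTF-8 bytes.
    static int textHash(String s) {
        int hash = 1;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + b;
        }
        return hash;
    }

    // Same formula as the default HashPartitioner.
    static int partition(String key, int numReducers) {
        return (textHash(key) & Integer.MAX_VALUE) % numReducers;
    }

    // Named-output file name for a key that doubles as the named output.
    static String fileName(String key, int numReducers) {
        return String.format("%s-r-%05d", key, partition(key, numReducers));
    }

    public static void main(String[] args) {
        System.out.println(fileName("P", 2)); // P-r-00001
        System.out.println(fileName("I", 2)); // I-r-00000
        System.out.println(fileName("P", 1)); // P-r-00000
    }
}
```

Under these assumptions each category lands on its own reducer when there are two of them, and because the Driver uses LazyOutputFormat, no empty default part-r files should be created alongside the named ones.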
