
Hadoop writing output to HDFS file

I have written my first MapReduce program. When I run it in Eclipse it writes to the output file and works as expected, but when I run it from the command line with hadoop jar myjar.jar, nothing gets written to the output. The output files (_SUCCESS and part-r-0000) are created, but they are empty. Is there some persistence issue? The counters show Reduce input records = 12 but Reduce output records = 0, whereas in Eclipse the reduce output record count is not zero. Any help is appreciated. Thanks.

[cloudera@quickstart Desktop]$ sudo hadoop jar checkjar.jar hdfs://quickstart.cloudera:8020/user/cloudera/input.csv hdfs://quickstart.cloudera:8020/user/cloudera/output9
15/04/28 22:09:06 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/04/28 22:09:07 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/04/28 22:09:08 INFO input.FileInputFormat: Total input paths to process : 1
15/04/28 22:09:09 INFO mapreduce.JobSubmitter: number of splits:1
15/04/28 22:09:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1430279123629_0011
15/04/28 22:09:10 INFO impl.YarnClientImpl: Submitted application application_1430279123629_0011
15/04/28 22:09:10 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1430279123629_0011/
15/04/28 22:09:10 INFO mapreduce.Job: Running job: job_1430279123629_0011
15/04/28 22:09:22 INFO mapreduce.Job: Job job_1430279123629_0011 running in uber mode : false
15/04/28 22:09:22 INFO mapreduce.Job:  map 0% reduce 0%
15/04/28 22:09:32 INFO mapreduce.Job:  map 100% reduce 0%
15/04/28 22:09:46 INFO mapreduce.Job:  map 100% reduce 100%
15/04/28 22:09:46 INFO mapreduce.Job: Job job_1430279123629_0011 completed successfully
15/04/28 22:09:46 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=265
        FILE: Number of bytes written=211403
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=365
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=8175
        Total time spent by all reduces in occupied slots (ms)=10124
        Total time spent by all map tasks (ms)=8175
        Total time spent by all reduce tasks (ms)=10124
        Total vcore-seconds taken by all map tasks=8175
        Total vcore-seconds taken by all reduce tasks=10124
        Total megabyte-seconds taken by all map tasks=8371200
        Total megabyte-seconds taken by all reduce tasks=10366976
    Map-Reduce Framework
        Map input records=12
        Map output records=12
        Map output bytes=235
        Map output materialized bytes=265
        Input split bytes=120
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=265
        Reduce input records=12
        Reduce output records=0
        Spilled Records=24
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=172
        CPU time spent (ms)=1150
        Physical memory (bytes) snapshot=346574848
        Virtual memory (bytes) snapshot=1705988096
        Total committed heap usage (bytes)=196481024
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=245
    File Output Format Counters 
        Bytes Written=0

Reducer.java

package com.mapreduce.assgn4;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class JoinReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        List<String> tableoneTuples = new ArrayList<String>();
        List<String> tabletwoTuples = new ArrayList<String>();

        // Separate the incoming values by their source-table tag.
        for (Text value : values) {
            String[] splitValues = value.toString().split("#");
            String tableName = splitValues[0];
            if (tableName.equals(JoinMapper.tableone)) {
                tableoneTuples.add(splitValues[1]);
            } else {
                tabletwoTuples.add(splitValues[1]);
            }
        }
        System.out.println(tableoneTuples.size());
        System.out.println(tabletwoTuples.size());

        // Emit the cross product of the two sides for this key.
        String FinaljoinString = null;
        for (String tableoneValue : tableoneTuples) {
            for (String tabletwoValue : tabletwoTuples) {
                FinaljoinString = tableoneValue + "," + tabletwoValue;
                FinaljoinString = key.toString() + "," + FinaljoinString;
                context.write(null, new Text(FinaljoinString));
            }
        }
    }
}
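For what it's worth, the join the reducer performs for a single key is just a cross product of the two tagged value lists. Stripped of the Hadoop machinery, it can be sketched and tested in plain Java (the class name, tag strings, and sample values below are illustrative, not from the original job):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JoinSketch {

    // Simulates the reducer's work for one key: values arrive tagged
    // "tableone#..." or "tabletwo#...", and each tableone tuple is paired
    // with every tabletwo tuple. Tag names here are assumed placeholders.
    static List<String> join(String key, List<String> values) {
        List<String> tableoneTuples = new ArrayList<String>();
        List<String> tabletwoTuples = new ArrayList<String>();
        for (String v : values) {
            String[] parts = v.split("#");
            if (parts[0].equals("tableone")) {
                tableoneTuples.add(parts[1]);
            } else {
                tabletwoTuples.add(parts[1]);
            }
        }
        List<String> joined = new ArrayList<String>();
        for (String a : tableoneTuples) {
            for (String b : tabletwoTuples) {
                joined.add(key + "," + a + "," + b);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // One tableone tuple joined against two tabletwo tuples.
        System.out.println(join("k1",
                Arrays.asList("tableone#a", "tabletwo#x", "tabletwo#y")));
    }
}
```

With reduce input records = 12 spread over 2 groups, this logic would normally produce output for any group containing tuples from both tables, so the zero output count points at the write call rather than the join itself.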

The context.write call in your reducer has a bug. To emit a null key you need NullWritable: declare the reducer as Reducer<Text, Text, NullWritable, Text> (and set the job's output key class to NullWritable.class in the driver), then write the singleton instance:

context.write(NullWritable.get(), new Text(FinaljoinString));
