When I try to run a map/reduce job on a Hadoop cluster without specifying an input file, I get the following exception:
java.io.IOException: No input paths specified in job
Well, I can imagine cases where running a job without input files does make sense; generating a test file would be one. Is it possible to do that with Hadoop? If not, do you have any experience with generating files? Is there a better way than keeping a dummy file with one record on the cluster to use as input for generation jobs?
File paths are relevant for FileInputFormat-based inputs such as SequenceFileInputFormat. But input formats that read from HBase or a database do not read from files, so you can make your own implementation of InputFormat and define your own behaviour in getSplits() and createRecordReader(). For inspiration, look at the source code of the TextInputFormat class.
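A minimal sketch of that idea, using the new mapreduce API: an InputFormat that reports one synthetic split and emits generated records, so the job never touches a file. The class names (GeneratorInputFormat, EmptySplit) and the record count are illustrative, not from any real library:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class GeneratorInputFormat extends InputFormat<LongWritable, NullWritable> {

    // A split that carries no data; it must be Writable so the
    // framework can serialize it to the task.
    public static class EmptySplit extends InputSplit implements Writable {
        @Override public long getLength() { return 0; }
        @Override public String[] getLocations() { return new String[0]; }
        @Override public void write(DataOutput out) { }
        @Override public void readFields(DataInput in) { }
    }

    @Override
    public List<InputSplit> getSplits(JobContext context) {
        // One synthetic split; no input paths are consulted.
        return Collections.<InputSplit>singletonList(new EmptySplit());
    }

    @Override
    public RecordReader<LongWritable, NullWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new RecordReader<LongWritable, NullWritable>() {
            private static final long TOTAL = 1000; // records to generate (illustrative)
            private long current = -1;

            @Override public void initialize(InputSplit s, TaskAttemptContext c) { }
            @Override public boolean nextKeyValue() { return ++current < TOTAL; }
            @Override public LongWritable getCurrentKey() { return new LongWritable(current); }
            @Override public NullWritable getCurrentValue() { return NullWritable.get(); }
            @Override public float getProgress() { return current / (float) TOTAL; }
            @Override public void close() { }
        };
    }
}
```

A job would then set this with job.setInputFormatClass(GeneratorInputFormat.class), and the mapper would see one key per generated record.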
If you want to test your mapper/combiner/reducer against a single line of input from your file, the best approach is to write a unit test for each one.
Sample code, using a Java mocking framework, which you can run from your IDE: here I have used Mockito; MRUnit could also be used, which itself depends on Mockito.
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.junit.Test;
import org.mockito.Mockito;

public class BoxPlotMapperTest {

    @Test
    public void validOutputTextMapper() throws IOException, InterruptedException {
        Mapper mapper = new Mapper(); // your Mapper object
        Text line = new Text("single line from input-file"); // a single line of input from the file
        Mapper.Context context = Mockito.mock(Mapper.Context.class);
        mapper.map(null, line, context); // key was not used in my code, so it is null
        Mockito.verify(context).write(new Text("your expected key-output"), new Text("your expected value-output"));
    }

    @Test
    public void validOutputTextReducer() throws IOException, InterruptedException {
        Reducer reducer = new Reducer(); // your Reducer object
        Iterable<Text> values = Arrays.asList(new Text("value1"), new Text("value2"),
                new Text("value3"), new Text("value4"));
        Reducer.Context context = Mockito.mock(Reducer.Context.class);
        reducer.reduce(new Text("key"), values, context);
        Mockito.verify(context).write(new Text("your expected key-output"), new Text("your expected value-output"));
    }
}
If you want to generate a test file, why would you need to use Hadoop in the first place? Any kind of file you'd use as input to a MapReduce step can be created with type-specific APIs outside of a MapReduce step, even files destined for HDFS.
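For example, a plain-text input file of the kind TextInputFormat consumes can be generated with ordinary Java I/O; the file name, record count, and record layout below are made up for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class TestFileGenerator {
    public static void main(String[] args) throws IOException {
        Path out = Paths.get("test-input.txt"); // illustrative local path
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            // One synthetic record per line, as TextInputFormat expects.
            records.add(i + "\tsynthetic-value-" + i);
        }
        Files.write(out, records);
        // For HDFS, the same lines could instead be written through a stream
        // from org.apache.hadoop.fs.FileSystem#create.
    }
}
```

Copy the result into HDFS (hadoop fs -put test-input.txt ...) and point the job at it like any other input.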
I know I'm resurrecting an old thread, but no best answer was chosen, so I thought I'd throw this out there. I agree MRUnit is good for many things, but sometimes I just want to play around with some real data (especially for tests where I'd need to mock it out to make it work in MRUnit). When that's my goal, I create a separate little job to test my ideas and use SleepInputFormat to basically lie to Hadoop and say there's input when really there's not. The old API provided an example of that here: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.22/mapreduce/src/test/mapred/org/apache/hadoop/mapreduce/SleepJob.java , and I converted the input format to the new API here: https://gist.github.com/keeganwitt/6053872 .