
hadoop textinputformat read only one line per file

I wrote a simple map task for Hadoop 0.20.2. The input dataset consists of 44 files, each about 3-5 MB. Each line of every file has the format int,int. The input format is the default TextInputFormat, and the mapper's job is to parse the input Text into integers.

After the task ran, the Hadoop framework's statistics showed that the number of input records for the map task was just 44. I debugged and found that the records passed to the map method were only the first line of each file.

Does anyone know what the problem is and where can I find the solution?

Thank you in advance.

Edit 1

The input data was generated by a different map-reduce task whose output format is TextOutputFormat<NullWritable, IntXInt>. The toString() method of IntXInt returns a string of the form int,int.
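To make that concrete, here is a minimal plain-Java sketch (no Hadoop dependency) of what TextOutputFormat does per record: with a NullWritable key it writes only the value's toString() followed by a newline, so each IntXInt should land on its own line. The writeRecord helper is hypothetical, written for illustration only:

```java
// Sketch of TextOutputFormat's per-record behavior (assumption: a
// null/NullWritable key suppresses both the key and the tab separator,
// so the output is just value.toString() + '\n').
public class TextOutputSketch {
    // Hypothetical helper mimicking how one record is written.
    static String writeRecord(Object key, Object value) {
        StringBuilder sb = new StringBuilder();
        if (key != null) {              // NullWritable acts as "no key"
            sb.append(key).append('\t');
        }
        sb.append(value).append('\n');  // record delimiter is a single '\n'
        return sb.toString();
    }

    public static void main(String[] args) {
        // With a null key, only the value and a newline are written.
        System.out.print(writeRecord(null, "3,5"));
    }
}
```

If each record really is followed by a '\n' like this, TextInputFormat on the next job should see one record per line, so it is worth inspecting the generated files directly to confirm the line endings.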

Edit 2

My mapper looks like the following

static class MyMapper extends MapReduceBase
  implements Mapper<LongWritable, Text, IntWritable, IntWritable> {

  public void map(LongWritable key,
                  Text value,
                  OutputCollector<IntWritable, IntWritable> output,
                  Reporter reporter) {

    String[] s = value.toString().split(",");
    IntXInt x = new IntXInt(s[0], s[1]);
    output.collect(x.firstInt(), x.secondInt());
  }
}
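A side note on the body of that map method: s[0] and s[1] are Strings, and if the IntXInt constructor expects ints they need to be parsed; a stray '\r' from Windows-style line endings would also make parsing fail. A standalone sketch of defensive parsing (parseLine is a hypothetical helper, not part of the code above):

```java
// Standalone sketch: parse one "int,int" record defensively.
public class LineParser {
    // Hypothetical helper returning the two ints from a line like "12,34".
    static int[] parseLine(String line) {
        String[] s = line.trim().split(",");  // trim() strips a trailing '\r'
        return new int[] { Integer.parseInt(s[0].trim()),
                           Integer.parseInt(s[1].trim()) };
    }

    public static void main(String[] args) {
        int[] r = parseLine("12,34\r");       // tolerates a Windows CR
        System.out.println(r[0] + " " + r[1]);
    }
}
```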

Edit 3

I have just checked: the mapper really does read only one line from each file, NOT the whole file as a single Text value.

The InputFormat defines how data is read from a file into the Mapper instances. The default TextInputFormat reads lines of text files. The key it emits for each record is the byte offset of the line read (as a LongWritable), and the value is the contents of the line up to the terminating '\n' character (as a Text object). If you have multi-line records separated by a $ character, you should write your own InputFormat that parses files into records split on that character instead.
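To make that record model concrete, this plain-Java sketch mimics how TextInputFormat turns a file's bytes into (byte offset, line) pairs; each '\n' ends one record, so a three-line file produces three map() calls:

```java
import java.util.ArrayList;
import java.util.List;

// Mimics TextInputFormat's record boundaries: one (byteOffset, line) pair
// per '\n'-terminated line. Pure-Java sketch, no Hadoop dependency.
public class RecordSketch {
    static List<String> records(String fileContents) {
        List<String> out = new ArrayList<>();
        int offset = 0;
        for (String line : fileContents.split("\n")) {
            out.add(offset + "\t" + line);  // key = byte offset, value = line
            offset += line.length() + 1;    // +1 for the '\n' delimiter itself
        }
        return out;
    }

    public static void main(String[] args) {
        // A 3-line file yields three records, one per map() invocation.
        for (String r : records("1,2\n3,4\n5,6\n")) {
            System.out.println(r);
        }
    }
}
```

If the statistics show exactly one record per file, the record reader is finding only one '\n'-terminated line, which points at the delimiters in the input files rather than at the mapper.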

I suspect that your mapper gets the whole text as input and prints an output. Could you show your Mapper class declaration and map function declaration? i.e.

static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // do your mapping here
    }
}

I wonder if there is something different in this line
