Hadoop Map Reduce program for hashing
I have written a Map Reduce program in Hadoop that hashes every record of a file, appends the hashed value as an additional attribute to each record, and then writes the output to the Hadoop file system. This is the code I have written:
public class HashByMapReduce
{
public static class LineMapper extends Mapper<Text, Text, Text, Text>
{
private Text word = new Text();
public void map(Text key, Text value, Context context) throws IOException, InterruptedException
{
key.set("single");
String line = value.toString();
word.set(line);
context.write(key, word);
}
}
public static class LineReducer
extends Reducer<Text,Text,Text,Text>
{
private Text result = new Text();
public void reduce(Text key, Iterable<Text> values,
Context context
) throws IOException, InterruptedException
{
String translations = "";
for (Text val : values)
{
translations = val.toString()+","+String.valueOf(hash64(val.toString())); //Point of Error
result.set(translations);
context.write(key, result);
}
}
}
public static void main(String[] args) throws Exception
{
Configuration conf = new Configuration();
Job job = new Job(conf, "Hashing");
job.setJarByClass(HashByMapReduce.class);
job.setMapperClass(LineMapper.class);
job.setReducerClass(LineReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass(KeyValueTextInputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
I wrote this code with the logic that each line is read by the map method, which assigns every value to a single key; that key is then passed to the same reduce method, which in turn passes each value to the hash64() function.
But I see it is passing a null (empty) value to the hash function. I am unable to figure out why. Thanks in advance.
The cause of the problem is most probably the use of KeyValueTextInputFormat. From the Yahoo tutorial:
InputFormat           Description                 Key                     Value
-----------           -----------                 ---                     -----
TextInputFormat       Default format; reads       The byte offset of      The line contents
                      lines of text files         the line
KeyValueInputFormat   Parses lines into           Everything up to the    The remainder of
                      key, val pairs              first tab character     the line
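In other words, KeyValueTextInputFormat splits each line at the first tab character. The split behavior can be sketched in plain Java (a simplified illustration of the rule, not Hadoop's actual record reader):

```java
public class KeyValueSplitDemo {
    // Mimics how KeyValueTextInputFormat derives (key, value) from a line:
    // everything before the first tab is the key, the rest is the value.
    // If there is no tab, the whole line becomes the key and the value is empty.
    static String[] split(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    public static void main(String[] args) {
        String[] withTab = split("id42\tsome record data");
        System.out.println(withTab[0]); // id42
        System.out.println(withTab[1]); // some record data

        String[] noTab = split("a line without any tab");
        System.out.println(noTab[0]);   // the whole line
        System.out.println(noTab[1]);   // empty string
    }
}
```

This is exactly the situation described below: with no tab in the input, the value side comes out empty.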
It splits each of your input lines at the tab character, and I suppose there is no tab in your lines. As a result, the key in LineMapper is the whole line, while nothing is passed as the value (not sure whether it is null or empty).
From your code, I think you should rather use the TextInputFormat class as your input format, which produces the line offset as the key and the complete line as the value. This should solve your problem.
EDIT: I ran your code with the following changes, and it seems to work fine:
1. Changed the input format to TextInputFormat and changed the declaration of the Mapper accordingly.
2. Added the appropriate setMapOutputKeyClass and setMapOutputValueClass calls to the job. These are not mandatory, but their absence often creates problems at runtime.
3. Removed your key.set("single") and added a private outKey to the Mapper.
4. Since you did not provide details of your hash64 method, I used String.toUpperCase for testing.
If the issue persists, then I'm sure your hash method doesn't handle null well.
Full code:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class HashByMapReduce {
public static class LineMapper extends
Mapper<LongWritable, Text, Text, Text> {
private Text word = new Text();
private Text outKey = new Text("single");
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString();
word.set(line);
context.write(outKey, word);
}
}
public static class LineReducer extends Reducer<Text, Text, Text, Text> {
private Text result = new Text();
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
String translations = "";
for (Text val : values) {
translations = val.toString() + ","
+ val.toString().toUpperCase(); // Point of Error
result.set(translations);
context.write(key, result);
}
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "Hashing");
job.setJarByClass(HashByMapReduce.class);
job.setMapperClass(LineMapper.class);
job.setReducerClass(LineReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
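Since the original hash64() was never shown, here is one possible stand-in, a 64-bit FNV-1a hash (my own sketch, not the asker's function), which could replace the toUpperCase placeholder in the reducer:

```java
public class Hash64 {
    // 64-bit FNV-1a: a simple, dependency-free hash. This is a hypothetical
    // substitute for the unspecified hash64() in the question. It hashes
    // UTF-16 code units, which is fine for ASCII record data.
    static long hash64(String s) {
        long h = 0xcbf29ce484222325L;       // FNV offset basis
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);               // XOR in the next character
            h *= 0x100000001b3L;            // multiply by the FNV prime
        }
        return h;
    }

    public static void main(String[] args) {
        // Append the hash to a record, as the question intended:
        String record = "some record";
        System.out.println(record + "," + String.valueOf(hash64(record)));
    }
}
```

With a method like this on the classpath, the reducer line could read `translations = val.toString() + "," + String.valueOf(hash64(val.toString()));` as in the original question.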