[英]Matching input String pattern with a given string and separating w.r.t special characters
[英]Splitting Files w.r.t input file MapReduce
您能帮助我如何使用此Mapreduce程序获取“以下”输出吗? 实际上,此代码工作正常,但输出与预期不符...在两个文件中生成了输出,但是在Name.txt文件或Age.txt文件中生成了输出交换
Name:A
Age:28
Name:B
Age:25
Name:K
Age:20
Name:P
Age:18
Name:Ak
Age:11
Name:N
Age:14
Name:Kr
Age:26
Name:Ra
Age:27
我的输出应分为姓名和年龄
Name:A
Name:B
Name:K
Name:P
Name:Ak
Name:N
Name:Kr
Name:Ra
Age:28
Age:25
Age:20
Age:18
Age:11
Age:14
Age:26
Age:27
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
public class MyMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
public void map(LongWritable key, Text value,OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
String [] dall=value.toString().split(":");
output.collect(new Text(dall[0]),new Text(dall[1]));
}
}
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
public class MyReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
public void reduce(Text key, Iterator<Text> values,OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
while (values.hasNext()) {
output.collect(new Text(key),new Text(values.next()));
}
}
}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.*;
public class MultiFileOutput extends MultipleTextOutputFormat<Text, Text>{
protected String generateFileNameForKeyValue(Text key, Text value,String name) {
//return new Path(key.toString(), name).toString();
return key.toString();
}
protected Text generateActualKey(Text key, Text value) {
//return new Text(key.toString());
return null;
}
}
import java.io.IOException;
import java.lang.Exception;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
public class MyDriver{
public static void main(String[] args) throws Exception,IOException {
Configuration mycon=new Configuration();
JobConf conf = new JobConf(mycon,MyDriver.class);
//JobConf conf = new JobConf(MyDriver.class);
conf.setJobName("Splitting");
conf.setMapperClass(MyMapper.class);
conf.setReducerClass(MyReducer.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(MultiFileOutput.class);
conf.setOutputKeyClass(Text.class);
conf.setMapOutputKeyClass(Text.class);
//conf.setOutputValueClass(Text.class);
conf.setMapOutputValueClass(Text.class);
FileInputFormat.setInputPaths(conf,new Path(args[0]));
FileOutputFormat.setOutputPath(conf,new Path(args[1]));
JobClient.runJob(conf);
//System.err.println(JobClient.runJob(conf));
}
}
谢谢
好吧,这比简单的字数统计更复杂:)
因此,您需要的是复杂的密钥和分区程序。 并设定减速机数量= 2
您的复杂键可以是Text(名称| A或Age | 28的串联)或CustomWritable(具有2个实例变量,其类型(名称或年龄)和值)
在映射器中,创建Text或CustomWritable并将其设置为输出键,并且值可以只是人名或年龄。
创建一个分区程序(实现org.apache.hadoop.mapred.Partitioner)。 在getPartition方法中,您基本上根据密钥决定去向哪个减速器。
希望这可以帮助。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.