Average of order_demand for each product as output - MapReduce - Java
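I am new to MapReduce and still learning. Thanks in advance for your help and any further hints. In a university exercise I ran into the following problem: from a CSV file (a sample is listed below), I want to compute the average Order_Demand for each product code.
The code shown below ("FrequencyMapper" and "FrequencyReducer") runs on my server, but I think I currently have a problem with how the output is displayed. Since this is my first time working with MapReduce, I appreciate any help.
The mapper, reducer, and driver code are listed below.
Sample of the dataset (CSV file):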
Product_Code,Warehouse,Product_Category,Date,Order_Demand
Product_0993,Whse_J,Category_028,2012/7/27,100
Product_0979,Whse_J,Category_028,2012/6/5,500
Product_0979,Whse_E,Category_028,2012/11/29,500
Product_1157,Whse_E,Category_006,2012/6/4,160000
Product_1159,Whse_A,Category_006,2012/7/17,50000
My target output, for example:
Product_0979 500
Product_1157 105000
...
FrequencyMapper.java:
package ma.test.a02;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FrequencyMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable offset, Text lineText, Context context)
            throws IOException, InterruptedException {
        String line = lineText.toString();
        if (line.contains("Product")) {
            String productcode = line.split(",")[0];
            float orderDemand = Float.parseFloat(line.split(",")[4]);
            context.write(new Text(productcode), new IntWritable((int) orderDemand));
        }
    }
}
FrequencyReducer.java:
package ma.test.a02;

import java.io.IOException;
import javax.xml.soap.Text;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class FrequencyReducer extends Reducer<Text, IntWritable, IntWritable, FloatWritable> {
    public void reduce(IntWritable productcode, Iterable<IntWritable> orderDemands, Context context)
            throws IOException, InterruptedException {
        float averageDemand = 0;
        float count = 0;
        for (IntWritable orderDemand : orderDemands) {
            averageDemand += orderDemand.get();
            count += 1;
        }
        float result = averageDemand / count;
        context.write(productcode, new FloatWritable(result));
    }
}
Frequency.java (driver):
package ma.test.a02;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Frequency {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: Average <input path> <output path>");
            System.exit(-1);
        }
        // create a Hadoop job and set the main class
        Job job = Job.getInstance();
        job.setJarByClass(Frequency.class);
        job.setJobName("MA-Test Average");
        // set the input and output path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // set the Mapper and Reducer class
        job.setMapperClass(FrequencyMapper.class);
        job.setReducerClass(FrequencyReducer.class);
        // specify the type of the output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FloatWritable.class);
        // run the job
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Hint 1: In the mapper, you keep only the lines that contain "VOLUME", using the following check:
if(line.contains("VOLUME")) {
}
But no line contains "VOLUME", so the reducer receives no input!
Hint 2: Your reducer's output value is a FloatWritable, so you should use this line in the runner (the Frequency class):
job.setOutputValueClass(FloatWritable.class);
instead of this one:
job.setOutputValueClass(IntWritable.class);
Hint 3: In the reducer, change this line:
public class FrequencyReducer extends Reducer<IntWritable , IntWritable , IntWritable , FloatWritable>
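to this (the output key has to be Text, because the reducer writes the product code as its key; the first parameter of reduce must be declared as Text as well):
public class FrequencyReducer extends Reducer<Text, IntWritable, Text, FloatWritable>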
Also add these lines to the Frequency class:
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
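Hint 3 matters because of how Java resolves overrides. As a minimal plain-Java sketch (no Hadoop on the classpath; `OverrideDemo`, `MiniReducer`, `MismatchedReducer`, and `AveragingReducer` are hypothetical names made up for illustration), the following mimics a base class whose default `reduce` passes every value through unchanged, which is what Hadoop's `Reducer.reduce` does by default. A `reduce` whose key parameter does not match the declared input key type is only an overload, so the framework never calls it and the averaging logic is silently skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OverrideDemo {
    // Hypothetical stand-in for Hadoop's Reducer: the default reduce()
    // is an identity that emits every value unchanged.
    static class MiniReducer<K, V> {
        protected final List<String> out = new ArrayList<>();

        public void reduce(K key, Iterable<V> values) {
            for (V value : values) {
                out.add(key + "\t" + value);
            }
        }

        // Plays the role of the framework: it always calls reduce(K, Iterable<V>).
        public final List<String> run(K key, Iterable<V> values) {
            reduce(key, values);
            return out;
        }
    }

    // The input key type is String, but reduce() takes an Integer key.
    // This is an overload, not an override, so run() still uses the identity.
    static class MismatchedReducer extends MiniReducer<String, Integer> {
        public void reduce(Integer key, Iterable<Integer> values) {
            out.add("never reached through run()");
        }
    }

    // The parameter types match the type arguments, so this is a real override.
    static class AveragingReducer extends MiniReducer<String, Integer> {
        @Override
        public void reduce(String key, Iterable<Integer> values) {
            float sum = 0;
            int count = 0;
            for (int value : values) {
                sum += value;
                count++;
            }
            out.add(key + "\t" + (sum / count));
        }
    }

    public static void main(String[] args) {
        List<Integer> demands = Arrays.asList(500, 500);
        System.out.println(new MismatchedReducer().run("Product_0979", demands));
        System.out.println(new AveragingReducer().run("Product_0979", demands));
    }
}
```

Annotating `reduce` with `@Override` turns this kind of mismatch into a compile error instead of a job that quietly emits unaveraged values.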
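Hint 4: The first line of the CSV file, which describes the file's structure, will cause problems. Reject it by putting the following lines at the start of your map method: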
if(line.contains("Product_Code,Warehouse")) {
return;
}
Hint 5: In a real program, make sure you have a plan for the case where the String in orderDemand cannot be converted to an Integer.
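One possible way to handle Hint 5, shown as a minimal sketch rather than a definitive implementation: a small helper that returns an empty result instead of throwing when a line is unusable. `OrderDemandParser`, `parseOrderDemand`, and `ORDER_DEMAND_COLUMN` are hypothetical names introduced here, and the column index 4 is taken from the sample CSV in the question.

```java
import java.util.OptionalInt;

class OrderDemandParser {
    // Index of the Order_Demand column in the sample CSV (assumption from the question).
    private static final int ORDER_DEMAND_COLUMN = 4;

    /** Returns the order demand of a CSV line, or empty if the line is unusable. */
    static OptionalInt parseOrderDemand(String line) {
        String[] fields = line.split(",");
        if (fields.length <= ORDER_DEMAND_COLUMN) {
            return OptionalInt.empty(); // too few columns
        }
        try {
            return OptionalInt.of(Integer.parseInt(fields[ORDER_DEMAND_COLUMN].trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty(); // header row or non-numeric value
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrderDemand("Product_0993,Whse_J,Category_028,2012/7/27,100"));
        System.out.println(parseOrderDemand("Product_Code,Warehouse,Product_Category,Date,Order_Demand"));
    }
}
```

In the mapper, you would then write a record only when the result is present and skip the line otherwise. Because the header row fails to parse, it is rejected by the same check.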
Finally, your mapper will be:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FrequencyMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable offset, Text lineText, Context context)
            throws IOException, InterruptedException {
        String line = lineText.toString();
        // skip the CSV header row
        if (line.contains("Product_Code,Warehouse")) {
            return;
        }
        if (line.contains("Product")) {
            String[] fields = line.split(",");
            String productcode = fields[0].trim();
            int orderDemand = Integer.parseInt(fields[4].trim());
            context.write(new Text(productcode), new IntWritable(orderDemand));
        }
    }
}
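Here is your reducer:
import java.io.IOException;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text; // Hadoop's Text, not javax.xml.soap.Text
import org.apache.hadoop.mapreduce.Reducer;

public class FrequencyReducer extends Reducer<Text, IntWritable, Text, FloatWritable> {
    @Override
    public void reduce(Text productcode, Iterable<IntWritable> orderDemands, Context context)
            throws IOException, InterruptedException {
        float averageDemand = 0; // running sum of the demands for this product
        float count = 0;
        for (IntWritable orderDemand : orderDemands) {
            averageDemand += orderDemand.get();
            count += 1;
        }
        float result = averageDemand / count;
        context.write(productcode, new FloatWritable(result));
    }
}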
And here is your runner:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Frequency {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: Average <input path> <output path>");
            System.exit(-1);
        }
        // create a Hadoop job and set the main class
        Job job = Job.getInstance();
        job.setJarByClass(Frequency.class);
        job.setJobName("MA-Test Average");
        // set the input and output path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // set the Mapper and Reducer class
        job.setMapperClass(FrequencyMapper.class);
        job.setReducerClass(FrequencyReducer.class);
        // specify the type of the output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FloatWritable.class);
        // run the job
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
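To sanity-check the expected averages without a cluster, the same map, group, and reduce steps can be imitated in plain Java. This is only a rough sketch under stated assumptions: `LocalAverageCheck` and `averageDemand` are hypothetical names, the grouping is done with an in-memory `TreeMap` instead of Hadoop's shuffle, and the input is the five sample rows from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LocalAverageCheck {
    /** Imitates map (emit code and demand), shuffle (group by code), and reduce (average). */
    static Map<String, Float> averageDemand(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            if (line.contains("Product_Code,Warehouse")) {
                continue; // skip the CSV header row
            }
            String[] fields = line.split(",");
            grouped.computeIfAbsent(fields[0].trim(), k -> new ArrayList<>())
                   .add(Integer.valueOf(fields[4].trim()));
        }
        Map<String, Float> averages = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            float sum = 0;
            for (int demand : entry.getValue()) {
                sum += demand;
            }
            averages.put(entry.getKey(), sum / entry.getValue().size());
        }
        return averages;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "Product_Code,Warehouse,Product_Category,Date,Order_Demand",
            "Product_0993,Whse_J,Category_028,2012/7/27,100",
            "Product_0979,Whse_J,Category_028,2012/6/5,500",
            "Product_0979,Whse_E,Category_028,2012/11/29,500",
            "Product_1157,Whse_E,Category_006,2012/6/4,160000",
            "Product_1159,Whse_A,Category_006,2012/7/17,50000");
        averageDemand(sample).forEach((code, avg) -> System.out.println(code + "\t" + avg));
    }
}
```

For these five rows, Product_0979 averages 500. Product_1157 appears only once in the excerpt, so its average here is 160000 rather than the 105000 given in the question's target.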