
Average of order_demand for each product as output - MapReduce - Java

I am new to the topic of MapReduce and still in the learning phase, so I thank you in advance for your help and any further tips. In a university exercise I ran into the following problem: from a CSV file (an example is listed below) I want to calculate the average Order_Demand for each product code.

The code for "FrequencyMapper" and "FrequencyReducer" shown below runs on my server, and I think my current problem is with how the output is displayed. Since this is my first time working with MapReduce, I am grateful for any help.

The mapper, reducer, and driver code are listed below.

Sample dataset (CSV file):

Product_Code,Warehouse,Product_Category,Date,Order_Demand
Product_0993,Whse_J,Category_028,2012/7/27,100
Product_0979,Whse_J,Category_028,2012/6/5,500 
Product_0979,Whse_E,Category_028,2012/11/29,500 
Product_1157,Whse_E,Category_006,2012/6/4,160000 
Product_1159,Whse_A,Category_006,2012/7/17,50000 

My goal, for example:

Product_0979   500
Product_1157   105000
...
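(For example, Product_0979 appears twice in the sample with an Order_Demand of 500 each, so its average is (500 + 500) / 2 = 500.)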

FrequencyMapper.java:

package ma.test.a02;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FrequencyMapper
  extends Mapper<LongWritable, Text, Text, IntWritable> {
 
 @Override
  public void map(LongWritable offset, Text lineText, Context context)
      throws IOException, InterruptedException {
     
    String line = lineText.toString();
    
    if(line.contains("Product")) {
        String productcode = line.split(",")[0];
        
        float orderDemand = Float.parseFloat(line.split(",")[4]);
        
        context.write(new Text(productcode), new IntWritable((int) orderDemand));
    }
  }
}

FrequencyReducer.java:

package ma.test.a02;

import java.io.IOException;

import javax.xml.soap.Text;

import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class FrequencyReducer extends Reducer< Text ,  IntWritable ,  IntWritable ,  FloatWritable > {
     public void reduce( IntWritable productcode,  Iterable<IntWritable> orderDemands,  Context context)
         throws IOException,  InterruptedException {
             
      float averageDemand  = 0;
      float count = 0;
      for ( IntWritable orderDemand : orderDemands) {
          
            averageDemand +=orderDemand.get();
            count +=1;
        }
      
      float result = averageDemand / count;
    
      context.write(productcode,  new FloatWritable (result));
    }
}

Frequency.java (driver):

package ma.test.a02;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Frequency {
 
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: Average <input path> <output path>");
      System.exit(-1);
    }
    
    // create a Hadoop job and set the main class
    Job job = Job.getInstance();
    job.setJarByClass(Frequency.class);
    job.setJobName("MA-Test Average");
    
    // set the input and output path
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
    // set the Mapper and Reducer class
    job.setMapperClass(FrequencyMapper.class);
    job.setReducerClass(FrequencyReducer.class);
    
    // specify the type of the output
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(FloatWritable.class);
    
    // run the job
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Tip 1: In the mapper, you filter for lines that contain "VOLUME" in the following line:

if(line.contains("VOLUME")) {

}

But no line contains "VOLUME", so your reducer receives no input!

Tip 2: Your reducer's output value is a FloatWritable, so you should use this line in your runner (the Frequency class):

job.setOutputValueClass(FloatWritable.class);

instead of this:

job.setOutputValueClass(IntWritable.class);

Tip 3: In the reducer, change this line:

public class FrequencyReducer extends Reducer<IntWritable, IntWritable, IntWritable, FloatWritable>

to this:

public class FrequencyReducer extends Reducer<Text, IntWritable, Text, FloatWritable>

Also add these lines to the Frequency class:

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
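(These calls are needed because, by default, Hadoop assumes the map output types match the job's final output types; here the map output value is an IntWritable while the final output value is a FloatWritable.)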

Tip 4: The first line of the CSV file, which describes the file's structure, will cause problems. Reject it by placing the following lines at the start of your map method:

if(line.contains("Product_Code,Warehouse")) {
    return;
}
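As a side note, a commonly used alternative (a sketch, not from the original answer) is to skip the record whose key is zero: with TextInputFormat the map key is the byte offset within the file, and the header is always the first line, so its offset is 0:

if (offset.get() == 0) {
    return; // skip the CSV header line
}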

Tip 5: In a real program, make sure you have a plan for the case where the Order_Demand String cannot be converted to an Integer.
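For example, a minimal sketch of such handling (the counter group and counter name here are illustrative assumptions, not part of the original answer) wraps the parse in a try/catch and counts malformed rows instead of crashing the task:

String[] fields = line.split(",");
if (fields.length < 5) {
    return; // not enough columns to contain an Order_Demand
}
try {
    int orderDemand = Integer.parseInt(fields[4].trim());
    context.write(new Text(fields[0].trim()), new IntWritable(orderDemand));
} catch (NumberFormatException e) {
    // skip rows whose Order_Demand is not a valid integer and make
    // them visible via a job counter (group/name are hypothetical)
    context.getCounter("FrequencyMapper", "MALFORMED_ORDER_DEMAND").increment(1);
}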

Finally, your mapper will be:

public class FrequencyMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable offset, Text lineText, Context context)
            throws IOException, InterruptedException {

        String line = lineText.toString();

        if (line.contains("Product_Code,Warehouse")) {
            return;
        }

        if (line.contains("Product")) {
            String productcode = line.split(",")[0].trim();
            int orderDemand = Integer.valueOf(line.split(",")[4].trim());
            context.write(new Text(productcode), new IntWritable(orderDemand));
        }
    }
}

And here is your reducer:

public class FrequencyReducer extends Reducer<Text, IntWritable , Text, FloatWritable > {
    public void reduce( Text productcode,  Iterable<IntWritable> orderDemands,  Context context)
            throws IOException,  InterruptedException {

        float averageDemand  = 0;
        float count = 0;
        for ( IntWritable orderDemand : orderDemands) {

            averageDemand +=orderDemand.get();
            count +=1;
        }

        float result = averageDemand / count;

        context.write(productcode,  new FloatWritable (result));
    }
}

And here is your runner:

public class Frequency {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: Average <input path> <output path>");
            System.exit(-1);
        }
        
        // create a Hadoop job and set the main class
        Job job = Job.getInstance();
        job.setJarByClass(Frequency.class);
        job.setJobName("MA-Test Average");

        // set the input and output path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // set the Mapper and Reducer class
        job.setMapperClass(FrequencyMapper.class);
        job.setReducerClass(FrequencyReducer.class);

        // specify the type of the output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FloatWritable.class);

        // run the job
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
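Assuming the three classes are packaged into a jar (the jar name below is hypothetical), the job can then be submitted with something like hadoop jar ma-test-a02.jar ma.test.a02.Frequency <input path> <output path>. Note that the output directory must not already exist, or Hadoop will refuse to start the job.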
