Average order_demand per product as output - MapReduce - Java

Question

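I'm new to MapReduce and still learning, so thanks in advance for any help and further hints. In a university exercise I ran into the following problem: from a CSV file (a sample is listed below), I want to compute the average order_demand for each product code.

The code shown below, "FrequencyMapper" and "FrequencyReducer", runs on my server, but I think I currently have a problem with how the output is displayed. Since this is my first time using MapReduce, I'd appreciate any help.

The mapper, reducer, and driver code are listed below.

Sample of the dataset (CSV file):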

Product_Code,Warehouse,Product_Category,Date,Order_Demand
Product_0993,Whse_J,Category_028,2012/7/27,100
Product_0979,Whse_J,Category_028,2012/6/5,500 
Product_0979,Whse_E,Category_028,2012/11/29,500 
Product_1157,Whse_E,Category_006,2012/6/4,160000 
Product_1159,Whse_A,Category_006,2012/7/17,50000

My goal, for example:

Product_0979   500
Product_1157   105000
...

FrequencyMapper.java:

package ma.test.a02;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FrequencyMapper
  extends Mapper<LongWritable, Text, Text, IntWritable> {
 
 @Override
  public void map(LongWritable offset, Text lineText, Context context)
      throws IOException, InterruptedException {
     
    String line = lineText.toString();
    
    if(line.contains("Product")) {
        String productcode = line.split(",")[0];
        
        float orderDemand = Float.parseFloat(line.split(",")[4]);
        
        context.write(new Text(productcode), new IntWritable((int) orderDemand));
    }
  }
}

FrequencyReducer.java:

package ma.test.a02;

import java.io.IOException;

import javax.xml.soap.Text;

import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class FrequencyReducer extends Reducer< Text ,  IntWritable ,  IntWritable ,  FloatWritable > {
     public void reduce( IntWritable productcode,  Iterable<IntWritable> orderDemands,  Context context)
         throws IOException,  InterruptedException {
             
      float averageDemand  = 0;
      float count = 0;
      for ( IntWritable orderDemand : orderDemands) {
          
            averageDemand +=orderDemand.get();
            count +=1;
        }
      
      float result = averageDemand / count;
    
      context.write(productcode,  new FloatWritable (result));
    }
}

Frequency.java (driver):

package ma.test.a02;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Frequency {
 
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: Average <input path> <output path>");
      System.exit(-1);
    }
    
    // create a Hadoop job and set the main class
    Job job = Job.getInstance();
    job.setJarByClass(Frequency.class);
    job.setJobName("MA-Test Average");
    
    // set the input and output path
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
    // set the Mapper and Reducer class
    job.setMapperClass(FrequencyMapper.class);
    job.setReducerClass(FrequencyReducer.class);
    
    // specify the type of the output
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(FloatWritable.class);
    
    // run the job
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Answer 1

Hint 1: In the mapper, you only keep lines that contain "VOLUME", because of the following check:

if(line.contains("VOLUME")) {

}

But no line contains "VOLUME", so the reducer gets no input!

Hint 2: Your reducer's output value is a FloatWritable, so you should use this line in the runner (the Frequency class):

job.setOutputValueClass(FloatWritable.class);

instead of this:

job.setOutputValueClass(IntWritable.class);

Hint 3: In the reducer, change this line:

public class FrequencyReducer extends Reducer<IntWritable ,  IntWritable ,  IntWritable ,  FloatWritable>

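to this, so that the reducer's input key type matches the mapper's Text output key and its output key type matches the Text key written in reduce:

public class FrequencyReducer extends Reducer<Text, IntWritable, Text, FloatWritable>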

Also add these lines to the Frequency class:

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);

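Hint 4: The first line of the CSV file, the header that describes the file's structure, will cause problems. Reject it by putting the following lines at the top of your map method: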

if(line.contains("Product_Code,Warehouse")) {
    return;
}

Hint 5: In a real program, make sure you have a plan for the case where orderDemand cannot be converted from String to Integer.
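Hints 4 and 5 can be combined into one small parsing helper. The following is only a sketch in plain Java (no Hadoop dependency) so that it can be run on its own; the class name DemandLineParser, the nested DemandRecord type, and the choice to silently skip bad lines are assumptions for illustration, not part of the original code.

```java
import java.util.Optional;

// Hypothetical helper: parses one CSV line into (productCode, orderDemand).
final class DemandLineParser {

    // Assumed header prefix, taken from the sample file shown in the question.
    static final String HEADER_PREFIX = "Product_Code,";

    // Holds the two fields the mapper would emit.
    static final class DemandRecord {
        final String productCode;
        final int orderDemand;

        DemandRecord(String productCode, int orderDemand) {
            this.productCode = productCode;
            this.orderDemand = orderDemand;
        }
    }

    // Returns an empty Optional for the header, short lines, and non-numeric demands.
    static Optional<DemandRecord> parse(String line) {
        if (line == null || line.startsWith(HEADER_PREFIX)) {
            return Optional.empty();
        }
        String[] fields = line.split(",");
        if (fields.length < 5) {
            return Optional.empty();
        }
        try {
            int demand = Integer.parseInt(fields[4].trim());
            return Optional.of(new DemandRecord(fields[0].trim(), demand));
        } catch (NumberFormatException e) {
            // The demand column could not be read as an integer: skip the line.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("Product_0979,Whse_J,Category_028,2012/6/5,500 ").isPresent());
        System.out.println(parse("Product_Code,Warehouse,Product_Category,Date,Order_Demand").isPresent());
        System.out.println(parse("Product_0979,Whse_J,Category_028,2012/6/5,n/a").isPresent());
    }
}
```

Inside the mapper, the same idea would mean calling such a helper at the top of map and writing to the context only when a record is present. Whether bad lines should be skipped, counted, or fail the job depends on the exercise.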

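Finally, your mapper will be:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FrequencyMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable offset, Text lineText, Context context)
            throws IOException, InterruptedException {

        String line = lineText.toString();

        // Skip the CSV header line.
        if (line.contains("Product_Code,Warehouse")) {
            return;
        }

        if (line.contains("Product")) {
            // Split once and reuse the fields; trim removes trailing spaces.
            String[] fields = line.split(",");
            String productcode = fields[0].trim();
            int orderDemand = Integer.parseInt(fields[4].trim());
            context.write(new Text(productcode), new IntWritable(orderDemand));
        }
    }
}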

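This is your reducer:

import java.io.IOException;

import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
// Text must be org.apache.hadoop.io.Text, not javax.xml.soap.Text.
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FrequencyReducer extends Reducer<Text, IntWritable, Text, FloatWritable> {

    @Override
    public void reduce(Text productcode, Iterable<IntWritable> orderDemands, Context context)
            throws IOException, InterruptedException {

        // Sum all demands for this product code and count them.
        float totalDemand = 0;
        float count = 0;
        for (IntWritable orderDemand : orderDemands) {
            totalDemand += orderDemand.get();
            count += 1;
        }

        float result = totalDemand / count;

        context.write(productcode, new FloatWritable(result));
    }
}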
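One caveat about the reducer above: it accumulates the sum in a float, and a float can only represent integers exactly up to 16,777,216. With demands as large as 160000, a product with many rows could exceed that. The sketch below, in plain Java and with the hypothetical class name DemandAverage, shows the alternative of summing in a long and dividing once at the end. It is an illustration of the idea, not a replacement the original answer proposed.

```java
import java.util.Arrays;

// Hypothetical helper showing long-based accumulation for an average.
final class DemandAverage {

    // Sums in a long, counts in a long, and divides once at the end.
    static double average(Iterable<Integer> demands) {
        long sum = 0;
        long count = 0;
        for (int demand : demands) {
            sum += demand;
            count++;
        }
        if (count == 0) {
            throw new IllegalArgumentException("no values to average");
        }
        return (double) sum / count;
    }

    // True when adding 1 to the given float does not change it.
    static boolean floatIgnoresPlusOne(float value) {
        return value + 1f == value;
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList(500, 500)));      // prints 500.0
        System.out.println(floatIgnoresPlusOne(16_777_216f));      // prints true
    }
}
```

In the Hadoop reducer the same change would amount to declaring the running sum and the count as long and casting to float only for the final FloatWritable.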

This is your runner:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Frequency {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: Average <input path> <output path>");
            System.exit(-1);
        }

        // create a Hadoop job and set the main class
        Job job = Job.getInstance();
        job.setJarByClass(Frequency.class);
        job.setJobName("MA-Test Average");

        // set the input and output path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // set the Mapper and Reducer class
        job.setMapperClass(FrequencyMapper.class);
        job.setReducerClass(FrequencyReducer.class);

        // map output types must match the reducer's input types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // final output types must match the reducer's output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FloatWritable.class);

        // run the job
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
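To check what the job should produce for the sample rows, the map, shuffle, and reduce steps can be simulated locally without Hadoop. This is a minimal sketch under that assumption: the class LocalAverageCheck and its method names are hypothetical, and the grouping map stands in for the shuffle phase that Hadoop performs between the mapper and the reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local stand-in for the MapReduce job (no Hadoop involved).
final class LocalAverageCheck {

    // Map + shuffle: emit (productCode, orderDemand) and group the values by key.
    static Map<String, List<Integer>> mapAndShuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            if (line.contains("Product_Code,Warehouse")) {
                continue; // header line, as in Hint 4
            }
            String[] fields = line.split(",");
            String productCode = fields[0].trim();
            int orderDemand = Integer.parseInt(fields[4].trim());
            grouped.computeIfAbsent(productCode, k -> new ArrayList<>()).add(orderDemand);
        }
        return grouped;
    }

    // Reduce: called once per key with all values collected for that key.
    static float reduce(List<Integer> orderDemands) {
        float sum = 0;
        for (int orderDemand : orderDemands) {
            sum += orderDemand;
        }
        return sum / orderDemands.size();
    }

    // Runs both phases and returns productCode -> average demand.
    static Map<String, Float> run(List<String> lines) {
        Map<String, Float> averages = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : mapAndShuffle(lines).entrySet()) {
            averages.put(entry.getKey(), reduce(entry.getValue()));
        }
        return averages;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "Product_Code,Warehouse,Product_Category,Date,Order_Demand",
                "Product_0993,Whse_J,Category_028,2012/7/27,100",
                "Product_0979,Whse_J,Category_028,2012/6/5,500 ",
                "Product_0979,Whse_E,Category_028,2012/11/29,500 ",
                "Product_1157,Whse_E,Category_006,2012/6/4,160000 ",
                "Product_1159,Whse_A,Category_006,2012/7/17,50000");
        // One line per product code: key, a tab, then the average.
        run(sample).forEach((code, avg) -> System.out.println(code + "\t" + avg));
    }
}
```

For the sample rows this gives 500.0 for Product_0979, since both of its rows have a demand of 500. Every other product code appears only once in the sample, so its average equals its single value.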

Solution 1
0 Accepted 2021-02-22 14:51:38
