Hadoop reducer ArrayIndexOutOfBoundsException when passing values from the mapper

Question

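I am trying to pass two values from the mapper to the reducer by packing them into a single string value, but when I parse the string in the reducer I get an out-of-bounds error. I build the string in the mapper myself, so I am sure it contains two values. What am I doing wrong? How can I pass two values from the mapper to the reducer? (Eventually I need to pass more variables to the reducer, but this keeps the problem simple.)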

Here is the error:

Error: java.lang.ArrayIndexOutOfBoundsException: 1
    at TotalTime$TimeReducer.reduce(TotalTime.java:57)
    at TotalTime$TimeReducer.reduce(TotalTime.java:1)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

Here is my code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


public class TotalTime {
    
    public static class TimeMapper extends Mapper<Object, Text, Text, Text> {
        
        Text textKey = new Text();
        Text textValue = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

            String data = value.toString();
            String[] field = data.split(",");
            
            if (null != field && field.length == 4) {
                
                String strTimeIn[] = field[1].split(":"); 
                String strTimeOout[] = field[2].split(":");
                
                int timeOn = Integer.parseInt(strTimeIn[0]) * 3600 + Integer.parseInt(strTimeIn[1]) * 60 + Integer.parseInt(strTimeIn[2]);
                int timeOff = Integer.parseInt(strTimeOout[0]) * 3600 + Integer.parseInt(strTimeOout[1]) * 60 + Integer.parseInt(strTimeOout[2]);
                
                String v = String.valueOf(timeOn) + "," + String.valueOf(timeOff);
                
                textKey.set(field[0]); 
                textValue.set(v);
                
                context.write(textKey, textValue);
            }
        }
    }
    
    public static class TimeReducer extends Reducer<Text, Text, Text, Text> {

        public void reduce(Text key, Iterable<Text> values, Context context)    throws IOException, InterruptedException {
            
            Text textValue = new Text();
            int sumTime = 0;
            
            for (Text val : values) {

                String line = val.toString();
                // Split the string by commas
                String[] field = line.split(",");
                
                int timeOn = Integer.parseInt(field[0]);
                int timeOff = Integer.parseInt(field[1]);
                
                int time = timeOff - timeOn;
                    
                sumTime += time;

            }
            String v = String.valueOf(sumTime);
            
            textValue.set(v);
            context.write(key, textValue);
        }
    }
    
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        
        Job job = Job.getInstance(conf, "User Score");
        job.setJarByClass(TotalTime.class);
        job.setMapperClass(TimeMapper.class);
        job.setCombinerClass(TimeReducer.class);
        job.setReducerClass(TimeReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}

The input file looks like this:

ID2347,15:40:51,16:21:44,20
ID4568,14:27:57,14:58:04,72
ID8755,13:40:49,13:42:31,99
ID3258,13:12:48,13:37:11,73
ID9666,13:44:34,15:53:36,114
ID8755,09:43:59,10:47:52,123
ID3258,10:25:22,10:41:12,14
ID9666,09:40:10,11:44:01,15

Answer 1

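It looks like the combiner is what makes your code fail. Remember that a combiner is a piece of code that runs on the mapper output before the reducer. Now imagine this scenario:

Your mapper processes this line: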

ID2347,15:40:51,16:21:44,20

and writes this output to the context:

[ID2347, (56451,58904)]

Now the combiner kicks in, processes the mapper's output before the reducer sees it, and produces the following:

[ID2347, 2453]

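This line then goes to the reducer, which fails because your code assumes the value looks like val1,val2. Splitting 2453 on a comma yields a single element, so field[1] is out of bounds. If you want your code to work, just remove the combiner [or change your logic].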
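
The scenario above can be reproduced without a cluster. The following is a minimal sketch in plain Java, with no Hadoop classes, assuming the reduce logic from the question. The names CombinerDemo, originalReduce and formatPreservingReduce are hypothetical and exist only for illustration; the second method shows one possible way to "change your logic", not the only one.

```java
import java.util.Collections;
import java.util.List;

// Plain-Java stand-in for the job's data flow (hypothetical class, no Hadoop involved).
final class CombinerDemo {

    // Same parsing as TimeReducer.reduce: every value must look like "timeOn,timeOff".
    static String originalReduce(List<String> values) {
        int sumTime = 0;
        for (String line : values) {
            String[] field = line.split(",");
            int timeOn = Integer.parseInt(field[0]);
            int timeOff = Integer.parseInt(field[1]); // throws if there is no comma
            sumTime += timeOff - timeOn;
        }
        return String.valueOf(sumTime);
    }

    // Hypothetical combiner logic: emits "0,total" so its output has the same
    // "timeOn,timeOff" shape as its input and can be reduced again safely.
    static String formatPreservingReduce(List<String> values) {
        return "0," + originalReduce(values);
    }

    public static void main(String[] args) {
        List<String> mapOutput = Collections.singletonList("56451,58904");

        // Combiner pass with the reducer class: the value collapses to a bare total.
        String combined = originalReduce(mapOutput);
        System.out.println(combined); // prints 2453

        // Reducer pass over the combiner output: no second field to read.
        try {
            originalReduce(Collections.singletonList(combined));
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("reducer failed");
        }

        // With a format-preserving combiner the reducer parses the value fine.
        String safeCombined = formatPreservingReduce(mapOutput);
        System.out.println(originalReduce(Collections.singletonList(safeCombined))); // prints 2453
    }
}
```

In the actual job, the simplest fix is to delete the job.setCombinerClass(TimeReducer.class); line from main. If a combiner is still wanted, it would need to be a separate class whose output keeps the timeOn,timeOff format, because Hadoop may run a combiner zero, one, or several times.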
