Map Reduce Wrong Output / Reducer not working

I'm trying to gather the max and min temperature of a particular station and then find the sum of the temperatures for each day, but I keep getting an error in the mapper. I have tried a lot of other approaches, such as using StringTokenizer, but I get an error every time.

Sample input:

Station Date(YYYYMMDD) element temperature flag1 flag2 othervalue

I only need station, date (key), element and temperature from the input.

    public static class MaxMinMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private Text newDate = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {

            String stationID = "USW00003889";
            String[] tokens = value.toString().split(",");
            String station = "";
            String date = "";
            String element = "";
            int data = 0;

            station = tokens[0];
            date = tokens[1];
            element = tokens[2];
            data = Integer.parseInt(tokens[3]);

            if (stationID.equals(station)
                    && (element.equals("TMAX") || element.equals("TMIN"))) {
                // Key each temperature reading by its date.
                newDate.set(date);
                context.write(newDate, new IntWritable(data));
            }
        }
    }

    public static class MaxMinReducer
            extends Reducer<Text, Text, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {

            int sumResult = 0;
            int val1 = 0;
            int val2 = 0;

            while (values.iterator().hasNext()) {
                val1 = values.iterator().next().get();
                val2 = values.iterator().next().get();
                sumResult = val1 + val2;
            }

            context.write(key, result);
        }
    }

Please help me out, thanks.

UPDATE: Verified each row with a condition and changed the data variable to String (to be changed back to Integer -> IntWritable at a later stage).

            if (tokens.length <= 5) {
                station = tokens[0];
                date = tokens[1];
                element = tokens[2];
                data = tokens[3];
                otherValue = tokens[4];
            } else {
                station = tokens[0];
                date = tokens[1];
                element = tokens[2];
                data = tokens[3];
                otherValue = tokens[4];
                otherValue2 = tokens[5];
            }

Update 2: OK, I'm getting output written to file now, but it's the wrong output. I need it to add the two values that have the same date (key). What am I doing wrong?


20180101    -67
20180101    122
20180102    3
20180102    12
20180104    -177
20180104    -43

Desired output:

20180101    55
20180102    15
20180104    -220

This is the error I receive as well, even though I get output.

    ERROR: (gcloud.dataproc.jobs.submit.hadoop) Job [8e31c44ccd394017a4a28b3b16471aca] failed with error:
Google Cloud Dataproc Agent reports job failure. If logs are available, they can be found at 'https://console.cloud.google.com/dataproc/jobs/8e31c44ccd394017a4a28b3b16471aca?project=driven-airway-257512&region=us-central1' and in 'gs://dataproc-261a376e-7874-4151-b6b7-566c18758206-us-central1/google-cloud-dataproc-metainfo/f912a2f0-107f-40b6-94
    19/11/14 12:53:24 INFO client.RMProxy: Connecting to ResourceManager at cluster-1e8f-m/
19/11/14 12:53:25 INFO client.AHSProxy: Connecting to Application History server at cluster-1e8f-m/
19/11/14 12:53:26 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/11/14 12:53:26 INFO input.FileInputFormat: Total input files to process : 1
19/11/14 12:53:26 INFO mapreduce.JobSubmitter: number of splits:1
19/11/14 12:53:26 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/11/14 12:53:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1573654432484_0035
19/11/14 12:53:27 INFO impl.YarnClientImpl: Submitted application application_1573654432484_0035
19/11/14 12:53:27 INFO mapreduce.Job: The url to track the job: http://cluster-1e8f-m:8088/proxy/application_1573654432484_0035/
19/11/14 12:53:27 INFO mapreduce.Job: Running job: job_1573654432484_0035
19/11/14 12:53:35 INFO mapreduce.Job: Job job_1573654432484_0035 running in uber mode : false
19/11/14 12:53:35 INFO mapreduce.Job:  map 0% reduce 0%
19/11/14 12:53:41 INFO mapreduce.Job:  map 100% reduce 0%
19/11/14 12:53:52 INFO mapreduce.Job:  map 100% reduce 20%
19/11/14 12:53:53 INFO mapreduce.Job:  map 100% reduce 40%
19/11/14 12:53:54 INFO mapreduce.Job:  map 100% reduce 60%
19/11/14 12:53:56 INFO mapreduce.Job:  map 100% reduce 80%
19/11/14 12:53:57 INFO mapreduce.Job:  map 100% reduce 100%
19/11/14 12:53:58 INFO mapreduce.Job: Job job_1573654432484_0035 completed successfully
19/11/14 12:53:58 INFO mapreduce.Job: Counters: 55
    File System Counters
        FILE: Number of bytes read=120
        FILE: Number of bytes written=1247665
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        GS: Number of bytes read=846
        GS: Number of bytes written=76
        GS: Number of read operations=0
        GS: Number of large read operations=0
        GS: Number of write operations=0
        HDFS: Number of bytes read=139
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=1
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters 
        Killed reduce tasks=1
        Launched map tasks=1
        Launched reduce tasks=5
        Rack-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=17348
        Total time spent by all reduces in occupied slots (ms)=195920
        Total time spent by all map tasks (ms)=4337
        Total time spent by all reduce tasks (ms)=48980
        Total vcore-milliseconds taken by all map tasks=4337
        Total vcore-milliseconds taken by all reduce tasks=48980
        Total megabyte-milliseconds taken by all map tasks=8882176
        Total megabyte-milliseconds taken by all reduce tasks=100311040
    Map-Reduce Framework
        Map input records=25
        Map output records=6
        Map output bytes=78
        Map output materialized bytes=120
        Input split bytes=139
        Combine input records=0
        Combine output records=0
        Reduce input groups=3
        Reduce shuffle bytes=120
        Reduce input records=6
        Reduce output records=6
        Spilled Records=12
        Shuffled Maps =5
        Failed Shuffles=0
        Merged Map outputs=5
        GC time elapsed (ms)=1409
        CPU time spent (ms)=6350
        Physical memory (bytes) snapshot=1900220416
        Virtual memory (bytes) snapshot=21124952064
        Total committed heap usage (bytes)=1492123648
    Shuffle Errors
    File Input Format Counters 
        Bytes Read=846
    File Output Format Counters 
        Bytes Written=76
Job output is complete

Update 3:

I updated the reducer (after what LowKey said) and it's giving me the same output as above. It's not doing the addition I want it to do. It's completely ignoring that operation. Why?

    public static class MaxMinReducer
            extends Reducer<Text, Text, Text, IntWritable> {

        public IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {

            int value = 0;
            int sumResult = 0;
            Iterator<IntWritable> iterator = values.iterator();

            while (values.iterator().hasNext()) {
                value = iterator.next().get();
                sumResult = sumResult + value;
            }

            context.write(key, result);
        }
    }


Update 4: Adding my imports and driver class to work out why my reducer won't run.

package mapreduceprogram;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

        public static void main(String[] args) throws Exception {

            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "tempmin");
            FileInputFormat.addInputPath(job, new Path(args[1]));
            FileOutputFormat.setOutputPath(job, new Path(args[2]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }


Is anything wrong with it that would explain why my reducer class isn't running?
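One possible cause, offered as an assumption because the complete job is not shown: both reducer versions above extend Reducer<Text, Text, Text, IntWritable> but declare reduce(Text, Iterable<IntWritable>, Context). When the class's type parameters and the method's parameter types disagree like this, Java treats the method as a separate overload instead of an override, and the framework keeps calling the inherited default, which writes every value through unchanged. That would match the counters above (Reduce input records=6, Reduce output records=6). The sketch below reproduces the effect in plain Java; BaseReducer, Mismatched and Matched are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OverloadDemo {
    // Hypothetical stand-in for a framework base class; this is not Hadoop's Reducer.
    static class BaseReducer<K, V> {
        final List<String> out = new ArrayList<>();

        // Default behaviour: pass every value through unchanged.
        protected void reduce(K key, Iterable<V> values) {
            for (V v : values) {
                out.add(key + "\t" + v);
            }
        }

        // The "framework" only ever calls the method declared above.
        final void run(K key, Iterable<V> values) {
            reduce(key, values);
        }
    }

    // The class says the values are String, the method says Integer:
    // this reduce() is a separate overload, so run() never reaches it.
    static class Mismatched extends BaseReducer<String, String> {
        protected void reduce(String key, Iterable<Integer> values) {
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            out.add(key + "\t" + sum);
        }
    }

    // The types agree, so @Override compiles and run() dispatches here.
    static class Matched extends BaseReducer<String, Integer> {
        @Override
        protected void reduce(String key, Iterable<Integer> values) {
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            out.add(key + "\t" + sum);
        }
    }

    static List<String> runMismatched() {
        Mismatched r = new Mismatched();
        r.run("20180101", Arrays.asList("-67", "122"));
        return r.out; // two pass-through lines, no sum
    }

    static List<String> runMatched() {
        Matched r = new Matched();
        r.run("20180101", Arrays.asList(-67, 122));
        return r.out; // one line holding the sum
    }

    public static void main(String[] args) {
        System.out.println(runMismatched());
        System.out.println(runMatched());
    }
}
```

Putting @Override on the reduce method is a cheap way to catch this: with mismatched types the compiler rejects the annotation instead of silently accepting an overload.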

Are those columns separated by tabs? If yes, then don't expect to find a space character in there.
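To illustrate why the delimiter matters, here is a minimal sketch assuming a tab-separated row. The row itself is hypothetical, since the real file layout is not shown in the post. Splitting such a row on "," yields a single token, so any access to tokens[1] would throw ArrayIndexOutOfBoundsException; splitting on whitespace yields the four fields.

```java
import java.util.Arrays;

class DelimiterDemo {
    // Split on a literal comma, as the mapper in the question does.
    static String[] splitOnComma(String line) {
        return line.split(",");
    }

    // Split on any run of whitespace (tabs or spaces).
    static String[] splitOnWhitespace(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical tab-separated row; the actual input format is an assumption.
        String row = "USW00003889\t20180101\tTMAX\t122";

        // No comma in the row, so the whole line comes back as one token.
        System.out.println(splitOnComma(row).length);

        // Whitespace splitting recovers station, date, element and temperature.
        System.out.println(Arrays.toString(splitOnWhitespace(row)));
    }
}
```

Checking tokens.length before indexing, as the first update in the question does, guards against short or malformed rows whichever delimiter turns out to be correct.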

What are you doing wrong? Well, for one thing, why do you have:

final int missing = -9999;

That doesn't make any sense.

Below that, you have some code that apparently is supposed to add two values, but it seems like you are accidentally throwing away items from your list. See where you have:

if (values.iterator().next().get() != missing)

Well... you never saved the value, so that means you threw it away.
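A small sketch of that effect, using a plain java.util iterator as a stand-in (Hadoop's value iterable is not modeled here, and the numbers are made up): every call to next() advances the iterator, so a value that is only tested in a condition is gone by the time the next call is made.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IteratorDemo {
    static final int MISSING = -9999;

    // Buggy pattern: the element tested in the condition is never stored,
    // so the following next() call returns a different element.
    static int sumDiscarding(List<Integer> values) {
        Iterator<Integer> it = values.iterator();
        int sum = 0;
        while (it.hasNext()) {
            if (it.next() != MISSING) {   // value checked, then thrown away
                if (it.hasNext()) {
                    sum += it.next();     // a different element is added
                }
            }
        }
        return sum;
    }

    // Fixed pattern: read each element once, keep it, then test and add it.
    static int sumKeeping(List<Integer> values) {
        Iterator<Integer> it = values.iterator();
        int sum = 0;
        while (it.hasNext()) {
            int value = it.next();
            if (value != MISSING) {
                sum += value;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(10, 20, 30, 40);
        System.out.println(sumDiscarding(values)); // 60: 10 and 30 were consumed by the check
        System.out.println(sumKeeping(values));    // 100
    }
}
```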

Another problem is that you are adding incorrectly. For some reason you are trying to add two values on every iteration of the loop. You should be adding one, so your loop should look like this:

int value = 0;
Iterator<IntWritable> iterator = values.iterator();
while (iterator.hasNext()) {
  value = iterator.next().get();
  if (value != missing) {
    sumResult = sumResult + value;
  }
}

The next obvious problem is that you put your output line inside your while loop:

while (values.iterator().hasNext()) {
  context.write(key, result);
}

That means that every time you read an item into your reducer, you write an item out. I think what you are trying to do is read in all the items for a given key, and then write a single reduced value (the sum). In that case, you shouldn't have your output inside the loop. It should come after it.

while ([...]) {
  [...]
}

context.write(key, result);
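The read-everything-then-write-once pattern can be sketched in plain Java, with no Hadoop dependency. The run method below is a hypothetical stand-in for the shuffle that groups values by key, and reduce stands in for a single reducer call; the input pairs are the ones from the question's current output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SumPerKeyDemo {
    // Stand-in for one reduce() call: consume every value, then produce one result.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum; // produced once, after the loop has finished
    }

    // Stand-in for the shuffle: group (date, temperature) pairs by date, in key order.
    static Map<String, Integer> run(String[][] records) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String[] r : records) {
            grouped.computeIfAbsent(r[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(r[1]));
        }
        Map<String, Integer> out = new TreeMap<>();
        grouped.forEach((date, temps) -> out.put(date, reduce(temps)));
        return out;
    }

    public static void main(String[] args) {
        String[][] records = {
            {"20180101", "-67"}, {"20180101", "122"},
            {"20180102", "3"}, {"20180102", "12"},
            {"20180104", "-177"}, {"20180104", "-43"},
        };
        // Prints one line per date: 55, 15 and -220.
        run(records).forEach((date, sum) -> System.out.println(date + "\t" + sum));
    }
}
```

In the actual reducer, the equivalent step would be to store the sum in the output value (for example with result.set(sumResult)) before the single context.write call; the reducers shown in the question never assign result.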

LowKeyEnergy helped me answer my question; check the main post's updates and comments.

