
Debug MapReduce (of Hadoop 2.2 or higher) in Eclipse

I am able to debug MapReduce (of Hadoop 1.2.1) in Eclipse by following the steps in http://www.thecloudavenue.com/2012/10/debugging-hadoop-mapreduce-program-in.html . But how do I debug MapReduce (of Hadoop 2.2 or higher) in Eclipse?

You can debug it the same way. Just run your MapReduce code in standalone mode and use Eclipse to debug the MR code like any other Java code.
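For reference, standalone (local) mode is simply what Hadoop falls back to when no cluster configuration is on the classpath; the out-of-the-box defaults are equivalent to the following fragment (shown only for illustration — as noted below, no configuration files are needed in the Eclipse project):

```xml
<!-- Hadoop's built-in defaults; do NOT add these to the project,
     they are what "standalone mode" already means -->
<property>
  <name>fs.defaultFS</name>
  <value>file:///</value>            <!-- local filesystem, not HDFS -->
</property>
<property>
  <name>mapreduce.framework.name</name>
  <value>local</value>               <!-- map/reduce run in-process, no YARN -->
</property>
```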

Here are the steps I used to set this up in Eclipse. Environment: Ubuntu 16.04.2, Eclipse Neon.3 Release (4.6.3RC2), jdk1.8.0_121. I did a fresh hadoop-2.7.3 installation under /j01/srv/hadoop, which is my $HADOOP_HOME; replace $HADOOP_HOME with your actual path wherever it is referenced below. When running Hadoop from Eclipse you do not need any Hadoop configuration; what is really needed is to pull the right set of Hadoop jars into Eclipse.

Step 1 Create a new Java Project
File > New > Project...
Select Java Project, Next

Enter Project name: hadoopmr

Click Configure default...

Source folder name: src/main/java
Output folder name: target/classes
Click Apply, OK, then Next
Click the Libraries tab

Click Add External JARs...
Browse to the Hadoop installation folder and add the following jars; when done, click Finish:

$HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.3.jar
$HADOOP_HOME/share/hadoop/common/hadoop-nfs-2.7.3.jar

$HADOOP_HOME/share/hadoop/common/lib/avro-1.7.4.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-collections-3.2.2.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-configuration-1.6.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-io-2.4.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-lang-2.6.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-logging-1.1.3.jar
$HADOOP_HOME/share/hadoop/common/lib/hadoop-auth-2.7.3.jar
$HADOOP_HOME/share/hadoop/common/lib/httpclient-4.2.5.jar
$HADOOP_HOME/share/hadoop/common/lib/httpcore-4.2.5.jar
$HADOOP_HOME/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar
$HADOOP_HOME/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar
$HADOOP_HOME/share/hadoop/common/lib/log4j-1.2.17.jar
$HADOOP_HOME/share/hadoop/common/lib/slf4j-api-1.7.10.jar
$HADOOP_HOME/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar

$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.7.3.jar
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.3.jar
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.jar
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.7.3.jar
$HADOOP_HOME/share/hadoop/mapreduce/lib-examples/hsqldb-2.0.0.jar

$HADOOP_HOME/share/hadoop/tools/lib/guava-11.0.2.jar
$HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-api-2.7.3.jar
$HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-common-2.7.3.jar

Step 2 Create a MapReduce example
Create a new package: org.apache.hadoop.examples
Create WordCount.java under package org.apache.hadoop.examples with the following contents:

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper 
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, 
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length < 2) {
      System.err.println("Usage: wordcount <in> [<in>...] <out>");
      System.exit(2);
    }
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    for (int i = 0; i < otherArgs.length - 1; ++i) {
      FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    FileOutputFormat.setOutputPath(job,
      new Path(otherArgs[otherArgs.length - 1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
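To see what the job computes before stepping through it in the debugger, the same tokenize-and-sum logic can be sketched in plain Java, minus the framework (a standalone sketch for illustration, not part of the project):

```java
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // Same logic as TokenizerMapper + IntSumReducer, without Hadoop:
    // tokenize each line on whitespace and sum the occurrences per word.
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, like the MR output
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        count(List.of("What do you mean by Object",
                      "What is Java Virtual Machine"))
            .forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```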

Create input.txt under /home/hadoop/input/ (or your path) with the following contents:

What do you mean by Object
What is Java Virtual Machine
How to create Java Object
How Java enabled High Performance

Step 3 Set up the Debug Configuration
In Eclipse, open WordCount.java and set breakpoints wherever you like.
Right click on WordCount.java, Debug As > Debug Configurations...
Select Java Application, click the New launch configuration icon on the top left

Enter org.apache.hadoop.examples.WordCount in the Main class box
Click the Arguments tab

Enter

/home/hadoop/input/input.txt /home/hadoop/output

into Program arguments
Click Apply, then Debug
The program starts along with Hadoop and should hit the breakpoints you set.

Check the results:

ls -l /home/hadoop/output
-rw-r--r-- 1 hadoop hadoop 131 Apr  5 22:59 part-r-00000
-rw-r--r-- 1 hadoop hadoop   0 Apr  5 22:59 _SUCCESS
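Each line of part-r-00000 is `word<TAB>count`, as written by the default TextOutputFormat, so the output can be read back with a few lines of plain Java (a sketch; the path assumes the output directory used above):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReadPartFile {
    // Parse "key<TAB>value" lines as written by the default TextOutputFormat
    public static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keep the reducer's key order
        for (String line : lines) {
            String[] kv = line.split("\t", 2);
            counts.put(kv[0], Integer.parseInt(kv[1].trim()));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts =
            parse(Files.readAllLines(Path.of("/home/hadoop/output/part-r-00000")));
        counts.forEach((word, n) -> System.out.println(word + " -> " + n));
    }
}
```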

Notes:

1) If the program does not run, make sure Project > Build Automatically is checked.
Use Project > Clean... to force a build.

2) You can get more examples from

jar xvf $HADOOP_HOME/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.3-sources.jar

Copy them into this project to continue exploring.

3) You can download this Eclipse project from

git clone https://github.com/drachenrio/hadoopmr

In Eclipse, File > Import... > Existing Projects into Workspace > Next
Browse to the cloned project and import it
Open .classpath and replace /j01/srv/hadoop-2.7.3 with your Hadoop installation home
