
Debugging Hadoop in Eclipse

Original post 2014-04-23 05:19:09 8 2 debugging/ hadoop/ mapreduce

Is it possible to debug Hadoop's source code in Eclipse? I'm not asking about the MapReduce tasks themselves. I want to see which part of the Hadoop source code is responsible for scheduling the MapReduce tasks and how it works. Is there any mechanism by which this can be done?

2 answers

You can download the Hadoop project, import it into Eclipse, and step through it with the debugger. Eclipse offers several stepping commands in debug mode:

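F5 : Step Into. Executes the current line and enters any method called on it.
F6 : Step Over. Executes the current line, including any method calls on it, without entering them.
F7 : Step Return. Runs until the current method returns, then stops in its caller.
F8 : Resume. Continues execution until the next breakpoint or the end of the program.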
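
If you want to get a feel for these keys before diving into the Hadoop sources, a tiny practice program helps. The sketch below is not Hadoop code; the class and method names (StepDemo, sumOfSquares, square) are made up for illustration. Set a breakpoint on the marked line, start it with Debug As > Java Application, and try each key.

```java
// Practice target for the Eclipse stepping keys. All names are hypothetical.
public class StepDemo {

    // Innermost call. Pressing F7 (Step Return) while stopped in here runs
    // to the end of square() and stops back in sumOfSquares().
    static int square(int x) {
        return x * x;
    }

    // F5 (Step Into) on the marked line enters square().
    // F6 (Step Over) runs square() as a single step and stays in this method.
    static int sumOfSquares(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            total += square(i); // <-- set the breakpoint on this line
        }
        return total;
    }

    public static void main(String[] args) {
        // F8 (Resume) continues until the next breakpoint hit or program exit.
        System.out.println(sumOfSquares(3));
    }
}
```

The same keys behave identically once a breakpoint is hit inside the imported Hadoop sources.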

Alternatively, you can work out the workflow yourself by stepping through it, starting from the run() method in your main class.

To answer your question: who schedules the map tasks?

As the diagram shows, the InputFormat class divides files into fixed-size pieces called InputSplits. Each split is then given to a mapper, that is, a node that has been assigned a map task.
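
To make the splitting idea concrete, here is a minimal sketch in plain Java. It is a simplified model, not Hadoop's actual split computation: the Split class, computeSplits, and the sizes used are assumptions for illustration. It only shows how a file of a given length is cut into byte ranges, one per map task.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    // Hypothetical stand-in for an InputSplit: a byte range within one file.
    static final class Split {
        final long offset;
        final long length;

        Split(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            return "[offset=" + offset + ", length=" + length + "]";
        }
    }

    // Cuts a file of fileLength bytes into pieces of at most splitSize bytes.
    // The last piece is shorter when the length is not an exact multiple.
    static List<Split> computeSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            splits.add(new Split(offset, Math.min(splitSize, fileLength - offset)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Each split would be handed to one map task.
        List<Split> splits = computeSplits(250, 100);
        for (int i = 0; i < splits.size(); i++) {
            System.out.println("map task " + i + " <- split " + splits.get(i));
        }
    }
}
```

When debugging the real thing, the split computation of the InputFormat in use is a natural place for a first breakpoint.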

The same InputFormat class also provides a RecordReader, which parses the split and extracts records. Each record is passed to the map function as a (key, value) pair. So the Mapper class is the one that calls the map method.
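
The loop that ties these pieces together can be sketched as follows. This is a plain-Java model, loosely patterned on how I understand the Mapper's run loop to work; LineReader, map, and run here are simplified stand-ins, not Hadoop's classes. As with text input in Hadoop, the key is the offset of the line and the value is the line itself (the sketch assumes single-byte characters so character and byte offsets match).

```java
import java.util.ArrayList;
import java.util.List;

public class MapperLoopSketch {

    // Hypothetical, simplified record reader: walks one split's text line by line.
    static final class LineReader {
        private final String text;
        private int pos = 0;   // start of the next unread line
        private long key;      // offset of the current line
        private String value;  // contents of the current line

        LineReader(String text) {
            this.text = text;
        }

        // Advances to the next record; returns false when the split is exhausted.
        boolean nextKeyValue() {
            if (pos >= text.length()) {
                return false;
            }
            int end = text.indexOf('\n', pos);
            if (end < 0) {
                end = text.length();
            }
            key = pos;
            value = text.substring(pos, end);
            pos = end + 1;
            return true;
        }

        long getCurrentKey() {
            return key;
        }

        String getCurrentValue() {
            return value;
        }
    }

    // Stand-in for a user-defined map function: records what it was called with.
    static void map(long key, String value, List<String> out) {
        out.add(key + "=" + value);
    }

    // The driving loop: one map() call per record produced by the reader.
    static List<String> run(LineReader reader) {
        List<String> out = new ArrayList<>();
        while (reader.nextKeyValue()) {
            map(reader.getCurrentKey(), reader.getCurrentValue(), out);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(new LineReader("hello world\nhello hadoop\n")));
    }
}
```

In the debugger, stepping into the equivalent loop with F5 shows exactly when your own map method is invoked for each record.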

Here is the workflow of the wordcount example:

[Image: workflow of the wordcount example]

Here, FileInputFormat is an abstract class that extends the abstract class InputFormat, and TextInputFormat extends FileInputFormat.
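
The division of labour in that hierarchy can be sketched in plain Java. The classes below only mirror the roles as I understand them from the org.apache.hadoop.mapreduce API (the base class declares both responsibilities, the file-based class supplies the splitting, the text class supplies line-oriented reading). The names ending in Sketch and the string-based signatures are simplifications, not Hadoop's real signatures.

```java
import java.util.Arrays;
import java.util.List;

public class FormatHierarchySketch {

    // Mirrors the role of InputFormat: declares both responsibilities.
    static abstract class InputFormatSketch {
        abstract List<String> getSplits(String path);

        abstract String createRecordReader(String split);
    }

    // Mirrors the role of FileInputFormat: supplies file-based splitting
    // and leaves record parsing abstract.
    static abstract class FileInputFormatSketch extends InputFormatSketch {
        @Override
        List<String> getSplits(String path) {
            // Two fake splits, just to have something to iterate over.
            return Arrays.asList(path + "#0", path + "#1");
        }
    }

    // Mirrors the role of TextInputFormat: supplies line-oriented reading.
    static final class TextInputFormatSketch extends FileInputFormatSketch {
        @Override
        String createRecordReader(String split) {
            return "line reader for " + split;
        }
    }

    public static void main(String[] args) {
        InputFormatSketch format = new TextInputFormatSketch();
        for (String split : format.getSplits("input.txt")) {
            System.out.println(format.createRecordReader(split));
        }
    }
}
```

Knowing which level implements which method tells you where to put breakpoints: the splitting logic lives one level above the record-reading logic.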

Here are instructions from the Apache Hadoop documentation. I haven't tried them out, but they are good enough to get started.