Debugging Hadoop in Eclipse
Is it possible to debug Hadoop's source code in Eclipse? I'm not asking about the MapReduce tasks. I want to see which part of the Hadoop source code is responsible for scheduling the MapReduce tasks and how it works. Is there any mechanism by which this can be done?
You can download the Hadoop source project, import it into Eclipse, and step through it in the debugger with F5 (Step Into) and F6 (Step Over). Eclipse offers several debugging modes.
Or you can try to understand the workflow yourself by following it step by step, starting from the run() method called from your main class.
To answer your question: who schedules the map tasks?
As the schema shows, files are divided by the InputFormat class into fixed-size pieces called InputSplits. Each split is then given to a mapper, that is, a map task assigned to a node.
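The idea of cutting a file into fixed-size pieces can be sketched in plain Java. This is only a simplified model: the class SplitSketch, its nested Split type, and computeSplits are hypothetical names invented for illustration, and Hadoop's real split computation applies additional rules (block size, configured minimum and maximum split sizes) that are not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of splitting a file into byte ranges; not Hadoop's actual code. */
final class SplitSketch {

    /** A byte range [start, start + length) of a file, loosely analogous to an InputSplit. */
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return "Split(start=" + start + ", length=" + length + ")";
        }
    }

    /** Cuts a file of fileLength bytes into consecutive ranges of at most splitSize bytes. */
    static List<Split> computeSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            // The last range may be shorter than splitSize.
            splits.add(new Split(start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 300-byte file with 128-byte splits yields three ranges.
        for (Split s : computeSplits(300, 128)) {
            System.out.println(s);
        }
    }
}
```

In this model each Split would be handed to one map task, which is the relationship described above.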
The same InputFormat class also provides a RecordReader responsible for parsing the split and extracting records. Each record is passed to a map function as a (key, value) pair. So the Mapper class is the one that calls the map method.
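The hand-off from reader to mapper can be modeled with a small loop. The sketch below is a toy, not Hadoop's API: MapLoopSketch, LineReader, and ToyMapper are hypothetical stand-ins, and the byte offsets assume single-byte characters and a one-byte line terminator. What it illustrates is the shape of the loop: a run() method pulls records from the reader and calls map() once per (key, value) pair, so a breakpoint in either method is a reasonable place to start stepping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Toy model of the reader-to-mapper hand-off; all names are hypothetical, not Hadoop's API. */
final class MapLoopSketch {

    /** Stand-in for a RecordReader: yields one (offset, line) record at a time. */
    static final class LineReader {
        private final Iterator<String> lines;
        private long nextOffset = 0;
        private long key;
        private String value;

        LineReader(List<String> lines) {
            this.lines = lines.iterator();
        }

        /** Advances to the next record; returns false when the input is exhausted. */
        boolean nextKeyValue() {
            if (!lines.hasNext()) {
                return false;
            }
            key = nextOffset;
            value = lines.next();
            nextOffset += value.length() + 1; // +1 for the assumed one-byte newline
            return true;
        }

        long currentKey() {
            return key;
        }

        String currentValue() {
            return value;
        }
    }

    /** Stand-in for a Mapper: run() drives the loop and calls map() once per record. */
    static final class ToyMapper {
        final List<String> output = new ArrayList<>();

        void map(long key, String value) {
            output.add(key + ":" + value);
        }

        void run(LineReader reader) {
            while (reader.nextKeyValue()) {
                map(reader.currentKey(), reader.currentValue());
            }
        }
    }

    public static void main(String[] args) {
        ToyMapper mapper = new ToyMapper();
        mapper.run(new LineReader(Arrays.asList("hello", "hadoop")));
        System.out.println(mapper.output); // prints [0:hello, 6:hadoop]
    }
}
```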
Here is the workflow of the word count example:
Here, FileInputFormat is an abstract class that extends the abstract class InputFormat, and TextInputFormat extends FileInputFormat.
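To make the word count data flow concrete, here is an in-memory imitation in plain Java. It does not use Hadoop: WordCountSketch and its map, shuffle, and reduce methods are hypothetical names chosen to mirror the phases, and the input lines play the role of the values a line-oriented input format such as TextInputFormat would hand to the mapper.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory imitation of the word count data flow; it does not use Hadoop at all. */
final class WordCountSketch {

    /** Map step: emit a (word, 1) pair for every word in one input line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    /** Shuffle step: group all emitted values by key, with keys kept in sorted order. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce step: sum the values collected for one word. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs the three steps in sequence over all input lines. */
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("hello world", "hello hadoop")));
        // prints {hadoop=1, hello=2, world=1}
    }
}
```

In a real job the three steps run in separate tasks, possibly on different nodes; keeping them as separate methods here is only meant to show where each phase begins and ends.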