
Implement a coordinator service for Hadoop

Original post: 2016-06-21 08:07:53 | Tags: java, hadoop, apache-zookeeper

I want to implement a coordinator service that runs on the NameNode.

When a map task finishes, it would send feedback to the NameNode reporting that "Machine (xxxx) has processed block ID ... belonging to file ...". The NameNode would maintain this information in, for example, a table.

I know this kind of question is too general, but I am really stuck on this.

Can I implement this in Hadoop, and if so, how? Can anyone give me ideas, or point me to similar work that has been done before?

1 Answer

You need a service that receives the notification and stores it somewhere (it could be a REST service or a message queue). It doesn't matter whether this service runs on the NameNode or on a server outside the cluster. That said, the NameNode is the most critical point in the cluster, so I really don't recommend deploying any additional service on it.
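As a rough illustration of such a receiving service, here is a minimal sketch using the HTTP server that ships with the JDK. The class name `BlockReportService`, the `/report` path, and the `host,blockId,file` message format are all assumptions made for this example; a real deployment would likely persist the table somewhere durable rather than keep it in memory.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical receiver: keeps "block processed" reports in an in-memory table. */
final class BlockReportService {
    // Key: "file#blockId", value: host that reported processing the block.
    private final Map<String, String> processed = new ConcurrentHashMap<>();
    private HttpServer server;

    /** Parses a report of the assumed form "host,blockId,file" and stores it. */
    void record(String report) {
        String[] parts = report.trim().split(",", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected host,blockId,file: " + report);
        }
        processed.put(parts[2] + "#" + parts[1], parts[0]);
    }

    /** Returns the host that reported the block, or null if no report was received. */
    String processedBy(String file, String blockId) {
        return processed.get(file + "#" + blockId);
    }

    /** Starts an HTTP endpoint at POST /report; port 0 lets the OS pick a free port. */
    int start(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/report", exchange -> {
            int status;
            try {
                byte[] body = exchange.getRequestBody().readAllBytes();
                record(new String(body, StandardCharsets.UTF_8));
                status = 204; // stored, nothing to return
            } catch (IllegalArgumentException e) {
                status = 400; // malformed report
            }
            exchange.sendResponseHeaders(status, -1); // -1 means no response body
            exchange.close();
        });
        server.start();
        return server.getAddress().getPort();
    }

    void stop() {
        if (server != null) {
            server.stop(0);
        }
    }
}
```

Because the service is just an HTTP endpoint, it can run on any machine the worker nodes can reach, which keeps it off the NameNode.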

Then you will need to override the Mapper's cleanup method to send the "map task completed" message after the map task has finished.
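The sending side could look roughly like the sketch below. The helper class `MapCompletionNotifier`, the `host,blockId,file` message format, and the `report.endpoint` configuration key are hypothetical names chosen for illustration. The helper itself is plain Java; the `cleanup` override is shown as a comment because it needs the Hadoop jars on the classpath. Note that a mapper sees its input split rather than an HDFS block ID, so the sketch uses the split's start offset as a stand-in.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that a Mapper's cleanup() could call. */
final class MapCompletionNotifier {
    /** Builds the assumed "host,blockId,file" message body. */
    static String buildReport(String host, String blockId, String file) {
        return host + "," + blockId + "," + file;
    }

    /** POSTs the report to the collecting service and returns the HTTP status code. */
    static int send(String endpoint, String report) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setConnectTimeout(5000); // do not let a slow collector stall the task
        conn.setReadTimeout(5000);
        conn.setRequestProperty("Content-Type", "text/plain; charset=utf-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(report.getBytes(StandardCharsets.UTF_8));
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}

// Inside a Mapper subclass (requires Hadoop on the classpath):
//
//   @Override
//   protected void cleanup(Context context) throws IOException, InterruptedException {
//       FileSplit split = (FileSplit) context.getInputSplit();
//       String report = MapCompletionNotifier.buildReport(
//               InetAddress.getLocalHost().getHostName(),
//               String.valueOf(split.getStart()),   // split offset stands in for a block ID
//               split.getPath().toString());
//       // "report.endpoint" is a made-up configuration key for this example.
//       MapCompletionNotifier.send(context.getConfiguration().get("report.endpoint"), report);
//   }
```

Since `cleanup` runs once per task attempt, a retried or speculative attempt may send the same report twice, so the receiver should treat reports as idempotent.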

Or you can try the Hadoop ResourceManager API to see whether it already exposes the information you are looking for, and just poll that API instead of creating a new service.
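A polling client might be sketched as follows. The ResourceManager REST API serves application details under `/ws/v1/cluster/apps/{appid}`; the class name `ResourceManagerPoller` and the host and application ID used in the usage note are placeholders. As far as I know this endpoint reports application-level state and progress rather than per-block details, so whether it covers the original requirement is something to verify first.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical poller for the YARN ResourceManager REST API. */
final class ResourceManagerPoller {
    /** Builds the application-info URL; rmAddress is "host:port" of the RM web UI. */
    static String appUrl(String rmAddress, String appId) {
        return "http://" + rmAddress + "/ws/v1/cluster/apps/" + appId;
    }

    /** Fetches the raw JSON response body for the given URL. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Accept", "application/json");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

For example, `ResourceManagerPoller.fetch(ResourceManagerPoller.appUrl("rm-host:8088", appId))` would return the JSON description of one application, which can then be parsed with whatever JSON library the project already uses.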