
Implement a coordinator service for Hadoop

I want to implement a coordinator service that runs on the NameNode.

When a map task finishes, it should send feedback to the NameNode saying "Machine (xxxx) has processed block ID... belonging to file ...". The NameNode would keep this information in a table, for example.

I know this kind of question is too general, but I am stuck on this right now.

Can I implement this in Hadoop, and how can I do that? Can anyone give me ideas or point me to similar work that has been done before?

You need a service that receives the notification and stores it somewhere (it could be a REST service or a message queue). It doesn't matter whether this service runs on the NameNode or on a server outside the cluster. That said, the NameNode is the most critical point in the cluster, so I really don't recommend deploying any additional service on it.
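As a rough illustration of such a receiving service, here is a minimal sketch that uses only the JDK's built-in HTTP server. The class name `BlockReportService`, the `/report` path, and the tab-separated body format (host, file, block or split identifier) are all assumptions made for this example, not anything Hadoop defines. The "table" is just an in-memory map; a real deployment would probably want a database or a message queue behind it.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical receiver: not part of Hadoop, runs on any host the map tasks can reach.
public class BlockReportService {
    // file path -> list of "host:blockId" entries reported so far
    private final Map<String, List<String>> table = new ConcurrentHashMap<>();

    // Store one "host processed blockId of file" record.
    public void record(String host, String file, String blockId) {
        table.computeIfAbsent(file, f -> new CopyOnWriteArrayList<>())
             .add(host + ":" + blockId);
    }

    // Everything reported for a file, or an empty list if nothing was reported.
    public List<String> processedBlocks(String file) {
        return table.getOrDefault(file, List.of());
    }

    // Start an HTTP endpoint that accepts POST /report with body "host\tfile\tblockId".
    public HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/report", exchange -> {
            String body = new String(exchange.getRequestBody().readAllBytes(),
                                     StandardCharsets.UTF_8);
            String[] parts = body.trim().split("\t");
            int status = 400; // bad request unless the checks below pass
            if ("POST".equals(exchange.getRequestMethod()) && parts.length == 3) {
                record(parts[0], parts[1], parts[2]);
                status = 204; // accepted, no response body
            }
            exchange.sendResponseHeaders(status, -1);
            exchange.close();
        });
        server.start();
        return server;
    }
}
```

Calling `new BlockReportService().start(9099)` (the port is arbitrary) would bring the endpoint up; `processedBlocks(file)` can then be queried to see what has been reported for a given file.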

Then you will need to override the Mapper's cleanup method to send the "map task completed" message after the map task has finished.
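The sending side could look roughly like the sketch below. To keep it compilable without the Hadoop jars, it is written as a plain JDK helper; `MapTaskNotifier`, the endpoint URI, and the tab-separated message format are assumptions for illustration only. Note that a map task sees an input split rather than a block ID, so the sketch reports the file path plus the split's start offset and length.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper meant to be called from a Mapper's cleanup method.
public class MapTaskNotifier {
    // Build the message body: host, file, and "start+length" of the split (assumed format).
    public static String formatMessage(String host, String file, long start, long length) {
        return host + "\t" + file + "\t" + start + "+" + length;
    }

    // POST the "map task completed" message and return the HTTP status code.
    public static int send(URI endpoint, String file, long start, long length)
            throws IOException, InterruptedException {
        String host = InetAddress.getLocalHost().getHostName();
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .POST(HttpRequest.BodyPublishers.ofString(
                        formatMessage(host, file, start, length)))
                .build();
        HttpResponse<Void> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding());
        return response.statusCode();
    }
}
```

Inside a `Mapper` subclass, the overridden `cleanup(Context context)` would cast `context.getInputSplit()` to `FileSplit` (this works for file-based input formats) and pass `getPath().toString()`, `getStart()`, and `getLength()` to `MapTaskNotifier.send`. Since `cleanup` runs once at the end of each map task, each task sends exactly one message.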

Or you can check the Hadoop ResourceManager API to see whether it already exposes the information you are looking for, and just poll that API instead of creating a new one.
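For the polling option, a minimal sketch follows. The ResourceManager REST API exposes cluster applications under `/ws/v1/cluster/apps` and accepts a `states` query parameter; the class name `ResourceManagerPoller` and the host and port are placeholders for this example. Keep in mind that this endpoint reports application-level status, so whether it is detailed enough for a per-block table is something you would have to check.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical poller for the YARN ResourceManager REST API.
public class ResourceManagerPoller {
    // Build the URI for listing applications in the given state(s), e.g. "FINISHED".
    public static URI appsUri(String rmHostPort, String states) {
        return URI.create("http://" + rmHostPort + "/ws/v1/cluster/apps?states=" + states);
    }

    // Fetch the application list as a raw JSON string.
    public static String fetchApps(String rmHostPort, String states)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(appsUri(rmHostPort, states))
                .header("Accept", "application/json")
                .GET()
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }
}
```

A scheduled job could call `fetchApps("rm.example.com:8088", "FINISHED")` every so often and parse the JSON with whatever library you already use.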
