
hadoop jar command Execution

  1. We submit the jar file with the hadoop jar command.
  2. The request hits the master node.

Does the hadoop jar command copy the jar file to all slave nodes and then start execution? If not, how does it work, and which component is responsible: the JobTracker or the NameNode?

The "hadoop jar" command tells Hadoop to run the job on the cluster, given a jar file, input paths, and output paths. The jar file contains the job configuration and all of the Map and Reduce code.
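As a rough illustration of what the command line carries, the sketch below assembles a "hadoop jar" invocation in Python. The general shape, hadoop jar <jar> [mainClass] args..., is the CLI's own; the jar name, driver class, and paths are hypothetical placeholders. Treating the trailing arguments as input and output paths is an assumption about the job's driver, since the CLI simply passes them to the driver's main().

```python
import shlex


def build_hadoop_jar_command(jar, main_class, input_path, output_path):
    """Return the argv list for a `hadoop jar` submission.

    All values come from the caller; the ones used below are
    hypothetical placeholders, not paths from a real cluster.
    """
    # General shape: hadoop jar <jar> [mainClass] args...
    # Everything after the class name is handed to the driver's main().
    return ["hadoop", "jar", jar, main_class, input_path, output_path]


argv = build_hadoop_jar_command(
    "wordcount.jar",          # hypothetical job jar
    "com.example.WordCount",  # hypothetical driver class
    "/user/demo/input",       # hypothetical HDFS input directory
    "/user/demo/output",      # hypothetical HDFS output directory
)
print(shlex.join(argv))
```

The list form could be passed to subprocess.run on a machine where the hadoop client is installed; here it is only printed.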

Steps:

  1. The Job Client submits the job to the JobTracker (JT). In the background, it copies the job resources (the binaries containing the configuration, the Mapper and Reducer code, and the input and output path information) to a central area in HDFS that the TaskTrackers (TTs) can readily reach. The jar is therefore not pushed to every slave node up front. When a TT needs the code, it downloads it to the local disk of its node, so that the Map and Reduce tasks it starts can run that code against the local data.

  2. The JT queries the NameNode for the locations of the input data, that is, the names of the DataNodes that hold it.

  3. With this information, the JT talks to the TTs and builds an execution plan, preferring the TTs closest to the data that have free execution slots. If none of those have free slots, it falls back to rack locality and looks for TTs in the same rack with free slots. If it still finds none, it uses a TT in any rack, regardless of locality.

  4. Based on the execution plan, the JT assigns work to the TTs. The TTs then start Map and Reduce tasks and run them on the data.

  5. The TTs regularly send heartbeats and progress reports to the JT (every 5 seconds by default). Each Map and Reduce task reports its progress, completion, or errors to its TT, and the TT relays them to the JT. If a task dies, its TT reports this to the JT, which starts replacement tasks to redo the failed work.

  6. Once all Mappers have finished their tasks, the JT signals the TTs to have the Reducers begin their reduce phase (i.e. run the reduce() method).

  7. Once all Mappers and Reducers have finished and the final output is written, the JT sets the job status to SUCCESS and notifies the client.
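The locality fallback in step 3 (node-local, then rack-local, then anywhere) can be sketched as a small simulation. This is a simplified model under stated assumptions, not the JobTracker's actual scheduler code: the TaskTracker class, the pick_tracker function, and the host and rack names are all hypothetical, and each data block is reduced to the set of hosts holding a replica.

```python
from dataclasses import dataclass


@dataclass
class TaskTracker:
    host: str        # hypothetical node name
    rack: str        # hypothetical rack name
    free_slots: int  # task slots currently available


def pick_tracker(block_hosts, trackers):
    """Pick a tracker for one map task: node-local, then rack-local, then any."""
    candidates = [t for t in trackers if t.free_slots > 0]
    if not candidates:
        return None, "none"

    # Racks that contain at least one replica of the block.
    replica_racks = {t.rack for t in trackers if t.host in block_hosts}

    # 1. Prefer a tracker on a node that stores the data.
    for t in candidates:
        if t.host in block_hosts:
            return t, "node-local"
    # 2. Otherwise, a tracker in the same rack as a replica.
    for t in candidates:
        if t.rack in replica_racks:
            return t, "rack-local"
    # 3. Otherwise, any tracker with a free slot.
    return candidates[0], "off-rack"


trackers = [
    TaskTracker("node1", "rackA", 0),  # holds the data but is fully busy
    TaskTracker("node2", "rackA", 1),
    TaskTracker("node3", "rackB", 2),
]
tracker, level = pick_tracker({"node1"}, trackers)
print(tracker.host, level)  # prints "node2 rack-local"
```

In the example, node1 stores the block but has no free slot, so the task goes to node2 in the same rack rather than to node3 in another rack.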
