
hadoop jar command execution

Original post: 2014-10-09 01:44:07 | Tags: hadoop, jar

We submit the jar file with the "hadoop jar" command.
It hits the master node.

Will the "hadoop jar" command copy the jar file to all slave nodes and start execution? If not, how does it work, and who does it: the JobTracker or the NameNode?

1 answer

The "hadoop jar" command runs the main (driver) class in your jar on the machine where you type it. That driver tells Hadoop to execute the job on the cluster, supplying the jar file, the input paths, and the output paths. The jar file contains the job configuration and all the Map and Reduce code.
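To make "job configuration plus Map and Reduce code" concrete, here is a minimal in-process Python sketch. It is not Hadoop's API: `JobSpec`, `run_job`, and the word-count functions are hypothetical names, and a real job would ship Java Mapper and Reducer classes inside the jar. It only illustrates what the jar bundles and the order in which the pieces run: every map call first, then grouping by key, then one reduce call per key.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


@dataclass
class JobSpec:
    """Hypothetical stand-in for what the job jar carries."""
    input_records: List[str]  # stands in for the data under the input paths
    mapper: Callable[[str], Iterable[Tuple[str, int]]]
    reducer: Callable[[str, List[int]], Tuple[str, int]]


def word_count_mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated word in the line.
    for word in line.split():
        yield word, 1


def word_count_reducer(word: str, counts: List[int]) -> Tuple[str, int]:
    # Sum all the partial counts emitted for one key.
    return word, sum(counts)


def run_job(job: JobSpec) -> Dict[str, int]:
    # Map phase: run the mapper over every input record.
    map_output = [pair for record in job.input_records for pair in job.mapper(record)]
    # Shuffle: group values by key. This starts only after all map calls are done.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in map_output:
        grouped[key].append(value)
    # Reduce phase: one reducer call per key, in sorted key order.
    return dict(job.reducer(key, grouped[key]) for key in sorted(grouped))


if __name__ == "__main__":
    job = JobSpec(["a b a", "b c"], word_count_mapper, word_count_reducer)
    print(run_job(job))
```

Running it prints `{'a': 2, 'b': 2, 'c': 1}`.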

Steps:

1. The job client (the process that "hadoop jar" starts on your machine) copies the job resources to a staging directory in HDFS: the jar with the Mapper and Reducer code, the job configuration (including the input and output paths), and the input split information. It then submits the job to the JobTracker (JT). The jar is stored with a high replication factor so that many TaskTrackers (TTs) can read it. When a TT needs that code, it downloads a copy to the local disk of its node, so the Map and Reduce tasks it starts can run that code against the local data.
2. The JT uses block-location information from the NameNode to learn which DataNodes hold each part of the input data.
3. With this information, the JT builds an execution plan. It prefers TTs on the nodes that hold the data, as long as they have free execution slots. If none do, it falls back to rack locality and looks for a TT in the same rack with a free slot. If it still cannot find one, it uses a TT in any rack.
4. Based on the execution plan, the JT assigns work to TTs (in Hadoop 1, in its replies to the TTs' heartbeats). The TTs then start the Map and Reduce tasks and run them on the data.
5. TTs regularly send heartbeats and progress reports to the JT (by default every 5 seconds). Each Map and Reduce task reports its progress, completion, or errors to the JT through its TT: the tasks report to the TT, and the TT reports to the JT. If a task dies, the TT reports this to the JT, which schedules a new attempt of that task to replace it.
6. Once all mappers have finished, the reducers run their reduce() method. Reduce tasks may start earlier to copy map output, but reduce() itself runs only after every map's output is available.
7. Once all Mappers and Reducers have finished and the final output is written, the JT marks the job as SUCCESS; the client, which polls the JT for status, then reports that the job has completed.
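The staging step described above is what answers the original question: the client does not push the jar to every slave node. The minimal in-memory sketch below assumes a dictionary stands in for the HDFS staging directory and another for each node's local disk (`SharedStore`, `WorkerNode`, `submit_job`, and the `/staging/...` path are all hypothetical). It shows the pull-on-demand pattern: the jar is written once, and each node fetches it only the first time it runs a task for that job.

```python
from typing import Dict


class SharedStore:
    """Hypothetical in-memory stand-in for the HDFS staging area."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}
        self.reads = 0  # number of times any node fetched a file from here

    def put(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def get(self, path: str) -> bytes:
        self.reads += 1
        return self.files[path]


class WorkerNode:
    """Hypothetical stand-in for a TaskTracker node and its local disk."""

    def __init__(self, name: str, store: SharedStore) -> None:
        self.name = name
        self.store = store
        self.local_disk: Dict[str, bytes] = {}

    def run_task(self, jar_path: str) -> str:
        # Localize: copy the jar from the shared store only on first use.
        if jar_path not in self.local_disk:
            self.local_disk[jar_path] = self.store.get(jar_path)
        jar = self.local_disk[jar_path]
        return f"{self.name} ran task with {len(jar)}-byte jar"


def submit_job(store: SharedStore, job_id: str, jar_bytes: bytes) -> str:
    # The client writes the jar once to the shared staging area;
    # it never pushes a copy to each worker node.
    path = f"/staging/{job_id}/job.jar"  # hypothetical layout
    store.put(path, jar_bytes)
    return path


if __name__ == "__main__":
    store = SharedStore()
    path = submit_job(store, "job_001", b"fake jar")
    node = WorkerNode("n1", store)
    for _ in range(3):
        print(node.run_task(path))
    print("fetches from shared store:", store.reads)
```

The node runs three tasks but fetches the jar only once; a node that never runs a task for the job never downloads it at all.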
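The three-tier fallback in the scheduling step (same node, then same rack, then any rack) can be sketched as below. This is a simplified model under stated assumptions: `TaskTracker`, `pick_tracker`, `schedule`, and the `rack_of` topology map are hypothetical, and for readability it picks a tracker for each task, whereas Hadoop 1's JobTracker actually picks a task for whichever TaskTracker has just sent a heartbeat with a free slot.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class TaskTracker:
    host: str
    rack: str
    free_slots: int


def pick_tracker(
    block_hosts: List[str],
    rack_of: Dict[str, str],
    trackers: List[TaskTracker],
) -> Optional[Tuple[TaskTracker, str]]:
    """Choose a tracker for one map task, preferring data locality.

    block_hosts: hosts holding a replica of the task's input block.
    rack_of: hypothetical host -> rack topology map.
    """
    available = [t for t in trackers if t.free_slots > 0]
    # 1) Node-local: a tracker on a host that stores the block.
    for t in available:
        if t.host in block_hosts:
            return t, "node-local"
    # 2) Rack-local: a tracker in the same rack as some replica.
    data_racks = {rack_of[h] for h in block_hosts if h in rack_of}
    for t in available:
        if t.rack in data_racks:
            return t, "rack-local"
    # 3) Off-rack: any tracker that still has a free slot.
    if available:
        return available[0], "off-rack"
    return None


def schedule(
    tasks: List[List[str]],
    rack_of: Dict[str, str],
    trackers: List[TaskTracker],
) -> List[Optional[Tuple[str, str]]]:
    plan: List[Optional[Tuple[str, str]]] = []
    for block_hosts in tasks:
        choice = pick_tracker(block_hosts, rack_of, trackers)
        if choice is None:
            plan.append(None)  # no free slot anywhere; the task waits
            continue
        tracker, level = choice
        tracker.free_slots -= 1  # the chosen slot is now busy
        plan.append((tracker.host, level))
    return plan


if __name__ == "__main__":
    rack_of = {"h1": "r1", "h2": "r1", "h3": "r2"}
    trackers = [TaskTracker("h1", "r1", 1), TaskTracker("h2", "r1", 1), TaskTracker("h3", "r2", 1)]
    # Four map tasks whose input block lives only on h1.
    print(schedule([["h1"]] * 4, rack_of, trackers))
```

With one slot per tracker, this prints `[('h1', 'node-local'), ('h2', 'rack-local'), ('h3', 'off-rack'), None]`: each task falls one tier further from the data as the closer slots fill up.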