
Where are external jar files on Hadoop stored?

Let's say I write a WordCount example, and then in the Eclipse project include an external jar file such as MyJar.jar. Now if I export the whole WordCount project as a word.jar file, and then type

$> hadoop jar word.jar WordCount input output

I understand that the job executes and that word.jar will have a lib directory containing the MyJar.jar file. Now, where on HDFS will MyJar.jar be stored while the job is running and making calls to this jar's methods?
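As a quick sanity check on the packaging (using the jar names from above), the jar tool can list what got bundled into the exported archive:

$> jar tf word.jar
# expect WordCount*.class entries plus lib/MyJar.jar among the output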

The bin/hadoop script actually unpacks your word.jar file into a tmp folder on the local file system.
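A hedged illustration of where that ends up: the RunJar utility that "hadoop jar" invokes expands the jar into a randomly named hadoop-unjar* directory under the client's temp directory (hadoop.tmp.dir or java.io.tmpdir, depending on the Hadoop version), so on a typical Linux box you might see something like:

$> ls -d /tmp/hadoop-unjar*          # while the job client is running
/tmp/hadoop-unjar6015026047034795246 # example only; the numeric suffix is random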

The job client handles the creation of a job folder in HDFS, to which your original jar, all the lib jars, and other job files (such as job.xml, distributed cache files, etc.) are uploaded.
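A sketch of what that job folder typically contains; the staging root below is the common YARN default (controlled by yarn.app.mapreduce.am.staging-dir; classic MapReduce uses mapreduce.jobtracker.staging.root.dir), and the job id is made up for illustration:

$> hadoop fs -ls /tmp/hadoop-yarn/staging/$USER/.staging/job_1400000000000_0001
# typically lists job.jar (your uploaded word.jar), job.xml,
# and the input split files job.split / job.splitmetainfo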

When your job runs on a cluster node, these files are copied back down to a tmp job directory on the local file system of that node. For efficiency, the files are only copied down once per node, rather than once for each map task that runs there.
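For example, on a classic MR1 TaskTracker node with the default hadoop.tmp.dir of /tmp/hadoop-$USER (the layout varies with mapred.local.dir, and YARN keeps an analogous per-application cache under usercache/.../appcache instead), the node-local job directory might look like:

$> ls /tmp/hadoop-$USER/mapred/local/taskTracker/$USER/jobcache/
job_201404010000_0001                # one cached directory per running job
$> ls /tmp/hadoop-$USER/mapred/local/taskTracker/$USER/jobcache/job_201404010000_0001/
jars/  job.xml  work/                # jars/ holds the unpacked job jar, lib/ included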
