hadoop streaming, using -libjars to include jar files

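I am learning Hadoop and have written map/reduce steps to process some of my Avro files. I think the problem I am having may be due to my Hadoop installation. I am trying to test in standalone mode on my laptop, not on a distributed cluster.

Here is my bash invocation to run the job: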

#!/bin/bash

reducer=/home/hduser/python-hadoop/test/reducer.py  
mapper=/home/hduser/python-hadoop/test/mapper.py
avrohdjar=/home/hduser/python-hadoop/test/avro-mapred-1.7.4-hadoop1.jar
avrojar=/home/hduser/hadoop/share/hadoop/tools/lib/avro-1.7.4.jar


hadoop jar ~/hadoop/share/hadoop/tools/lib/hadoop-streaming* \
  -D mapreduce.job.name="hd1" \
  -libjars ${avrojar},${avrohdjar} \
  -files   ${avrojar},${avrohdjar},${mapper},${reducer} \
  -input   ~/tmp/data/* \
  -output  ~/tmp/data-output \
  -mapper  ${mapper} \
  -reducer ${reducer} \
  -inputformat org.apache.avro.mapred.AvroAsTextInputFormat

Here is the output:

15/04/23 11:02:54 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/04/23 11:02:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/04/23 11:02:54 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
15/04/23 11:02:54 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/home/hduser/tmp/mapred/staging/hduser1337717111/.staging/job_local1337717111_0001
15/04/23 11:02:54 ERROR streaming.StreamJob: Error launching job , bad input path : File does not exist: hdfs://localhost:54310/home/hduser/hadoop/share/hadoop/tools/lib/avro-1.7.4.jar
Streaming Command Failed!

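I have tried many different fixes but do not know what to try next. For some reason, hadoop cannot find the jar files specified by -libjars. I have also successfully run the wordcount example posted here, so my Hadoop installation and configuration appear to work. Thanks!

EDIT: Here are the changes I made to my hdfs-site.xml: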

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
   The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>

And here is core-site.xml:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

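Your cluster is running in distributed mode, not standalone mode. Your core-site.xml sets fs.default.name to hdfs://localhost:54310, so the jar path ends up being looked up on HDFS rather than on the local filesystem. It is trying to find the input at the following path, and that path does not exist: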

hdfs://localhost:54310/home/hduser/hadoop/share/hadoop/tools/lib/avro-1.7.4.jar
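
As a rough illustration of the above, here is a minimal sketch of how a path without a scheme ends up on HDFS, plus one thing you could try: spelling out the file:// scheme for the local jars. The helpers qualify and to_file_uri are hypothetical names made up for this example and are not part of Hadoop. Whether explicit file:// URIs are enough to make -libjars work on your Hadoop version is an assumption I have not verified.

```shell
#!/bin/bash

# Hypothetical helper: mimic how a scheme-less path is qualified against
# the default filesystem (fs.default.name from core-site.xml).
qualify() {
  local default_fs=$1 path=$2
  case "$path" in
    *://*) printf '%s\n' "$path" ;;                      # explicit scheme is kept
    *)     printf '%s%s\n' "${default_fs%/}" "$path" ;;  # resolved on the default FS
  esac
}

# Hypothetical helper: turn an absolute local path into a file:// URI.
to_file_uri() {
  local path=$1
  case "$path" in
    *://*) printf '%s\n' "$path" ;;          # already qualified, leave as-is
    /*)    printf 'file://%s\n' "$path" ;;   # absolute local path
    *)     printf 'file://%s/%s\n' "$PWD" "$path" ;;  # relative to current dir
  esac
}

avrojar=/home/hduser/hadoop/share/hadoop/tools/lib/avro-1.7.4.jar

# Without a scheme, the path lands on HDFS (the path from the error message).
qualify hdfs://localhost:54310 "$avrojar"

# With an explicit scheme, the same helper leaves the local URI untouched.
qualify hdfs://localhost:54310 "$(to_file_uri "$avrojar")"
```

With this, the streaming call could pass -libjars "$(to_file_uri "$avrojar"),$(to_file_uri "$avrohdjar")". If that does not help, the other direction is to make the failing path exist by copying the jars into HDFS (hdfs dfs -mkdir -p and hdfs dfs -put) under the same directory shown in the error.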
