Getting an error while running an mrjob Python script on a Hadoop cluster
I am trying to run a Python job on a Hadoop cluster using MRJob. My wrapper script is as follows:
```bash
#!/bin/bash
. /etc/profile
module load use.own
module load python/python2.7
module load python/mrjob
# note: the output file already exists before I submit the job
python path_to_python-script/mr_word_freq_count.py path_to_input_file/input.txt -r hadoop > path_to_output_file/output.txt
```
Once I submit this script to the cluster with `qsub myscript.sh`, I get two files: an output file and an error file.
The error file contains the following:
```
no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
Traceback (most recent call last):
  File "homefolder/privatemodules/python/examples/mr_word_freq_count.py", line 37, in <module>
    MRWordFreqCount.run()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/job.py", line 500, in run
    mr_job.execute()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/job.py", line 518, in execute
    super(MRJob, self).execute()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/launch.py", line 146, in execute
    self.run_job()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/launch.py", line 206, in run_job
    with self.make_runner() as runner:
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/job.py", line 541, in make_runner
    return super(MRJob, self).make_runner()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/launch.py", line 164, in make_runner
    return HadoopJobRunner(**self.hadoop_job_runner_kwargs())
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 179, in __init__
    super(HadoopJobRunner, self).__init__(**kwargs)
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/runner.py", line 352, in __init__
    self._opts = self.OPTION_STORE_CLASS(self.alias, opts, conf_paths)
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 132, in __init__
    'you must set $HADOOP_HOME, or pass in hadoop_home explicitly')
Exception: you must set $HADOOP_HOME, or pass in hadoop_home explicitly
```
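First question: how do I find $HADOOP_HOME? When I run `echo $HADOOP_HOME`, nothing is printed, which means it is not set. So if I do have to set it, what path should I set it to? Should it point to the Hadoop name_node in the cluster?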
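Second question: what does the "no configs found" message indicate? Is it related to $HADOOP_HOME not being set, or does mrjob expect some other configuration file to be passed in explicitly?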
Any help would be much appreciated.
Thanks in advance!
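First, `$HADOOP_HOME` should be set to the path of the local Hadoop installation on the machine, not to the name node. Almost all Hadoop applications assume that `$HADOOP_HOME/bin/hadoop` is the hadoop executable. So if Hadoop is installed under the system default prefix (i.e. the executable is `/usr/bin/hadoop`), use `export HADOOP_HOME=/usr/`; otherwise use `export HADOOP_HOME=/path/to/hadoop`.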
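If you don't know where Hadoop is installed, the `$HADOOP_HOME/bin/hadoop` convention can be run in reverse: find the `hadoop` executable on `PATH`, resolve any symlinks, and strip the trailing `bin/hadoop`. The sketch below does that. It is a hypothetical helper (`find_hadoop_home` is not part of mrjob or Hadoop), assumes Python 3.3+ for `shutil.which`, and assumes your install follows the `bin/hadoop` layout; verify the printed path before exporting it.

```python
import os
import shutil


def find_hadoop_home(env=None):
    """Return a candidate HADOOP_HOME, or None if nothing can be found.

    Hypothetical helper: if $HADOOP_HOME is already set, return it;
    otherwise derive it from the `hadoop` executable on PATH, assuming
    the binary lives at $HADOOP_HOME/bin/hadoop.
    """
    env = os.environ if env is None else env
    home = env.get("HADOOP_HOME")
    if home:
        return home
    # Search only the PATH from `env`; an empty PATH finds nothing.
    hadoop_bin = shutil.which("hadoop", path=env.get("PATH", ""))
    if hadoop_bin is None:
        return None
    # Resolve symlinks, e.g. /usr/bin/hadoop -> /opt/hadoop-x.y/bin/hadoop
    real_bin = os.path.realpath(hadoop_bin)
    # Strip ".../bin/hadoop" down to "..."
    return os.path.dirname(os.path.dirname(real_bin))


if __name__ == "__main__":
    guess = find_hadoop_home()
    if guess is None:
        print("hadoop not found on PATH; set HADOOP_HOME manually")
    else:
        print("export HADOOP_HOME=" + guess)
```

On a cluster that uses environment modules, as in the wrapper script above, run this after the relevant `module load` so that `hadoop` is actually on `PATH`.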
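Second, you can give mrjob a specific configuration file. If you don't, mrjob falls back on auto-configuration, which is all the "no configs found" message means; it does not by itself cause the `$HADOOP_HOME` exception. In most cases, setting HADOOP_HOME and relying on auto-configuration is fine. Advanced users should see http://pythonhosted.org/mrjob/guides/configs-basics.html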
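To see why the "no configs found" line appears, it helps to know where mrjob looks. As far as I know from mrjob's configuration guide, it checks `$MRJOB_CONF` (if set), then `~/.mrjob.conf`, then `/etc/mrjob.conf`, and uses the first file that exists. The sketch below roughly mirrors that search so you can check it on a compute node. The helper names are hypothetical (this is not mrjob's own code), and older mrjob releases may search slightly different locations.

```python
import os


def mrjob_conf_candidates(env=None):
    """Config paths in the order mrjob's config guide describes.

    Hypothetical helper: $MRJOB_CONF (if set), then ~/.mrjob.conf,
    then /etc/mrjob.conf.
    """
    env = os.environ if env is None else env
    candidates = []
    if env.get("MRJOB_CONF"):
        candidates.append(env["MRJOB_CONF"])
    candidates.append(os.path.expanduser("~/.mrjob.conf"))
    candidates.append("/etc/mrjob.conf")
    return candidates


def find_mrjob_conf(env=None):
    """Return the first existing config path, or None.

    None corresponds to the case where mrjob logs
    "no configs found; falling back on auto-configuration".
    """
    for path in mrjob_conf_candidates(env):
        if os.path.exists(path):
            return path
    return None


if __name__ == "__main__":
    found = find_mrjob_conf()
    print(found if found else "no mrjob config file found")
```

If this prints a path on your login node but the job still logs "no configs found", the batch job is probably running with a different `$HOME` or environment than your interactive shell.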