MRJob error while running on hadoop cluster

I am trying to run a Python job on a Hadoop cluster using MRJob. My wrapper script is as follows:

#!/bin/bash

. /etc/profile
module load use.own
module load python/python2.7
module load python/mrjob

python path_to_python-script/mr_word_freq_count.py path_to_input_file/input.txt -r hadoop > path_to_output_file/output.txt  # note: the output file already exists before I submit the job

I submit this script to the cluster with qsub myscript.sh.

I get two files back: an output file and an error file.

The error file contains the following:

no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
Traceback (most recent call last):
  File "homefolder/privatemodules/python/examples/mr_word_freq_count.py", line 37, in <module>
    MRWordFreqCount.run()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/job.py", line 500, in run
    mr_job.execute()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/job.py", line 518, in execute
    super(MRJob, self).execute()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/launch.py", line 146, in execute
    self.run_job()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/launch.py", line 206, in run_job
    with self.make_runner() as runner:
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/job.py", line 541, in make_runner
    return super(MRJob, self).make_runner()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/launch.py", line 164, in make_runner
    return HadoopJobRunner(**self.hadoop_job_runner_kwargs())
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 179, in __init__
    super(HadoopJobRunner, self).__init__(**kwargs)
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/runner.py", line 352, in __init__
    self._opts = self.OPTION_STORE_CLASS(self.alias, opts, conf_paths)
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 132, in __init__
    'you must set $HADOOP_HOME, or pass in hadoop_home explicitly')
Exception: you must set $HADOOP_HOME, or pass in hadoop_home explicitly

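First question: how do I find $HADOOP_HOME? When I echo $HADOOP_HOME, nothing is printed, which means it is not set. So if I have to set it, what path should I set it to? Should it be the path to the Hadoop name node in the cluster?

Second question: what does the "no configs found" message indicate? Is it related to $HADOOP_HOME not being set, or does mrjob expect some other config file to be passed in explicitly?

Any help would be much appreciated.

Thanks in advance!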

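First, $HADOOP_HOME should be set to the local Hadoop installation path on the machine. Almost all Hadoop applications assume that $HADOOP_HOME/bin/hadoop is the hadoop executable. So if Hadoop is installed in the system default path (the executable is /usr/bin/hadoop), you should export HADOOP_HOME=/usr/ ; otherwise, export HADOOP_HOME=/path/to/hadoop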
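If you do not know where Hadoop is installed, the rule above can be applied in reverse: find the hadoop executable and strip the trailing bin/hadoop. The following is a minimal sketch of that idea, not part of mrjob. The helper names find_on_path and hadoop_home_from_binary are made up for this illustration, and it assumes that hadoop is on the PATH of the machine that launches the job and that the install follows the usual <home>/bin/hadoop layout.

```python
import os


def find_on_path(name, path=None):
    """Return the first executable file called `name` on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def hadoop_home_from_binary(hadoop_bin):
    """Strip the trailing bin/hadoop, e.g. /usr/bin/hadoop -> /usr."""
    return os.path.dirname(os.path.dirname(hadoop_bin))


if __name__ == "__main__":
    hadoop_bin = find_on_path("hadoop")
    if hadoop_bin is None:
        print("hadoop not found on PATH; ask the cluster admin where it is installed")
    else:
        # realpath follows symlinks, in case the PATH entry is only a link
        # into the real installation directory.
        home = hadoop_home_from_binary(os.path.realpath(hadoop_bin))
        print("export HADOOP_HOME=" + home)
```

If the printed line looks right, it can go into the wrapper script before the python command. Treat the result as a guess to verify: on some systems the symlink resolves to a location that does not follow the bin/hadoop layout.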

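Second, you can give mrjob a specific config file; if you do not, mrjob falls back on auto-configuration, which is what the "no configs found" message reports. In most cases, providing HADOOP_HOME and using auto-configuration works fine. Advanced users can refer to http://pythonhosted.org/mrjob/guides/configs-basics.html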
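For reference, the exception text also mentions passing hadoop_home explicitly, and a config file is one place to do that. Below is a hedged sketch that renders a minimal config for the hadoop runner. It assumes the older mrjob version shown in the traceback, which accepts a hadoop_home option (later releases may not), and the helper name render_mrjob_conf is invented for this example. The output is JSON, which is also valid YAML.

```python
import json


def render_mrjob_conf(hadoop_home):
    """Return the text of a minimal mrjob config for the hadoop runner."""
    # Assumption: this mrjob version reads hadoop_home under runners -> hadoop.
    conf = {"runners": {"hadoop": {"hadoop_home": hadoop_home}}}
    return json.dumps(conf, indent=2)


if __name__ == "__main__":
    # /path/to/hadoop is the placeholder from the answer, not a real path.
    print(render_mrjob_conf("/path/to/hadoop"))
```

Saving the printed text as ~/.mrjob.conf should be enough for mrjob to find it, in which case the "no configs found" message would no longer appear.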

