
MRJob error while running on hadoop cluster

I am trying to run a Python job on a Hadoop cluster using MRJob. My wrapper script is as follows:

#!/bin/bash

. /etc/profile
module load use.own
module load python/python2.7
module load python/mrjob

python path_to_python-script/mr_word_freq_count.py path_to_input_file/input.txt -r hadoop > path_to_output_file/output.txt   # note: the output file already exists before I submit the job

Once I submit this script to the cluster using qsub myscript.sh, I get back two files: an output file and an error file.

The error file has the following content:

no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
Traceback (most recent call last):
  File "homefolder/privatemodules/python/examples/mr_word_freq_count.py", line 37, in <module>
    MRWordFreqCount.run()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/job.py", line 500, in run
    mr_job.execute()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/job.py", line 518, in execute
    super(MRJob, self).execute()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/launch.py", line 146, in execute
    self.run_job()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/launch.py", line 206, in run_job
    with self.make_runner() as runner:
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/job.py", line 541, in make_runner
    return super(MRJob, self).make_runner()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/launch.py", line 164, in make_runner
    return HadoopJobRunner(**self.hadoop_job_runner_kwargs())
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 179, in __init__
    super(HadoopJobRunner, self).__init__(**kwargs)
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/runner.py", line 352, in __init__
    self._opts = self.OPTION_STORE_CLASS(self.alias, opts, conf_paths)
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 132, in __init__
    'you must set $HADOOP_HOME, or pass in hadoop_home explicitly')
Exception: you must set $HADOOP_HOME, or pass in hadoop_home explicitly

First question: how do I find $HADOOP_HOME? When I do echo $HADOOP_HOME nothing is printed, which means it is not set. If I do have to set it, what path should I set it to? Should it be set to the path of the Hadoop name node in the cluster?

Second question: what does the "no configs found" message indicate? Does it have to do with $HADOOP_HOME not being set, or does mrjob expect some other config file to be explicitly passed in?

Any help would be really appreciated.

Thanks in advance!

First, $HADOOP_HOME should be set to the local Hadoop installation path on the machine you submit the job from; almost all Hadoop applications assume that $HADOOP_HOME/bin/hadoop is the Hadoop executable. So if Hadoop is installed in the system default location, export HADOOP_HOME=/usr/; otherwise export HADOOP_HOME=/path/to/hadoop
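As a minimal sketch (the install path below is an assumption; use whatever directory actually contains bin/hadoop on your submit node), you could locate the hadoop executable and export the matching HADOOP_HOME before running the mrjob script:

# locate the hadoop executable; HADOOP_HOME is the directory that contains bin/hadoop
which hadoop                          # e.g. prints /usr/lib/hadoop/bin/hadoop
export HADOOP_HOME=/usr/lib/hadoop    # assumed path; adjust to your cluster
$HADOOP_HOME/bin/hadoop version       # quick sanity check that the path is right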

Second, you can provide a specific config for mrjob; if you do not, mrjob falls back on auto-configuration (that is what the "no configs found" message means, so it is a notice rather than an error). In most cases, setting HADOOP_HOME and using auto-configuration is fine; advanced users can refer to http://pythonhosted.org/mrjob/guides/configs-basics.html
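For example (a sketch only; the hadoop_home value below is an assumption), you could drop a minimal config into ~/.mrjob.conf, which mrjob reads by default, and then rerun the job:

# write a minimal mrjob config pointing at the local Hadoop install
cat > ~/.mrjob.conf <<'EOF'
runners:
  hadoop:
    hadoop_home: /usr/lib/hadoop   # assumed install path; adjust to your cluster
EOF

# rerun the job; mrjob picks up ~/.mrjob.conf automatically, so the
# "no configs found" notice should disappear
python path_to_python-script/mr_word_freq_count.py path_to_input_file/input.txt -r hadoop > path_to_output_file/output.txt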
