
Launch Hadoop MapReduce job via Python without PuTTY/SSH

I have been running Hadoop MapReduce jobs by logging in over SSH with PuTTY, which requires entering the host name/IP address, login name, and password to get an SSH command-line window. Once in the SSH console, I run the appropriate MR commands, such as:

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.0.1.jar -file /nfs_home/appers/user1/mapper.py -file /nfs_home/appers/user1/reducer.py -mapper '/usr/lib/python_2.7.3/bin/python mapper.py' -reducer '/usr/lib/python_2.7.3/bin/python reducer.py' -input /ccexp/data/test_xml/0901282-510179094535002-oozie-oozi-W/extractOut/ / .xml -output /user/ccexptest/output/user1/MRoutput

What I would like to do is replace this clunky process with a Python script that launches the MapReduce job directly, so that I don't have to log in over SSH via PuTTY.

Can this be done and, if so, can someone show me how?

I have solved this with the following script (Python 2), which uses paramiko to open the SSH connection from Python:

import paramiko

# Define connection info
host_ip = 'xx.xx.xx.xx'
user = 'xxxxxxxx'
pw = 'xxxxxxxx'

# Paths
input_loc = '/nfs_home/appers/extracts/*/*.xml'
output_loc = '/user/lcmsprod/output/cnielsen/'
python_path = "/usr/lib/python_2.7.3/bin/python"
hdfs_home = '/nfs_home/appers/cnielsen/'
output_log = r'C:\Users\cnielsen\Desktop\MR_Test\MRtest011316_0.txt'

# File names
xml_lookup_file = 'product_lookups.xml'
mapper = 'Mapper.py'
reducer = 'Reducer.py'
helper_script = 'Process.py'
product_name = 'test1'
output_ref = 'test65'

# ----------------------------------------------------

def buildMRcommand(product_name):
    space = " "
    mr_command_list = [ 'hadoop', 'jar', '/share/hadoop/tools/lib/hadoop-streaming.jar',
                        '-files', hdfs_home+xml_lookup_file,
                        '-file', hdfs_home+mapper,
                        '-file', hdfs_home+reducer,
                        '-mapper', "'"+python_path, mapper, product_name+"'",
                        '-file', hdfs_home+helper_script,
                        '-reducer', "'"+python_path, reducer+"'",
                        '-input', input_loc,
                        '-output', output_loc+output_ref]

    MR_command = space.join(mr_command_list)
    print MR_command
    return MR_command

# ----------------------------------------------------

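def unbuffered_lines(f):
    # Yield lines from the remote stream as soon as they arrive,
    # reading one byte at a time until the stream reaches EOF.
    line_buf = ""
    while True:
        ch = f.read(1)
        if not ch:  # EOF: the remote command closed this stream
            break
        line_buf += ch
        if line_buf.endswith('\n'):
            yield line_buf
            line_buf = ''
    if line_buf:  # flush a final line that lacks a trailing newline
        yield line_buf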

# ----------------------------------------------------

# Open the SSH connection (AutoAddPolicy accepts unknown host keys automatically)
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(host_ip, username=user, password=pw)

# Build Commands
list_dir = "ls "+hdfs_home+" -l"
getmerge = "hadoop fs -getmerge "+output_loc+output_ref+" "+hdfs_home+"test_011216_0.txt"

# Run one command at a time: uncomment the line for the command to execute
stdin, stdout, stderr = client.exec_command(list_dir)
##stdin, stdout, stderr = client.exec_command(buildMRcommand(product_name))
##stdin, stdout, stderr = client.exec_command(getmerge)

print "Executing command..."
writer = open(output_log, 'w')

# Echo and log stderr line by line as it arrives
for l in unbuffered_lines(stderr):
    e = '[stderr] ' + l
    print e.strip('\n')
    writer.write(e)

# Then echo and log whatever the command wrote to stdout
for line in stdout:
    r = '[stdout] ' + line
    print r.strip('\n')
    writer.write(r)

client.close()
writer.close()
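
The script above builds the command by joining strings and adding the single quotes by hand, and it never checks whether the remote command succeeded. A minimal sketch of one alternative follows, written for Python 3 (unlike the script above) and not tested against a real cluster. The names build_streaming_command and run_remote are hypothetical helpers introduced here for illustration. run_remote assumes that client is an already connected paramiko.SSHClient and relies on paramiko's Channel.set_combine_stderr and Channel.recv_exit_status.

```python
import shlex


def build_streaming_command(jar, files, mapper, reducer, input_path, output_path):
    """Return one shell-safe command line for a Hadoop streaming job.

    Hypothetical helper: the argument names are illustrative only.
    """
    args = ["hadoop", "jar", jar]
    for path in files:
        args += ["-file", path]
    args += ["-mapper", mapper, "-reducer", reducer,
             "-input", input_path, "-output", output_path]
    # shlex.quote wraps any argument that contains spaces or shell
    # metacharacters (including '*') in single quotes, so the remote
    # shell passes each argument to hadoop as a single token.
    return " ".join(shlex.quote(a) for a in args)


def run_remote(client, command):
    """Run command through client and return (exit_status, output_lines).

    Assumes client behaves like a connected paramiko.SSHClient.
    """
    stdin, stdout, stderr = client.exec_command(command)
    # Merge stderr into stdout so that a single loop drains both streams.
    stdout.channel.set_combine_stderr(True)
    lines = [line.rstrip("\n") for line in stdout]
    # Ask for the exit status only after the output has been read.
    status = stdout.channel.recv_exit_status()
    return status, lines
```

With helpers like these, the calling code could build the command once, pass it to run_remote(client, command), and treat a non-zero status as a failed job. Because the input path is quoted, a glob such as */*.xml reaches Hadoop unexpanded rather than being expanded by the remote shell.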
