
Basic Hadoop MapReduce job is starting, but not completing

I'm trying out Hadoop, but when I submit a MapReduce job, Hadoop appears to start it and then hangs with no sign of progress or other activity. The Application Status pages say the job has been submitted and list it, but nothing happens, and I'm not sure where to look to resolve this.

I'm using Hadoop 2.7.1, installed on OS X 10.10.4 via Homebrew, with Java 1.8.0_45. I configured it following these instructions: https://datarecipe.wordpress.com/2015/06/05/setup-hadoop-2-6-on-mac-osx-10-9/

The data is a simple tab-delimited text file called "purchases.txt" containing:

2013-03-29  2:30    miami   cup 2.43    visa
2013-04-23  1:34    miami   cup 2.43    visa
2013-04-23  10:15   LA  spoon   1.32    visa
2013-04-28  6:34    LA  bottle  3.56    cash
2013-05-23  1:43    miami   glass   3.21    visa

I uploaded it to HDFS (the /data folder already exists) with:

hadoop fs -put purchases.txt /data/

I then wrote the following mapper in Python (following an online tutorial) and saved it as "mapper.py":

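#!/usr/bin/env python
import sys

def mapper():
    # Each stdin line is one tab-delimited purchase record.
    for line in sys.stdin:
        fields = line.strip().split("\t")
        # Skip malformed lines that do not have exactly six fields.
        if len(fields) == 6:
            date, time, store, item, cost, payment = fields
            # Emit "store<TAB>cost"; Streaming treats the text before the first tab as the key.
            print("{0}\t{1}".format(store, cost))

def main():
    mapper()

if __name__ == "__main__":
    main()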

I did the same for the reducer and saved it as "reducer.py":

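#!/usr/bin/env python
import sys

def reducer():
    # Input arrives sorted by key, so all lines for one store are adjacent.
    salesTotal = 0
    oldKey = None
    for line in sys.stdin:
        data = line.strip().split("\t")
        if len(data) != 2:
            continue  # skip malformed lines
        thisKey, thisSale = data
        # Key changed: flush the total for the previous store.
        if oldKey is not None and oldKey != thisKey:
            print("{0}\t{1}".format(oldKey, salesTotal))
            salesTotal = 0
        oldKey = thisKey
        salesTotal += float(thisSale)
    # Flush the total for the last store.
    if oldKey is not None:
        print("{0}\t{1}".format(oldKey, salesTotal))

def main():
    reducer()

if __name__ == "__main__":
    main()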
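The reducer only works because its input arrives sorted by key, so every line for one store is contiguous; Hadoop's shuffle phase (or sort in the local test) guarantees that. A minimal alternative sketch, not the tutorial's code, expresses the same grouping with Python's itertools.groupby, which likewise merges only adjacent equal keys. The function names parse and reduce_sales are hypothetical.

```python
#!/usr/bin/env python
"""Alternative reducer sketch using itertools.groupby (not the tutorial's code)."""
import sys
from itertools import groupby


def parse(lines):
    # Yield (key, value) pairs from "key\tvalue" lines, skipping malformed ones.
    for line in lines:
        data = line.strip().split("\t")
        if len(data) == 2:
            yield data[0], float(data[1])


def reduce_sales(lines):
    # groupby only merges *adjacent* equal keys, so the input must already be
    # sorted by key, as Hadoop's shuffle (or `sort` locally) guarantees.
    for key, group in groupby(parse(lines), key=lambda kv: kv[0]):
        yield key, sum(value for _, value in group)


def main():
    for key, total in reduce_sales(sys.stdin):
        print("{0}\t{1}".format(key, total))


if __name__ == "__main__":
    main()
```

It reads the same "store<TAB>cost" lines from stdin as reducer.py, so it could be swapped into the same cat | mapper | sort | reducer test; unsorted input would produce several partial totals for the same store.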

Running the scripts together on the command line works:

Tophers-Retina-MBP:Hadoop tkessler$ cat purchases.txt | ./mapper.py | sort | ./reducer.py 
LA  4.88
miami   5.640000000000001
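The command-line test above mirrors what Hadoop Streaming does: run the mapper over every input line, sort the mapper output so equal keys are adjacent (Hadoop's shuffle), then feed it to the reducer. A minimal in-process sketch of those three phases follows, assuming the sample rows are separated by real tab characters. map_phase, shuffle_sort, reduce_phase and run_pipeline are hypothetical names, and Python's sorted() compares strings by code point, like LC_ALL=C sort rather than a locale-aware sort.

```python
import io
from itertools import groupby

# Sample rows from purchases.txt, assuming real tab separators.
PURCHASES = (
    "2013-03-29\t2:30\tmiami\tcup\t2.43\tvisa\n"
    "2013-04-23\t1:34\tmiami\tcup\t2.43\tvisa\n"
    "2013-04-23\t10:15\tLA\tspoon\t1.32\tvisa\n"
    "2013-04-28\t6:34\tLA\tbottle\t3.56\tcash\n"
    "2013-05-23\t1:43\tmiami\tglass\t3.21\tvisa\n"
)


def map_phase(lines):
    # Same logic as mapper.py: emit "store\tcost" for every 6-field record.
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) == 6:
            date, time, store, item, cost, payment = fields
            yield "{0}\t{1}".format(store, cost)


def shuffle_sort(records):
    # Stand-in for `sort` / Hadoop's shuffle: make equal keys adjacent.
    return sorted(records)


def reduce_phase(records):
    # Same result as reducer.py: total the costs of each run of equal keys.
    pairs = (r.split("\t") for r in records)
    for store, group in groupby(pairs, key=lambda p: p[0]):
        yield store, sum(float(cost) for _, cost in group)


def run_pipeline(text):
    return list(reduce_phase(shuffle_sort(map_phase(io.StringIO(text)))))


if __name__ == "__main__":
    for store, total in run_pipeline(PURCHASES):
        print("{0}\t{1}".format(store, total))
```

With all five sample rows parsed, the miami total works out to 2.43 + 2.43 + 3.21, about 8.07, rather than the 5.64 printed above, which suggests one of the miami rows in the actual file was skipped by the mapper's six-field check.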

However, when I submit it to Hadoop as a Streaming job, it stalls here:

Tophers-Retina-MBP:lib tkessler$ hadoop jar ./hadoop-streaming-2.7.1.jar -mapper ~/PycharmProjects/Hadoop/mapper.py -reducer ~/PycharmProjects/Hadoop/reducer.py -file ~/PycharmProjects/Hadoop/mapper.py -input /data -output /project1out
packageJobJar: [/Users/tkessler/PycharmProjects/Hadoop/mapper.py, /var/folders/f_/3zvmc1g95lqgt1cp2dtnrtqw0000gp/T/hadoop-unjar2355518779286421017/] [] /var/folders/f_/3zvmc1g95lqgt1cp2dtnrtqw0000gp/T/streamjob8766144507660069606.jar tmpDir=null

The job seems to start fine and accept the mapper and reducer, and running "mapred job -list all" shows the jobs as running, but they never complete and their status is only listed as "unknown". I'm not sure whether it's a Hadoop configuration issue or some other problem; does anyone have any insight?
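One place to look when a job never progresses is the YARN ResourceManager: an application that cannot get a container for its ApplicationMaster, for example because no NodeManager has registered, stays in the ACCEPTED state. A minimal sketch that reads the ResourceManager REST API follows. It assumes the default web address http://localhost:8088 of a single-node setup, and fetch_json and summarize are hypothetical helper names.

```python
import json
from urllib.request import urlopen

# Assumption: ResourceManager web/REST address of a default single-node setup.
RM_URL = "http://localhost:8088"


def fetch_json(path, base=RM_URL):
    # GET a ResourceManager REST resource and decode the JSON body.
    with urlopen(base + path, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(metrics, apps):
    # metrics: body of /ws/v1/cluster/metrics; apps: body of /ws/v1/cluster/apps.
    active_nodes = metrics["clusterMetrics"]["activeNodes"]
    # "apps" is null in the JSON when the ResourceManager knows of no applications.
    app_list = (apps.get("apps") or {}).get("app", [])
    lines = ["active NodeManagers: {0}".format(active_nodes)]
    for app in app_list:
        lines.append("{0}  state={1}  final={2}".format(
            app["id"], app["state"], app["finalStatus"]))
    # Apps waiting in ACCEPTED with no NodeManager can never get a container.
    if active_nodes == 0 and any(a["state"] == "ACCEPTED" for a in app_list):
        lines.append("jobs are ACCEPTED but no NodeManager is registered")
    return lines
```

Against a running cluster it would be called as summarize(fetch_json("/ws/v1/cluster/metrics"), fetch_json("/ws/v1/cluster/apps")). If activeNodes is 0, the NodeManager log is the next place to check.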

Addition:

When I run the bundled pi example instead, progress stops at the "Starting Job" line:

Tophers-Retina-MBP:~ tkessler$ hadoop jar /usr/local/Cellar/hadoop/2.7.1/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 4 1000
Number of Maps  = 4
Samples per Map = 1000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Starting Job

I cleared out Hadoop by shutting down the namenode and datanodes, uninstalled it with brew uninstall hadoop, and then set it up again following the instructions on this page: http://amodernstory.com/2014/09/23/installing-hadoop-on-mac-osx-yosemite/

It seems to be working well now. Perhaps it was just a small configuration difference (most likely the temporary file location), but the mapper and reducer now run as expected.
