
hadoop: tracking MapReduce tasks

I'm new to Hadoop, and this is probably a stupid question, but I've been looking for hours and cannot find how to do it.

I'm running Hadoop MapReduce with different numbers of mappers and reducers to see the difference in performance (e.g. execution time). I want to check whether the specified number of mappers/reducers was actually used, but I can't figure out how to do it.

Hadoop 1.2.1 is installed on a quad-core machine with hyper-threading, which I access over SSH, and Hadoop is running in pseudo-distributed mode.

My MapReduce program is written in Python, so I'm using Hadoop Streaming, and this is how I ran the MR program:

$ hadoop jar /Users/hadoop/hadoop-1.2.1/contrib/streaming/hadoop-streaming-1.2.1.jar \
    -file /Users/hadoop/map.py \
    -mapper /Users/hadoop/map.py \
    -file /Users/hadoop/reduce.py \
    -reducer /Users/hadoop/reduce.py \
    -input file:///Users/hadoop/inputfile \
    -output file:///Users/hadoop/outputfile

I want to see log information that looks like this, or anything that provides this kind of information.

You're looking for the JobTracker web interface (in Hadoop 2.x with YARN, the equivalent service is the Resource Manager). It includes links to logs like the one you've linked to in your question. This Stack Overflow post has some answers about how to reach it. Given your version of Hadoop, from the machine running Hadoop you should be able to open localhost:50030 to see the JobTracker.
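
If you also want your own scripts to report which tasks actually ran, one option is a custom counter. Hadoop Streaming treats a stderr line of the form reporter:counter:<group>,<counter>,<amount> as a counter update, and it exports job configuration values as environment variables with the dots replaced by underscores (for example mapred_task_id and mapred_task_is_map). Below is a minimal sketch, not a definitive implementation: the group name "TaskTracking" and the helper function names are made up for illustration, and it assumes each task calls report_task_start() exactly once.

```python
import os
import sys


def counter_line(group, counter, amount=1):
    """Format a Hadoop Streaming counter update; the line must go to stderr."""
    return "reporter:counter:%s,%s,%d\n" % (group, counter, amount)


def task_kind(environ):
    """Return 'map', 'reduce', or 'unknown' from the mapred_task_is_map variable."""
    value = environ.get("mapred_task_is_map")
    if value is None:
        # Not running under Hadoop Streaming (e.g. a local test run).
        return "unknown"
    return "map" if value == "true" else "reduce"


def report_task_start(environ=os.environ, err=sys.stderr):
    """Bump one counter per task and write the task attempt id to the task's stderr log."""
    kind = task_kind(environ)
    # "TaskTracking" is a hypothetical group name chosen for this sketch.
    err.write(counter_line("TaskTracking", "%s_tasks_started" % kind))
    err.write("task attempt: %s\n" % environ.get("mapred_task_id", "unknown"))
```

To use it, put these functions at the top of map.py and reduce.py and call report_task_start() once before the loop that reads sys.stdin. The counter totals should then show up with the job's other counters on the job page of the web interface at localhost:50030, and the "task attempt" lines land in each task's stderr log.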
