How can I monitor Hadoop in pseudodistributed mode with JVisualVM?

Question

I'm running Hadoop in pseudodistributed mode for testing on my local machine. I'd like to monitor my mappers' and reducers' memory and CPU usage in JVisualVM. However, in JVisualVM's list of local applications, I only see org.apache.hadoop.util.RunJar.

Are the mappers and reducers running as separate processes? (In top, it looks like they are: two processes named "java" are using 100% CPU while my two mappers run.) If they are separate processes, why doesn't JVisualVM list them as applications that I can monitor?
Are the mappers and reducers contained within the org.apache.hadoop.util.RunJar process? If so, (a) why do I only see Tool and ToolRunner in the JVisualVM Sampler, not any mapper/reducer code, and (b) why does JVisualVM report nearly 0% CPU when top reports 100%?

Is there some way I can modify my mappers/reducers so that JVisualVM can see them, at least while debugging in pseudodistributed mode?

For completeness, I should say that I'm running Hadoop 0.20 from Cloudera. (It was installed on Ubuntu using apt-get install hadoop-0.20-conf-pseudo from the http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh precise-cdh4 contrib repository. Even though Cloudera puts 2.x in the version number, it's not YARN, it's the original Hadoop.)

% hadoop version
Hadoop 2.0.0-cdh4.4.0
Subversion file:///var/lib/jenkins/workspace/generic-package-ubuntu64-12-04/CDH4.4.0-Packaging-Hadoop-2013-09-03_18-48-35/hadoop-2.0.0+1475-1.cdh4.4.0.p0.23~precise/src/hadoop-common-project/hadoop-common -r c0eba6cd38c984557e96a16ccd7356b7de835e79
Compiled by jenkins on Tue Sep  3 19:33:54 PDT 2013
From source with checksum ac7e170aa709b3ace13dc5f775487180
This command was run using /usr/lib/hadoop/hadoop-common-2.0.0-cdh4.4.0.jar

Answer 1

When you use hadoop jar [your_args] to start your application, the command that actually runs is essentially java org.apache.hadoop.util.RunJar [your_args] (the hadoop script launches the RunJar class with the Hadoop classpath). So the driver that starts your MapReduce job is what runs in the RunJar process, which is why you only see Tool and ToolRunner there.

By default, mappers and reducers run as separate processes. You cannot see them in JVisualVM because JVisualVM does not have the right permissions: mappers and reducers are launched under the user mapred, not under your own account. So if you want to monitor them, you need to start JVisualVM as that user with sudo -E -u mapred jvisualvm.
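To check this on your own machine before switching users, you can list which account owns each java process that top shows. The helper below is only a sketch: the function name java_process_owners is made up for illustration, and it assumes a ps that accepts -eo user,pid,args (as procps on Ubuntu does).

```shell
#!/bin/sh
# Hypothetical helper (the name is not part of Hadoop or the JDK).
# Prints owner, PID, and command line of every process whose executable
# is "java" or ends in "/java". The ps header row is filtered out because
# its third column ("COMMAND") does not match the pattern.
java_process_owners() {
    ps -eo user,pid,args | awk '$3 ~ /(^|\/)java$/ { print }'
}

# Usage, while a job is running:
#   java_process_owners
# If the task JVMs show mapred in the first column, start the monitor as
# that user, as described above:
#   sudo -E -u mapred jvisualvm
```

If the owner column shows your own user name for every java process, the permission explanation does not apply to your setup and the cause lies elsewhere.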

Answer 1
1 accepted 2013-09-29 05:52:54
