
Enable JMX in Hadoop Job

I want to enable JMX monitoring for my Hadoop job (not for the JobTracker, DataNode or anything else, but for the actual job). I'm looking for a way to connect from my local machine, using jconsole, to the host/cluster/node where the job is running and retrieve some monitoring values. So I need remote access to JMX.

I tried to add some options to MAPRED_MAP_TASK_JAVA_OPTS and MAPRED_REDUCE_TASK_JAVA_OPTS:

  1. Adding -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Djava.net.preferIPv4Stack=true does not help me, as I do not know how to connect to JMX using jconsole. A port gets opened, but whenever I try to connect using jconsole, I get a "no such object in table" error.

  2. -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=1412 does work as expected: I can connect to hostname:1412 using jconsole (a small programmatic equivalent is sketched below). Main problem here: as the mapper/reducer might get run multiple times on the same node and the port cannot be used twice, the second mapper/reducer started on that node fails with an exception.
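For reference, here is a small programmatic sketch of the connection jconsole makes, using the standard JMX service URL against a task JVM started with a fixed port as in option 2 (the host name and port are placeholders):

import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxProbe {
    public static void main(String[] args) throws Exception {
        // args = { "worker-node-01", "1412" } -- the same host:port you would type into jconsole
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + args[0] + ":" + args[1] + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            System.out.println("Connected; MBeans visible: " + mbsc.getMBeanCount());
        }
    }
}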

Is there any solution which allows me to use JMX in a Hadoop job? This question is related to another question I just asked, which tries to formulate the problem at another level.

I think the answer to your question can be found here: https://forums.oracle.com/message/4798165#4798165

Basically, you can use -Dcom.sun.management.jmxremote.port=0 and then check the log to find the port the JMX Connector Server is listening on.
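As an untested sketch, the same flags could be passed to the task JVMs from the driver (using the mapreduce.*.java.opts properties shown in the last answer on this page); the -Xmx value is only a placeholder:

import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
String jmxOpts = "-Dcom.sun.management.jmxremote"
        + " -Dcom.sun.management.jmxremote.authenticate=false"
        + " -Dcom.sun.management.jmxremote.ssl=false"
        + " -Dcom.sun.management.jmxremote.local.only=false"
        + " -Dcom.sun.management.jmxremote.port=0";  // 0 = let each task JVM pick a free port
conf.set("mapreduce.map.java.opts", "-Xmx1024m " + jmxOpts);
conf.set("mapreduce.reduce.java.opts", "-Xmx1024m " + jmxOpts);

The chosen port then has to be fished out of each task's log, which is where the log level change below comes in.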

One of the tricks to doing this is that you have to change the log level of the root logger to CONFIG. Depending on the configuration of your cluster and your level of access, it may be difficult to modify the logging.properties file to do this.

Another route would be to specify java.util.logging.config.class to point to your own class to do the logging configuration. You can easily distribute such a class with your code. Then it should be easy to change the root log level without access to the local filesystem.
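A minimal sketch of such a class (the class name is made up; it would be passed to the task JVMs with -Djava.util.logging.config.class=JmxLogConfig alongside the JMX flags):

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.logging.LogManager;

public class JmxLogConfig {
    // When java.util.logging.config.class is set, LogManager instantiates this class
    // and the constructor is responsible for the whole logging configuration,
    // so we feed it a tiny properties snippet that raises the root level
    // (and the console handler) to CONFIG.
    public JmxLogConfig() throws Exception {
        String props = ".level=CONFIG\n"
                + "handlers=java.util.logging.ConsoleHandler\n"
                + "java.util.logging.ConsoleHandler.level=CONFIG\n";
        LogManager.getLogManager().readConfiguration(
                new ByteArrayInputStream(props.getBytes(StandardCharsets.UTF_8)));
    }
}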

I have not actually tested this approach on a Hadoop cluster yet, but I expect it should work.

You have to set up your /etc/hadoop/hadoop-env.sh accordingly:

http://theholyjava.wordpress.com/2012/09/21/enabling-jmx-monitoring-for-hadoop-and-hive/

However, I wonder if this is the best way of doing this: if you want to observe specific behaviour, you might be better off isolating a specific input file and debugging against a local pseudo-cluster, and if you want system metrics you could do worse than give Ganglia a look, as it is pretty much already built into Hadoop:

http://wiki.apache.org/hadoop/GangliaMetrics

For those still looking for an answer, setting the MAPRED_MAP_TASK_JAVA_OPTS config doesn't actually work. Instead, you have to set this config on the program driver (Runner) so that it gets propagated to the mappers/reducers:

configuration.set("mapreduce.map.java.opts", "-Xmx1600m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8008");

Likewise, for the reducer, you can set mapreduce.reduce.java.opts.
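Putting it together, an untested driver skeleton might look like the following; the job name and heap sizes are placeholders, and a fixed port still has the collision problem described in the question if several tasks land on the same node:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration configuration = new Configuration();
configuration.set("mapreduce.map.java.opts",
        "-Xmx1600m -Dcom.sun.management.jmxremote"
        + " -Dcom.sun.management.jmxremote.authenticate=false"
        + " -Dcom.sun.management.jmxremote.ssl=false"
        + " -Dcom.sun.management.jmxremote.port=8008");
configuration.set("mapreduce.reduce.java.opts",
        "-Xmx1600m -Dcom.sun.management.jmxremote"
        + " -Dcom.sun.management.jmxremote.authenticate=false"
        + " -Dcom.sun.management.jmxremote.ssl=false"
        + " -Dcom.sun.management.jmxremote.port=8009");  // e.g. a different port for reducers
// Set the properties before creating the Job so they are copied into the job's configuration.
Job job = Job.getInstance(configuration, "jmx-enabled-job");
// configure mapper/reducer/input/output as usual, then job.waitForCompletion(true);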

To see if it worked, just log into one of the machines and run ps aux | grep 8008 to check whether the JMX port was set correctly.
