
Job tracking URL in Google Compute Engine not working

I am using Google Compute Engine to run MapReduce jobs on Hadoop (with pretty much all default configs). While running a job I get a tracking URL of the form http://PROJECT_NAME:8088/proxy/application_X_Y/, but it fails to open. Did I forget to configure something?

To elaborate on the option Amal mentioned in the other answer (using the external IP address of your Google Compute Engine VM): you can obtain the external IP address by running gcloud compute instances describe --zone <your zone> <your master hostname> and looking for natIP.
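For example (a sketch, assuming a master instance named hadoop-m in zone us-central1-a; substitute your own names), you can pull just the natIP field with a --format filter:

$ gcloud compute instances describe hadoop-m --zone us-central1-a \
    --format='get(networkInterfaces[0].accessConfigs[0].natIP)'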

To open port 8088, you'll have to set up a firewall rule opening that port, likely on your default Google Compute Engine network. You'll want to specify your.ip.address.here/32 in --source-ranges to restrict incoming traffic to just your local machine dialing into your VM; otherwise, anyone in the IP source-ranges would be able to access your Hadoop pages.
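A minimal sketch of such a rule (the rule name allow-yarn-ui is a placeholder, and your.ip.address.here should be your actual public IP):

$ gcloud compute firewall-rules create allow-yarn-ui \
    --network default \
    --allow tcp:8088 \
    --source-ranges your.ip.address.here/32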

If you used bdutil to deploy your cluster, there's an alternative that is much easier and more secure; simply run

bdutil <your flags used in deployment, like -e hadoop2, --prefix, etc.> socksproxy

to open an SSH session with dynamic port forwarding that you can use as a SOCKS5 proxy for your browser. If you're running on Linux or a Mac and have Chrome or Firefox installed, bdutil should also print out a copy/paste command for starting a fresh, isolated browser pre-configured to use the SOCKS proxy so that you can click through all the useful links.

If bdutil didn't print out a browser command, or you didn't use bdutil, you can also run and configure your SSH SOCKS proxy using these instructions. An SSH-based SOCKS proxy is more secure than opening up firewall ports, and it also lets the links between Hadoop pages work (otherwise you have to keep manually replacing hostnames with external IP addresses).
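A minimal sketch of the manual setup (the username, master external IP, and local port 1080 are assumptions; any free local port works):

$ ssh -D 1080 -N -n username@<master external IP>

Then point your browser at the SOCKS5 proxy on localhost:1080, with remote DNS resolution enabled so cluster hostnames resolve through the tunnel.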

One correction: you are using YARN, so there is no JobTracker. The JobTracker exists in Hadoop 1.x; in YARN, the processing layer became a generic framework, and the JobTracker was replaced by the ResourceManager and per-application ApplicationMasters. The UI you mentioned in the question belongs to the ResourceManager. For your problem, try the following tips.

Use the external (public) IP address of the ResourceManager instance instead of PROJECT_NAME.

Check whether port 8088 is open so it can be accessed from outside; a quick connectivity check is sketched below.
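As a quick check (a sketch; substitute the ResourceManager's external IP), you can test the port from your local machine:

$ nc -vz <external IP> 8088

or request the ResourceManager UI directly, assuming the default /cluster page:

$ curl -I http://<external IP>:8088/cluster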

Another (more secure) way to do this is to use gcloud compute to make an SSH tunnel to your deployment, and then launch Chrome through it.

$ gcloud compute ssh clustername --zone=us-central1-a --ssh-flag="-D 1080" --ssh-flag="-N" --ssh-flag="-n"

You will need to replace clustername with the name of your deployment, and change the --zone if necessary.

From there, you can launch Chrome through the tunnel and reach the Hadoop job tracking URL.

$ chrome --proxy-server="socks5://localhost:1080" \
    --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
    --user-data-dir=/tmp/clustername
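The MAP * 0.0.0.0 , EXCLUDE localhost rule stops Chrome from resolving hostnames locally, so DNS lookups for cluster hosts also travel through the SOCKS tunnel, and --user-data-dir starts the proxied session with an isolated profile (the /tmp/clustername path is just a convenient placeholder).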
