
Using Jupyter notebook on Spark on EMR

I am new to Spark and AWS. I am trying to install Jupyter on my Spark cluster (EMR), but in the end I am not able to open Jupyter Notebook in my browser.

Context: Because of firewall restrictions where I work, I can't reach the IP addresses of the EMR clusters I create day to day. I have a dedicated EC2 instance (its IP address is whitelisted) that I use as a client to connect to the EMR clusters I create as needed.

I can reach the EC2 instance's IP address on ports 22 and 8080. I cannot reach the IP address of the EMR cluster.

These are the steps I am following:

  1. Open PuTTY and connect to the EC2 instance.
  2. Connect from the EC2 instance to the EMR cluster:
     ssh -i publickey.pem ec2-user@<host name of the EMR cluster>
  3. Install Jupyter on the Spark cluster:
     pip install jupyter
  4. Start PySpark with Jupyter as the driver:
     PYSPARK_DRIVER_PYTHON=/usr/local/bin/jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --port=7777" pyspark --packages com.databricks:spark-csv_2.10:1.1.0 --master spark://127.0.0.1:7077 --executor-memory 6400M --driver-memory 6400M
  5. Establish a tunnel for the browser:
     ssh -L 0.0.0.0:8080:127.0.0.1:7777 ip-172-31-34-209 -i publickey.pem
  6. Open Jupyter in the browser:
     http://<host name of EMR cluster>:8080
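Step 4 above can be sketched as a shell function that runs on the EMR master. This is a minimal sketch, not a verified recipe: it assumes jupyter is on the PATH and that Spark runs on YARN, which is how EMR normally sets it up (the spark://127.0.0.1:7077 URL in the question would need a standalone Spark master listening there). The pyspark launcher reads PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS, and Jupyter listens only on localhost by default, which is why step 5 forwards to 127.0.0.1:7777 on the master. DRY_RUN is a hypothetical switch that prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch of step 4, run on the EMR master node.
# Assumptions: jupyter is on the PATH; Spark runs on YARN (EMR's usual setup).
set -euo pipefail

NOTEBOOK_PORT="${NOTEBOOK_PORT:-7777}"

# Start pyspark with Jupyter Notebook as the driver front end.
# `env` passes the two PYSPARK_* variables to the pyspark process only.
launch_notebook() {
  local cmd
  cmd=(env
       PYSPARK_DRIVER_PYTHON=jupyter
       "PYSPARK_DRIVER_PYTHON_OPTS=notebook --no-browser --port=${NOTEBOOK_PORT}"
       pyspark --master yarn)
  # Hypothetical DRY_RUN=1 switch: print the command instead of running it.
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 launch_notebook only prints the command, while launch_notebook on the master starts the notebook server on port 7777. The question's --packages and memory flags can be appended to the pyspark part unchanged.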

I am able to run the first 5 steps, but I am not able to open the Jupyter notebook in my browser.

I haven't tested this, since it requires setting up a test EMR cluster, but here is what should work:

Step 5:

ssh -i publickey.pem -L 8080:127.0.0.1:7777 HOSTNAME
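The -L argument has the form [bind_address:]port:host:hostport. The listening port opens on the machine that runs ssh, while host:hostport is resolved from the remote end, so 127.0.0.1:7777 means the EMR master's own loopback interface, where Jupyter listens. Without a bind address the port is bound according to the GatewayPorts setting (loopback only by default), so 127.0.0.1:8080 in step 6 refers to the machine that ran ssh; the question's 0.0.0.0:8080 form listens on all interfaces instead. A minimal sketch, with HOSTNAME and publickey.pem as placeholders and DRY_RUN=1 as a hypothetical print-only switch:

```shell
#!/usr/bin/env bash
# Sketch of the step 5 tunnel. KEY and HOST are placeholders.
set -euo pipefail

# Usage: open_tunnel KEY HOST LOCAL_PORT REMOTE_PORT [BIND_ADDRESS]
# Listens on [BIND_ADDRESS:]LOCAL_PORT on this machine and forwards to
# 127.0.0.1:REMOTE_PORT as seen from HOST (HOST's own loopback).
open_tunnel() {
  local key=$1 host=$2 local_port=$3 remote_port=$4 bind=${5:-}
  local spec="${local_port}:127.0.0.1:${remote_port}"
  if [ -n "$bind" ]; then
    spec="${bind}:${spec}"
  fi
  # -N: only forward ports, do not run a remote command.
  local cmd=(ssh -i "$key" -N -L "$spec" "$host")
  # Hypothetical DRY_RUN=1 switch: print the command instead of connecting.
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, open_tunnel publickey.pem HOSTNAME 8080 7777 reproduces the answer's command, and adding 0.0.0.0 as a fifth argument reproduces the question's variant.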

Step 6:

Open the Jupyter notebook in the browser at http://127.0.0.1:8080.
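Since version 4.3, Jupyter Notebook protects the server with a login token, so the browser may ask for it when opening http://127.0.0.1:8080. The URL printed on the EMR master (for example by jupyter notebook list) contains that token but names port 7777. A minimal sketch that rewrites it to point at the tunnel instead, assuming the classic "URL :: directory" output format; tunnel_url is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Rewrite the notebook URL(s) reported on the EMR master so they point at the
# local end of the tunnel, keeping the path and the ?token=... query string.
set -euo pipefail

# Usage: jupyter notebook list | tunnel_url 127.0.0.1:8080
# Assumes lines of the form "http://localhost:7777/?token=... :: /dir".
tunnel_url() {
  local target=$1
  # Keep only the URLs, then replace their host:port part with the target.
  grep -o 'http://[^ ]*' | sed -E "s#^http://[^/]+#http://${target}#"
}
```

Run jupyter notebook list on the master, paste the output into tunnel_url on your side, and open the resulting URL in the browser.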

You can use an EMR notebook with Amazon EMR clusters running Apache Spark to run queries and code remotely. An EMR notebook is a "serverless" Jupyter notebook: it sits outside the cluster and handles attaching to the cluster for you.

More information here: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-notebooks.html
