I am new to spark and AWS, I am trying to install Jupyter on my Spark cluster (EMR), i am not able to open Jupyter Notebook on my browser in the end.
Context: I have firewall issues from the place i am working, i can't get access to the EMR clsuter's IP address i create on a day-to-day basis. I have a dedicated EC-2 instance (IP address for this instance is white listed) that i am using as a client to connect to the EMR cluster i create on a need basis.
I have access to the IP address of the EC2 instance and the ports 22 and 8080. I do not have access to the IP address of EMR cluster.
Following are the steps that i am following:
install jupyter on the spark cluster using the following command: pip install jupyter
Connect to spark: PYSPARK_DRIVER_PYTHON=/usr/local/bin/jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --port=7777" pyspark --packages com.databricks:spark-csv_2.10:1.1.0 --master spark://127.0.0.1:7077 --executor-memory 6400M --driver-memory 6400M
Establish a tunnel to browser: ssh -L 0.0.0.0:8080:127.0.0.1:7777 ip-172-31-34-209 -i publickey.pem
open Jupyter on browser:
http:// host name of EMR cluster :8080
I am able to run the first 5 steps, but not able to open the Jupyter notebook on my browser.
Didn't test it, as it involves setting up a test EMR server, but here's what should work:
Step 5:
ssh -i publickkey.pem -L 8080:127.0.0.1:7777 HOSTNAME
Step 6:
Open jupyter notebook on browser using 127.0.0.1:8080
You can use an EMR notebook with Amazon EMR clusters running Apache Spark to remotely run queries and code. An EMR notebook is a "serverless" Jupyter notebook. EMR notebook sits outside the cluster and takes care of cluster attachment without you having to worry about it.
More information here: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-notebooks.html
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.