Using Jupyter notebook on Spark on EMR
I am new to Spark and AWS. I am trying to install Jupyter on my Spark cluster (EMR), but in the end I am not able to open the Jupyter Notebook in my browser.
Context: Because of firewall restrictions where I work, I can't reach the IP addresses of the EMR clusters I create on a day-to-day basis. I have a dedicated EC2 instance (its IP address is whitelisted) that I use as a client to connect to the EMR cluster I create as needed.
I have access to the IP address of the EC2 instance on ports 22 and 8080. I do not have access to the IP address of the EMR cluster.
These are the steps I am following:
Install Jupyter on the Spark cluster using the following command: pip install jupyter
Connect to Spark: PYSPARK_DRIVER_PYTHON=/usr/local/bin/jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --port=7777" pyspark --packages com.databricks:spark-csv_2.10:1.1.0 --master spark://127.0.0.1:7077 --executor-memory 6400M --driver-memory 6400M
Establish a tunnel to the browser: ssh -L 0.0.0.0:8080:127.0.0.1:7777 ip-172-31-34-209 -i publickey.pem
Open Jupyter in the browser:
http://<host name of EMR cluster>:8080
I am able to run the first 5 steps, but I am not able to open the Jupyter notebook in my browser.
I didn't test this, as it would involve setting up a test EMR server, but here's what should work:
Step 5:
ssh -i publickey.pem -L 8080:127.0.0.1:7777 HOSTNAME
Step 6:
Open the Jupyter notebook in the browser at 127.0.0.1:8080
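To make the pieces of that tunnel command explicit, here is a minimal sketch that only prints the command and the URL to open, rather than running anything. The helper name tunnel_cmd is hypothetical, HOSTNAME is still a placeholder for the machine where Jupyter runs, and the -N flag (forward ports without starting a remote shell) is my addition, not something from the question.

```shell
#!/bin/sh
# Sketch only: prints the tunnel command instead of running it.
# KEY, HOST and the port numbers are placeholders taken from the steps above.
KEY="publickey.pem"   # private key file passed to ssh -i
HOST="HOSTNAME"       # placeholder: machine where Jupyter is running
LOCAL_PORT=8080       # port opened on the machine where ssh is run
REMOTE_PORT=7777      # port Jupyter listens on (--port=7777)

# tunnel_cmd is a hypothetical helper, not part of ssh or Jupyter.
# Arguments: key file, local port, remote port, host.
tunnel_cmd() {
  # -N: forward ports only, do not start a remote shell (assumed addition)
  # -L local_port:target_host:target_port; target_host is resolved on HOST,
  #    so 127.0.0.1 here means "HOST itself"
  printf 'ssh -i %s -N -L %s:127.0.0.1:%s %s\n' "$1" "$2" "$3" "$4"
}

tunnel_cmd "$KEY" "$LOCAL_PORT" "$REMOTE_PORT" "$HOST"
# The browser must run on the same machine as ssh for this URL to work.
printf 'http://127.0.0.1:%s\n' "$LOCAL_PORT"
```

The key point is that the browser talks to the local end of the tunnel (127.0.0.1:8080 on the machine running ssh), not to the EMR host name directly.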
You can use an EMR notebook with Amazon EMR clusters running Apache Spark to remotely run queries and code. An EMR notebook is a "serverless" Jupyter notebook. It sits outside the cluster and takes care of attaching to the cluster, so you don't have to worry about it.
More information here: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-notebooks.html