How should you run a jupyter notebook on Spark EMR Cluster
EDIT: This question was about how you should define parameters for a python/jupyter-notebook file in order to make a spark-submit on an EMR Amazon Spark Cluster...
Before: I am sorry for my dumb questions, but I am a pretty new to this and I have been stuck on this issue for a couple of days, and there seems to be no good guide on the web. I am following the Udacity Spark course. I have created a Spark Yarn cluster on Amazon AWS (EMR), with one master and 3 slaves. I created a jupyter notebook on top of that (and was able to run it and see output using the PySpark kernel). I connected to the cluster using PuTTY (I guess to the master node), and I downloaded the jupyter notebook to the local machine. However, when I try to run it, I consistently hit many types of errors. Currently, I run these commands:
/usr/bin/spark-submit --class "org.apache.spark.examples.SparkPi" --master yarn --deploy-mode cluster ./my-test-emr.ipynb 1>output-my-test-emr.log 2>error-my-test-emr.log
aws s3 cp ./error-my-test-emr.log s3://aws-emr-resources-750982214328-us-east-2/notebooks/e-8TP55R4K894W1BFRTNHUGJ90N/error-my-test-emr.log
I made both the error file and the jupyter notebook public so you can see them (link). I truly suspect the --class parameter (I pretty much guessed it, and I have read about it as an option for my troubles, but no further information was given). Can anyone explain what it is? Why do we need it? And how can I find out/set the true value? If anyone is willing, further explanation about JAR would also be helpful - why should I turn my python program into java? And how should I do that? It seems like many questions have been asked here regarding it, but none explains it from the root...
Thanks in Advance
You can't spark-submit an .ipynb notebook file directly; spark-submit expects a .py file. You also don't need --class for a python script (that option is for Java/Scala applications packaged as JARs). If you export your notebook to a .py file, with some name, say test.py, this will work:
spark-submit --master yarn --deploy-mode cluster ./test.py
When you say locally, what version of Spark did you download, and from where?
Generally, when I configure Spark on my laptop, I just run the command below to run the Spark Pi example:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn \
--deploy-mode client SPARK_HOME/lib/spark-examples.jar 10
where SPARK_HOME is the folder where you extracted the tarball from the Spark website.
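For a Python program, no JAR (and no --class) is involved at all. A minimal sketch of what a submittable script such as test.py could look like, mirroring the SparkPi example above (all names here are illustrative, not taken from the question's notebook; the Spark session is only started when pyspark is actually importable, which spark-submit guarantees on the cluster):

```python
import importlib.util
import random


def is_inside(_):
    """Sample one point in the unit square; True if it lands in the quarter circle."""
    x, y = random.random(), random.random()
    return x * x + y * y <= 1.0


def main():
    # Imported here so the sampling helper above also works without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("PiEstimate").getOrCreate()
    n = 100_000
    hits = spark.sparkContext.parallelize(range(n)).filter(is_inside).count()
    print("Pi is roughly", 4.0 * hits / n)
    spark.stop()


if __name__ == "__main__":
    # Only start Spark when pyspark is importable; spark-submit provides it on the cluster.
    if importlib.util.find_spec("pyspark") is not None:
        main()
```

Submitted exactly like the command in the answer above: spark-submit --master yarn --deploy-mode cluster ./test.py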