
Spark not installed on EMR cluster

I have been using Spark on an EMR cluster for a few weeks now without problems. The setup was with AMI 3.8.0 and Spark 1.3.1, and I passed '-x' as an argument to Spark (without this it didn't seem to be installed).
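For context, on the 3.x AMIs Spark was installed by a bootstrap action, and '-x' was an argument to that script. The launch command likely looked something like the sketch below (the instance settings are placeholders, and the bootstrap script path is the one commonly documented for 3.x, not taken from my actual setup):

    aws emr create-cluster \
        --ami-version 3.8.0 \
        --name "spark-3.8.0" \
        --instance-type m3.xlarge \
        --instance-count 3 \
        --use-default-roles \
        --bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark,Args=[-x]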

I want to upgrade to a more recent version of Spark, and today spun up a cluster with the emr-4.1.0 AMI, containing Spark 1.5.0. When the cluster is up it claims to have successfully installed Spark (at least on the cluster management page on AWS), but when I ssh into 'hadoop@[IP address]' I don't see anything in the 'hadoop' directory, where Spark was installed in the previous version (I've also tried with other applications and had the same result, and tried to ssh in as ec2-user, but Spark is not installed there either). When I spin up the cluster with the emr-4.1.0 AMI I don't have the option to pass the '-x' argument to Spark, and I'm wondering if there is something I'm missing.
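For comparison, with the 4.x release labels Spark is selected as an application rather than installed by a bootstrap action, so there is no install script to pass '-x' to. A minimal sketch of launching such a cluster with the AWS CLI (instance type, count, and key name are placeholders):

    aws emr create-cluster \
        --release-label emr-4.1.0 \
        --name "spark-4.1.0" \
        --applications Name=Spark \
        --instance-type m3.xlarge \
        --instance-count 3 \
        --use-default-roles \
        --ec2-attributes KeyName=my-key-pair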

Does anyone know what I'm doing wrong here?

Many thanks.

This was actually solved, rather trivially.

In the previous AMI, all of the paths to Spark and the other applications were soft links available in the hadoop folder. In the newer AMI these have been removed, but the applications are still installed and can be accessed at the command line, for example with 'spark-shell'.
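A quick way to confirm this from an ssh session on the master node; the install locations below are what the 4.x releases appear to use, so treat them as assumptions rather than guaranteed paths:

    # spark-shell is on the PATH even though ~hadoop no longer contains symlinks
    which spark-shell
    spark-submit --version     # prints the installed Spark version (1.5.0 here)

    # the actual install and its configuration (assumed locations on emr-4.x):
    ls /usr/lib/spark
    ls /etc/spark/conf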
