
EMR Master SSH disallowed

To run Scala Spark jobs and spark-shell queries, I have been SSHing into the master node of EMR. Now the team charged with managing the cloud is no longer allowing me to SSH into the EMR master node. What alternative patterns could be leveraged?

Zeppelin is your best bet if the cloud team is happy to allow access to it.

The %spark interpreter is pretty much spark-shell running in a notebook paragraph.

It also pre-defines the usual entry points, such as `spark` (the `SparkSession`, so `spark.sql` works out of the box) and the `SparkContext`, so you don't need to set anything up and can just run code, e.g.:

%spark
val myDf = spark.sql("select * from table")
myDf.limit(10).show()

val myOtherDf = spark.read.csv("s3://bucket/key/object.csv")
myOtherDf.limit(10).show()

(spark-shell may do this too, but I don't use it enough to know offhand.)

As Zeppelin actually runs on the EMR master node, you can even access the master node's OS using the shell interpreter %sh, e.g.:

%sh
ls /
aws s3 cp s3://mybucket/myfile /

Although what you can do depends on the OS permissions of the Zeppelin user, of course.

Be aware that once you terminate the cluster, your notebook will disappear too! Make sure to download it while you can.
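One hedged way to do that download without the UI: Zeppelin exposes a REST API with a notebook export endpoint, so you can pull the notebook down as JSON (and push it to S3 so it survives the cluster). The host, port, and note ID below are assumptions — EMR typically serves Zeppelin on port 8890, and you can find the note ID in the notebook's URL; adjust for your setup and any authentication the cloud team has enabled.

```shell
# Hypothetical values -- replace with your Zeppelin host and note ID.
ZEPPELIN_HOST="http://localhost:8890"   # EMR's default Zeppelin port is 8890
NOTE_ID="2ABCDEFGH"                     # visible in the notebook's URL

# Export the notebook as JSON via Zeppelin's REST API...
curl -s "$ZEPPELIN_HOST/api/notebook/export/$NOTE_ID" -o my-notebook.json

# ...and copy it to S3 so it outlives the cluster.
aws s3 cp my-notebook.json s3://mybucket/backups/my-notebook.json
```

The exported JSON can later be re-imported into a new Zeppelin instance via the import endpoint or the UI.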
