
Is it possible to run a Spark Scala script without going inside spark-shell?

The only two ways I know to run Scala-based Spark code are to either compile a Scala program into a jar file and run it with spark-submit, or to run a Scala script with :load inside the spark-shell. My question is: is it possible to run a Scala file directly on the command line, without first going inside spark-shell and then issuing :load?

You can simply use stdin redirection with spark-shell:

spark-shell < YourSparkCode.scala

This command starts a spark-shell, interprets your YourSparkCode.scala line by line, and quits at the end.
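For illustration, YourSparkCode.scala can be a plain Scala script that uses the spark session the shell creates for you. A minimal sketch (the file contents below are hypothetical):

// YourSparkCode.scala - `spark` is the SparkSession that spark-shell provides automatically
val data = Seq(("alice", 1), ("bob", 2))
val df = spark.createDataFrame(data).toDF("name", "value")
df.show()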

Another option is to use the -I <file> option of the spark-shell command:

spark-shell -I YourSparkCode.scala

The only difference is that the latter command leaves you inside the shell, and you must issue the :quit command to close the session.

[UPD] Passing parameters

Since spark-shell does not execute your source as an application but just interprets your source file line by line, you cannot pass any parameters directly as application arguments.

Fortunately, there are many ways to achieve the same result (e.g., externalizing the parameters in another file and reading it at the very beginning of your script).

But I personally find the Spark configuration the cleanest and most convenient way.

You pass your parameters via the --conf option:

spark-shell --conf spark.myscript.arg1=val1 --conf spark.yourspace.arg2=val2 < YourSparkCode.scala

(please note that the spark. prefix in the property name is mandatory; otherwise Spark will discard your property as invalid)

And read these arguments in your Spark code as below:

val arg1: String = spark.conf.get("spark.myscript.arg1")
val arg2: String = spark.conf.get("spark.yourspace.arg2")
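If a parameter is optional, spark.conf.get also accepts a default value, so the script does not fail when the property is not set (the property name below is hypothetical):

val arg3: String = spark.conf.get("spark.myscript.arg3", "defaultValue")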

It is possible via spark-submit.

https://spark.apache.org/docs/latest/submitting-applications.html

You can even put it into a bash script, or create an sbt task (https://www.scala-sbt.org/1.x/docs/Tasks.html) to run your code.
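For reference, a minimal sketch of a self-contained application that could be packaged into a jar and run with spark-submit (the object name and app name here are hypothetical):

// Example.scala - unlike the spark-shell approach, spark-submit passes command-line arguments via `args`
import org.apache.spark.sql.SparkSession

object Example {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("Example").getOrCreate()
    val df = spark.createDataFrame(Seq(("alice", 1), ("bob", 2))).toDF("name", "value")
    df.show()
    spark.stop()
  }
}

After packaging (for example with sbt package), it would be launched with something like spark-submit --class Example path/to/your.jar.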
