Run multiple Scala main classes in parallel with bash script
I have an application with a Kafka data producer and a Spark consumer, where the KafkaProducer object extends App and SparkConsumer defines a main method. I want to create a bash script so that I can choose which class to run (producer or consumer) and run them in parallel.

I have managed to create such a script, but sbt takes a while to load, and I need to restart the producer multiple times, which takes much longer than just running the same class in the IDE. Where can I move the sbt command definition, or which approach can I choose to decrease the time needed to run the application?

PS: I run the consumer and the producer separately in different terminals.

Here is what my bash script looks like:
#!/usr/bin/env bash
if [ "$1" = "consumer" ]
then
    sbt "runMain consumer.SparkConsumer $2 $3 $4"
elif [ "$1" = "producer" ]
then
    sbt "runMain producer.KafkaProducer $5 $3 $6 $7"
else
    echo "Wrong parameter. It should be consumer or producer"
fi

You have several options here:

Maybe you don't know it, but sbt compiles your Scala code into Java bytecode (packaged as a .jar file) and then runs it using java. So you could do that directly yourself. Run

    sbt package

to compile your code, then start it with

    java -cp "target/scala-<SCALA_VERSION>/<PROJECT_NAME>-<PROJECT_VERSION>.jar" your.main.class.Name

(<SCALA_VERSION>, <PROJECT_NAME>, <PROJECT_VERSION> and your.main.class.Name have to be replaced with your own values; check the target directory for the exact file name that sbt produced. The jar built by sbt package contains only your own classes, so the Scala library and dependencies such as Kafka and Spark also have to be on the classpath.)
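
As a rough sketch of how the script from the question could be adapted to this first option, the version below skips sbt at run time and calls java directly. The main class names come from the question; APP_CLASSPATH is a hypothetical environment variable that is assumed to list your packaged jar together with all dependency jars, and the helper function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: build once with `sbt package`, then start each role with plain `java`.
# APP_CLASSPATH is a hypothetical variable: your jar plus every dependency jar.

# Map a role name to the main class named in the question.
main_class_for() {
  case "$1" in
    consumer) echo "consumer.SparkConsumer" ;;
    producer) echo "producer.KafkaProducer" ;;
    *) return 1 ;;
  esac
}

# Start the given role; all remaining arguments are passed to the program unchanged.
run_role() {
  local role="$1"
  shift
  local main_class
  if ! main_class="$(main_class_for "$role")"; then
    echo "Wrong parameter. It should be consumer or producer" >&2
    return 1
  fi
  java -cp "${APP_CLASSPATH:?set APP_CLASSPATH first}" "$main_class" "$@"
}

# Only act when arguments are given, e.g.: ./run.sh producer arg1 arg2
if [ "$#" -gt 0 ]; then
  run_role "$@"
fi
```

Unlike the original script, this sketch forwards every argument after the role name as-is instead of picking specific positional parameters, so the caller decides which arguments each class receives.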

This should let you start your command faster, as sbt takes a while to start. However, you will still have the overhead of the JVM startup time, which leads me to the second solution:

If you really need to start your commands quickly, then I suggest that you modify your Scala program so that it accepts an arbitrary number of actions from the command line (or by reading a file) and launches them in parallel directly in your Scala code. This is as easy as this:

    Seq(1, 2, 3, 4).par.foreach{println}

The .par will create a ParSeq, which is a sequence that can run in parallel (on Scala 2.13 and later, .par requires the separate scala-parallel-collections module). You can even configure the degree of parallelism, but that is another question.
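
If changing the Scala program is not convenient, a comparable effect can be approximated from the bash side by starting several commands in the background and waiting for all of them. This is only a sketch of that shell-level alternative, not the in-process approach described above; run_parallel is a hypothetical helper name, and each argument is assumed to be a complete command line.

```shell
#!/usr/bin/env bash
# Sketch only: start every given command line in the background, then wait for all.
# run_parallel is a hypothetical helper; each argument is one full command line.
run_parallel() {
  local pids=()
  local cmd pid
  local failed=0

  # Launch each command in its own background shell and remember its PID.
  for cmd in "$@"; do
    bash -c "$cmd" &
    pids+=("$!")
  done

  # Wait for every job; remember whether any of them exited with an error.
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=1
  done
  return "$failed"
}
```

It could be called with one quoted command per process, for example one java command line for the producer and one for the consumer. Each process still pays its own JVM startup cost, so this helps with running both at once rather than with startup time.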

Third option: you could have a look at https://github.com/facebook/nailgun (or any similar project) and use it to reduce your JVM startup overhead.