Running multiple Scala main classes in parallel with a bash script

Question

I have an application with a Kafka data producer and a Spark consumer, where the KafkaProducer object extends App and SparkConsumer defines a main method. I want to create a bash script so I can choose which class to run (producer or consumer) and run them in parallel. I have managed to create such a script, but sbt takes a while to load, and I need to restart the producer multiple times, which takes much longer than running the same class in the IDE. Where can I move the sbt command definition, or which approach can I choose to decrease the time needed to run the application?

PS: I run the consumer and the producer separately, in different terminals.

Here is what my bash script looks like:

#!/usr/bin/env bash
if [ "$1" = "consumer" ]
then
    sbt "runMain consumer.SparkConsumer $2 $3 $4"
elif [ "$1" = "producer" ]
then
    sbt "runMain producer.KafkaProducer $5 $3 $6 $7"
else
    echo "Wrong parameter. It should be consumer or producer"
fi

Answer 1

You have several options here:

Maybe you don't know it, but sbt compiles your Scala code into Java bytecode (.class files, which sbt package bundles into a .jar) and then runs it on the JVM. So you can do that directly yourself:

Run sbt package to compile and package your code.
Run your code with java -cp "target/scala-<SCALA_VERSION>/<PROJECT_NAME>_<SCALA_VERSION>-<PROJECT_VERSION>.jar" your.main.class.Name

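(<SCALA_VERSION> is the Scala binary version, e.g. 2.11; <SCALA_VERSION>, <PROJECT_NAME>, <PROJECT_VERSION> and your.main.class.Name have to be replaced with your own values.) Note that the jar produced by sbt package only contains your own compiled classes: the Scala library and your dependencies (Kafka, Spark) must also be on the classpath, for example by building a single "fat" jar with the sbt-assembly plugin, or by appending the classpath printed by sbt "export runtime:fullClasspath".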
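As a minimal sketch of this first option, here is the question's launcher rewritten to call java directly instead of sbt runMain. It assumes a hypothetical APP_CLASSPATH variable pointing at a jar that already contains your classes plus the Scala library, Kafka and Spark (for example a fat jar built with the sbt-assembly plugin); the default path target/scala-2.11/myproject-assembly-0.1.jar is only a placeholder for your own Scala version, project name and version. The class names and argument positions are copied from the question's script.

```shell
#!/usr/bin/env bash
# Sketch: start the already-compiled main classes with plain `java`
# instead of `sbt runMain`, so sbt does not boot on every launch.
#
# Assumption: APP_CLASSPATH (hypothetical) points to a jar or classpath that
# contains your classes AND their dependencies (Scala library, Kafka, Spark).
# The default below is only a placeholder path.

launch() {
  local cp="${APP_CLASSPATH:-target/scala-2.11/myproject-assembly-0.1.jar}"
  case "${1:-}" in
    consumer)
      # Same argument positions as the original sbt-based script
      java -cp "$cp" consumer.SparkConsumer "$2" "$3" "$4"
      ;;
    producer)
      java -cp "$cp" producer.KafkaProducer "$5" "$3" "$6" "$7"
      ;;
    *)
      # Unknown role: report on stderr and return a failure status
      echo "Wrong parameter. It should be consumer or producer" >&2
      return 1
      ;;
  esac
}

launch "$@"
```

You would still start the consumer in one terminal and the producer in another (e.g. ./run.sh consumer <args> and ./run.sh producer <args>, where run.sh is whatever you name the script); each launch now skips sbt's startup and only pays the JVM's.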

This lets you start your program faster, since sbt itself takes a while to start. You will still pay the JVM startup time, however, which leads me to the second solution:

If you really need to start your commands quickly, then I suggest modifying your Scala program so that it accepts an arbitrary number of actions from the command line (or reads them from a file) and launches them in parallel directly in your Scala code. This is as easy as:

Seq(1, 2, 3, 4).par.foreach{println}

The .par call creates a ParSeq, a sequence whose operations can run in parallel. (In Scala 2.13 and later, .par lives in the separate scala-parallel-collections module and requires import scala.collection.parallel.CollectionConverters._.) You can even configure the degree of parallelism, but that is another question.

Third option: have a look at https://github.com/facebook/nailgun (or any similar project) and use it to reduce the JVM startup overhead.

Answer 1 (accepted, 2018-05-31 12:56:05)