
Run multiple Scala main classes in parallel with a bash script


I have an application with a Kafka data producer and a Spark consumer, where the KafkaProducer object extends App and SparkConsumer defines a main method. I want to create a bash script that lets me choose which class to run, producer or consumer, so that I can run them in parallel. I have managed to create such a script, but sbt takes a while to load, and I need to restart the producer many times, which takes much longer than running the same class from the IDE. Where can I move the sbt command definition, or which approach can I choose, to reduce the time needed to start the application?

PS: I run the consumer and the producer separately, in different terminals.

Here is what my bash script looks like:

#!/usr/bin/env bash
if [ "$1" = "consumer" ]
then
    sbt "runMain consumer.SparkConsumer $2 $3 $4"
elif [ "$1" = "producer" ]
then
    sbt "runMain producer.KafkaProducer $5 $3 $6 $7"
else
    echo "Wrong parameter. It should be consumer or producer" >&2
    exit 1
fi

You have several options here:

Maybe you don't know it, but sbt compiles your Scala code into JVM bytecode (.class files) and then runs it on the JVM. So you can do those steps directly yourself:

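  • run sbt package to compile your code and package it into a .jar
  • run your code with java -cp "<CLASSPATH>" your.main.class.Name

(<CLASSPATH> must contain target/scala-<SCALA_VERSION>/<PROJECT_NAME>_<SCALA_VERSION>-<PROJECT_VERSION>.jar, where <SCALA_VERSION> is the Scala binary version such as 2.12; <PROJECT_NAME>, <PROJECT_VERSION> and your.main.class.Name have to be replaced with your own values.) The jar produced by sbt package contains only your own classes, so the classpath must also include the Scala library and your Kafka and Spark dependencies; otherwise java fails with a NoClassDefFoundError. You can print the full runtime classpath with sbt "export runtime:fullClasspath" and reuse its output, or build a single self-contained jar with the sbt-assembly plugin.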

This starts your command faster, because you no longer pay sbt's startup time on every run; you only need to run sbt package again after changing the code. You still pay the JVM startup time, however, which leads me to the second solution:

If you really need to start your commands quickly, I suggest modifying your Scala program so that it accepts an arbitrary number of actions from the command line (or from a file) and launches them in parallel directly in your Scala code. This is as simple as: Seq(1, 2, 3, 4).par.foreach{println}. The .par call creates a ParSeq, a sequence whose operations can run in parallel. (On Scala 2.13 and later, parallel collections live in the separate scala-parallel-collections module and need import scala.collection.parallel.CollectionConverters._.) You can even configure the degree of parallelism, but that is another question.
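A minimal sketch of that idea follows, assuming a single hypothetical Launcher object that takes over the dispatching the bash script does. It swaps .par for scala.concurrent.Future, which ships with the standard library on every Scala version and suits long-running jobs: each action is wrapped in blocking so the global pool can add a thread instead of starving the other actions. The action names, the comma-separated argument format and the stand-in bodies are assumptions; in the real project each entry would call producer.KafkaProducer.main or consumer.SparkConsumer.main.

```scala
import scala.concurrent.{Await, ExecutionContext, Future, blocking}
import scala.concurrent.duration.Duration

// Hypothetical single entry point: the first argument is a comma-separated
// list of actions (e.g. "producer,consumer"); the remaining arguments are
// passed to every action. All requested actions run concurrently in one JVM.
object Launcher {
  // Stand-ins for the real entry points. In the question's project these
  // would be producer.KafkaProducer.main(args) and
  // consumer.SparkConsumer.main(args).
  val actions: Map[String, Array[String] => Unit] = Map(
    "producer" -> ((args: Array[String]) => println(s"producer running with: ${args.mkString(" ")}")),
    "consumer" -> ((args: Array[String]) => println(s"consumer running with: ${args.mkString(" ")}"))
  )

  // Starts every named action on its own Future and waits until all finish.
  // Returns the names of the completed actions, in the order requested.
  def runAll(names: Seq[String], args: Array[String])(implicit ec: ExecutionContext): Seq[String] = {
    val unknown = names.filterNot(actions.contains)
    require(unknown.isEmpty, s"Unknown action(s): ${unknown.mkString(", ")}. Expected producer or consumer")
    val running = names.map { name =>
      // `blocking` tells the global pool this task may run for a long time,
      // so it can add a thread rather than starve the other actions.
      Future { blocking(actions(name)(args)); name }
    }
    Await.result(Future.sequence(running), Duration.Inf)
  }

  def main(cmdArgs: Array[String]): Unit = {
    if (cmdArgs.isEmpty) {
      System.err.println("Usage: Launcher <action>[,<action>...] [args...]")
      sys.exit(1)
    }
    import ExecutionContext.Implicits.global
    val names = cmdArgs.head.split(',').toSeq
    runAll(names, cmdArgs.tail).foreach(name => println(s"$name finished"))
  }
}
```

Passing producer,consumer as the first argument starts both actions together. The trade-off is that they now share one JVM, so restarting the producer also restarts the consumer; separate java invocations avoid that.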

Third option: have a look at https://github.com/facebook/nailgun (or a similar project) and use it to reduce the JVM startup overhead. Nailgun keeps a JVM running in the background so that a lightweight client can run Java programs in it without starting a new JVM each time.
