
How to run a Spark program on a cluster from within a separate Java program?

I have a Java program that runs separate small Spark programs. How can I make my Java program run those small Spark modules/programs on a cluster?

For example, I have a program named executor (a Java program),

and some Spark programs: add two numbers, subtract two numbers.

So how can I run those Spark programs on a cluster from my Java program, i.e. executor in this case?

Thanks!!!

Check out the Spark Job Server project, which lets you create a shared context and execute jobs through a REST interface: https://github.com/spark-jobserver/spark-jobserver . Hope this helps.
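A minimal sketch of calling the job server's REST API from Scala, assuming the default port 8090 and the /jobs endpoint described in the project's README; the appName and classPath values here are hypothetical, and the job JAR is assumed to have been uploaded already via the /jars endpoint:

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object JobServerClient {
  // Default spark-jobserver address; adjust to your deployment.
  val base = "http://localhost:8090"
  val client = HttpClient.newHttpClient()

  def runJob(appName: String, classPath: String): String = {
    // POST /jobs?appName=...&classPath=... starts a job on the shared context
    // and returns a JSON description of the submitted job.
    val request = HttpRequest.newBuilder()
      .uri(URI.create(s"$base/jobs?appName=$appName&classPath=$classPath"))
      .POST(HttpRequest.BodyPublishers.noBody())
      .build()
    client.send(request, HttpResponse.BodyHandlers.ofString()).body()
  }
}

// e.g. JobServerClient.runJob("calculator", "com.example.AddTwoNumbers")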

Possible solutions could be:

  1. Write a bash script and execute the Spark programs sequentially.

  2. Write all the operations in a single program, call each operation from that program, and print the respective results (see the sketch after this list).

  3. Write a single program but apply the principle of parallel programming, i.e. execute the operations in parallel. Whether this helps depends on what type of data you have and what you want to achieve, so it is hard to comment further.
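For option 2, a minimal sketch; CalculatorDriver and its add/subtract operations are hypothetical names standing in for your existing Spark modules:

import org.apache.spark.sql.SparkSession

object CalculatorDriver {
  // Each former "small Spark program" becomes an operation on the shared session.
  def add(spark: SparkSession, a: Int, b: Int): Int =
    spark.sparkContext.parallelize(Seq(a, b)).sum().toInt

  def subtract(spark: SparkSession, a: Int, b: Int): Int =
    spark.sparkContext.parallelize(Seq(a, -b)).sum().toInt

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("calculator").getOrCreate()
    println(s"add: ${add(spark, 4, 2)}")           // 6
    println(s"subtract: ${subtract(spark, 4, 2)}") // 2
    spark.stop()
  }
}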

You could use SparkLauncher this way:

import org.apache.spark.launcher.SparkLauncher
import scala.collection.JavaConverters._


// Environment passed to the launched driver so it can find the cluster configs
val env = Map(
  "HADOOP_CONF_DIR" -> hadoopConfDir,
  "YARN_CONF_DIR" -> yarnConfDir
)

println(env.asJava)
val process = new SparkLauncher(env.asJava)
  .setSparkHome(sparkHome)
  .setAppResource(jarPath)   // path to your uber/assembly Spark JAR
  .setAppName(appName)
  .setMainClass(className)   // main class inside that JAR
  .setMaster(master)         // e.g. "yarn" or "spark://host:7077"
  //.setConf("spark.driver.memory", "2g") // example of an additional conf property
  .setVerbose(true)
  .launch()
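If you would rather track the application's lifecycle than hold a raw Process, the same launcher API also offers startApplication(), which returns a SparkAppHandle. A minimal sketch, reusing the env, sparkHome, jarPath, className and master values from the snippet above:

import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

// startApplication() returns a handle that reports the app's state transitions.
val handle = new SparkLauncher(env.asJava)
  .setSparkHome(sparkHome)
  .setAppResource(jarPath)
  .setMainClass(className)
  .setMaster(master)
  .startApplication()

// Block the calling thread until the application reaches a terminal state.
while (!handle.getState.isFinal) {
  Thread.sleep(1000)
}
println(s"Final state: ${handle.getState}")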
