
Launch spark job from bash taking parameters from a text file

I am running a standalone spark instance that I launch with:

/usr/local/spark-1.6.0/bin/spark-submit --class "run.Main" --conf spark.driver.userClassPathFirst=true --driver-memory 45G --jars $(echo /var/myapp/lib/*.jar | tr ' ' ',') mycoolapp.jar "local[6]" "parA" "parB" "parC" "parD"

What I do manually is to launch it for a specific "parA" value.

Then, once it is finished, I relaunch it with a new value for "parA". I have all the possible "parA" values listed in a .txt file, and I am wondering if it's possible to write a bash script that does this for me, i.e. launching the job and automatically picking the next "parA" value from the text file.

Of course, I need it to wait for one Spark job to finish before launching the next, since I am using Spark on a single machine and each job eats almost all the RAM on the machine...

Any guidance on that is more than welcome.

Something like this. You just iterate over an array of arguments. And you don't need to worry about how to wait until the end of the job, because the submit operation is synchronous.

#!/bin/bash

# All the "parA" values to run, one Spark job per value
declare -a parAs=('parA0' 'parA1' 'parA2')

for parA in "${parAs[@]}"; do
    # Comma-separated list of jars, same as in the original command
    jarsList=$(echo /var/myapp/lib/*.jar | tr ' ' ',')
    /usr/local/spark-1.6.0/bin/spark-submit --class "run.Main" \
        --conf spark.driver.userClassPathFirst=true \
        --driver-memory 45G \
        --jars "$jarsList" \
        mycoolapp.jar "local[6]" "$parA" "parB" "parC" "parD"
done
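
Since the "parA" values are already listed in a .txt file, the loop can also read them from that file instead of a hardcoded array. A minimal sketch along the same lines, assuming one value per line; the filename parA_values.txt is only a placeholder:

#!/bin/bash

# Placeholder name for the file that lists one "parA" value per line
paramFile="parA_values.txt"

# Comma-separated list of jars, same as in the original command
jarsList=$(echo /var/myapp/lib/*.jar | tr ' ' ',')

# spark-submit blocks until the job finishes, so each iteration
# starts only after the previous job is done
while IFS= read -r parA; do
    [ -z "$parA" ] && continue   # skip empty lines
    /usr/local/spark-1.6.0/bin/spark-submit --class "run.Main" \
        --conf spark.driver.userClassPathFirst=true \
        --driver-memory 45G \
        --jars "$jarsList" \
        mycoolapp.jar "local[6]" "$parA" "parB" "parC" "parD"
done < "$paramFile"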
