What needs to be changed when we switch Spark from Standalone to Yarn-Client?

Currently we have a program which is a web service: it receives SQL queries and uses SQLContext to respond. The program is now in standalone mode; we set spark.master to a specific URL. The structure is something like below:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SomeApp extends App
{
    // An app name is required by SparkContext in addition to the master URL.
    val conf = new SparkConf()
        .setAppName("SomeApp")
        .setMaster("spark://10.21.173.181:7077")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Listen_to_query() and send() are placeholders for the service's I/O layer.
    while(true)
    {
        val query = Listen_to_query()
        val response = sqlContext.sql(query)
        send(response)
    }
}

Now we are going to shift the system to Spark on YARN, and it seems that we should use spark-submit to submit jobs to YARN. It would be strange to deploy such a "service" on YARN, since it won't stop like ordinary "jobs". But we don't know how to separate "jobs" from our program.

Do you have any suggestions? Thank you!

So if you just want to submit your jobs to YARN, you can just change the master parameter. However, it sounds like you are looking for a long-running shared SparkContext, and there are a few options for something like this: https://github.com/spark-jobserver/spark-jobserver and https://github.com/ibm-et/spark-kernel .
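For the first option, here is a minimal sketch of the change, assuming the Spark 1.x API used in the question (the app name and the jar name in the launch command are illustrative):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SomeApp extends App
{
    // "yarn-client" keeps the driver (and its while(true) service loop) in this
    // JVM; only the executors are scheduled by YARN. Spark locates the cluster
    // via HADOOP_CONF_DIR / YARN_CONF_DIR rather than a host:port master URL.
    val conf = new SparkConf().setAppName("SomeApp").setMaster("yarn-client")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // ... same query loop as before ...
}

// Typically launched with something like:
//   spark-submit --master yarn-client --class SomeApp some-app.jar

Because yarn-client mode keeps the driver alive in your own process, the long-running service loop itself does not need to change; spark-jobserver and spark-kernel become interesting when multiple clients should share one context through a REST or kernel interface.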
