Passing sparkSession as function parameters spark-scala

I'm generating tables using spark-scala and I am concerned about efficiency.

Would passing a SparkSession to my functions make my program slower? Is it any slower than calling SparkSession.builder.getOrCreate() inside each function?

I am using YARN as the master.

Thanks in advance.

You can create the Spark session once and pass it around without losing any performance. However, it is a little inconvenient to modify method signatures to pass in a session object. You can avoid that by simply calling getOrCreate in the functions to obtain the same global session without passing it. When getOrCreate is called for the first time, it registers the session it creates as the default (see SparkSession.setDefaultSession) and hands that same session back on later getOrCreate calls.

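    import org.apache.spark.sql.SparkSession

    // create the session once
    val spark: SparkSession = SparkSession.builder
      .appName("test")
      .master("local[2]")
      .getOrCreate()

    // option 1: pass the session in as a parameter
    // (function1 is a placeholder; its body is up to you)
    def function1(spark: SparkSession): Unit = {
      // use spark here
    }
    function1(spark)

    // option 2: obtain the same session without passing it
    def function2(): Unit = {
      val s = SparkSession.builder.getOrCreate()
      // s is the session created above
    }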
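
To check for yourself that both approaches end up with the same session, something like the following sketch may help. The object name SessionDemo and the helpers countWithParam and countWithLookup are made up for illustration, and local[2] is used only so it runs on a single machine (on YARN the master would normally come from spark-submit).

```scala
import org.apache.spark.sql.SparkSession

object SessionDemo {
  // Hypothetical helper: receives the session explicitly.
  def countWithParam(spark: SparkSession): Long =
    spark.range(10).count()

  // Hypothetical helper: looks the session up instead of receiving it.
  def countWithLookup(): Long = {
    val s = SparkSession.builder.getOrCreate()
    s.range(10).count()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, so the sketch is runnable without a cluster.
    val spark = SparkSession.builder
      .appName("session-demo")
      .master("local[2]")
      .getOrCreate()

    // A second getOrCreate on the same thread returns the existing object,
    // it does not build a new session.
    val again = SparkSession.builder.getOrCreate()
    assert(again eq spark)

    // Both helpers run against that one session.
    assert(countWithParam(spark) == 10L)
    assert(countWithLookup() == 10L)

    spark.stop()
  }
}
```

Since both helpers work on the same underlying session, the choice between them is mostly about readability and testability rather than speed. Passing the session explicitly makes the dependency visible in the signature, while the lookup keeps signatures shorter.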
