
How to close shared singleton connection between cores in Spark Executor

I am using a shared connection between all the cores of a single Spark executor. I created a singleton connection object so that it is shared between the cores of an executor and there is only one connection per executor.

import java.sql.{Connection, DriverManager}

object SingletonConnection {

  private var connection: Connection = null

  // Lazily open a single shared connection; synchronized so that
  // concurrent tasks on the same executor see exactly one connection.
  def getConnection(url: String, username: String, password: String): Connection = synchronized {
    if (connection == null) {
      connection = DriverManager.getConnection(url, username, password)
    }
    connection
  }
}

Spark executor code:

dataFrame.foreachPartition { batch =>
  if (batch.nonEmpty) {
    lazy val dbConnection = SingletonConnection
    val dbc = dbConnection.getConnection(url, user, password)

    try {
      // statement preparation and row binding elided in the original
      val st = dbc.prepareStatement(sql)
      batch.foreach { row =>
        // ... set parameters from row ...
        st.addBatch()
      }
      st.executeBatch()
    }
    catch {
      case exec: BatchUpdateException =>
        var ex: SQLException = exec
        while (ex != null) {
          ex.printStackTrace()
          ex = ex.getNextException
        }
        throw exec
    }
  }
}

The problem is that I cannot close the connection, because I do not know when a particular core finishes its execution. If I close the connection in a finally block, then as soon as one core finishes its task it closes the connection, and all other cores fail because the shared connection is closed.

Since I am not closing the connection here, it remains open even after the task is finished. How can I make this work so that the connection is closed ONLY AFTER ALL CORES HAVE FINISHED THEIR TASKS?

I implemented this in Java, so I can only give you a clue.

In the SingletonConnection class I created a thread-safe counter. Each time the connection is acquired, the counter is incremented by one. Each time before closing the connection, the counter is decremented by one and checked against zero; when it reaches zero, you can close the connection.

This will not close the connection while other running threads are still using it. It can, however, open more connections over the lifetime of the job than you might expect (up to the number of partitions), because the connection is closed and reopened whenever the counter drops to zero between tasks.
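Below is a minimal Scala sketch of this reference-counting idea, assuming a JDBC connection as in the question. The object name RefCountedConnection and the method names acquire and release are illustrative, not from the original answer; here the synchronized blocks, rather than an atomic counter, provide the thread safety.

import java.sql.{Connection, DriverManager}

object RefCountedConnection {

  private var connection: Connection = null
  private var users: Int = 0 // number of tasks currently holding the connection

  // Open (or reuse) the shared connection and record one more user.
  def acquire(url: String, username: String, password: String): Connection = synchronized {
    if (connection == null || connection.isClosed) {
      connection = DriverManager.getConnection(url, username, password)
    }
    users += 1
    connection
  }

  // Drop one user; close the connection only when nobody is left using it.
  def release(): Unit = synchronized {
    users -= 1
    if (users == 0 && connection != null) {
      connection.close()
      connection = null
    }
  }
}

Each task would then pair acquire with release in a finally block, so the connection is closed only when the last running task on the executor releases it:

dataFrame.foreachPartition { batch =>
  val dbc = RefCountedConnection.acquire(url, user, password)
  try {
    // ... write the batch using dbc ...
  } finally {
    RefCountedConnection.release() // closes only when the count reaches zero
  }
}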
