
Spark: DB connection per Spark RDD partition with mapPartitions

I want to do a mapPartitions on my Spark RDD:

    val newRd = myRdd.mapPartitions(
      partition => {

        val connection = new DbConnection /*creates a db connection per partition*/

        val newPartition = partition.map(
           record => {
             readMatchingFromDB(record, connection)
         })
        connection.close()
        newPartition
      })

But this gives me a "connection already closed" exception. That is expected, because the connection is closed before control ever reaches the body of the .map(). I want to create one connection per RDD partition and close it properly. How can I achieve this?

Thanks!

As mentioned in the discussion here, the issue stems from the laziness of the map operation on the iterator partition. This laziness means that for each partition a connection is created and closed, and only later (when the RDD is acted upon) is readMatchingFromDB called.
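The laziness can be reproduced without Spark, since the function passed to mapPartitions simply receives a plain Scala Iterator. The following is a minimal sketch, assuming a hypothetical FakeConnection class that stands in for the real DbConnection; the names FakeConnection, LazyMapDemo and lookupPartition are illustrative and not from the original post.

```scala
// Hypothetical stand-in for a real database connection (not part of the original post).
final class FakeConnection {
  private var open = true

  def read(key: Int): String = {
    if (!open) throw new IllegalStateException("connection already closed")
    s"row-$key"
  }

  def close(): Unit = { open = false }
}

object LazyMapDemo {
  // Mirrors the body of the function passed to mapPartitions in the question.
  def lookupPartition(partition: Iterator[Int]): Iterator[String] = {
    val connection = new FakeConnection
    // Iterator.map is lazy: no element is read from the connection here.
    val mapped = partition.map(key => connection.read(key))
    // close() therefore runs before any read has happened.
    connection.close()
    mapped
  }

  def main(args: Array[String]): Unit = {
    val result = lookupPartition(Iterator(1, 2, 3))
    // The first read happens only now, when the iterator is consumed.
    val outcome = scala.util.Try(result.toList)
    println(outcome.isFailure) // prints true
  }
}
```

Consuming the returned iterator fails because every read is attempted after close() has already run, which is the same ordering Spark produces when an action finally evaluates the partition.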

To resolve this, you should force an eager traversal of the iterator before closing the connection, e.g. by converting it into a list (and then back):

val newRd = myRdd.mapPartitions(partition => {
  val connection = new DbConnection /*creates a db connection per partition*/

  val newPartition = partition.map(record => {
    readMatchingFromDB(record, connection)
  }).toList // consumes the iterator, thus calls readMatchingFromDB 

  connection.close()
  newPartition.iterator // create a new iterator
})
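
One trade-off of the toList approach is that the whole mapped partition is held in memory at once. A possible alternative, sketched below under stated assumptions, is to keep the iterator lazy and close the connection only when the iterator is exhausted. The helper closeWhenDone is hypothetical (it is not a Spark or Scala library function), and the example uses plain iterators in place of a real partition and connection.

```scala
object ClosingIteratorDemo {
  // Hypothetical helper: wraps `underlying` and runs `cleanup` exactly once,
  // the first time hasNext reports that no elements remain.
  def closeWhenDone[A](underlying: Iterator[A])(cleanup: () => Unit): Iterator[A] =
    new Iterator[A] {
      private var cleaned = false

      override def hasNext: Boolean = {
        val more = underlying.hasNext
        if (!more && !cleaned) {
          cleaned = true
          cleanup()
        }
        more
      }

      override def next(): A = underlying.next()
    }

  def main(args: Array[String]): Unit = {
    var closed = false
    // The map below stands in for partition.map(record => readMatchingFromDB(record, connection)).
    val records = Iterator(1, 2, 3).map(_ * 10)
    val wrapped = closeWhenDone(records)(() => closed = true)
    println(closed)         // false: nothing has been consumed yet
    println(wrapped.toList) // consuming the iterator triggers the cleanup at the end
    println(closed)         // true
  }
}
```

Inside mapPartitions this would presumably be used as closeWhenDone(partition.map(record => readMatchingFromDB(record, connection)))(() => connection.close()). Note that the cleanup only runs if the iterator is consumed to the end; if a task fails or stops early, the connection stays open, so registering the close with TaskContext.get().addTaskCompletionListener may be a safer complement.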
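
Another option is to cache the connection inside each executor JVM and reuse it across partitions:

rdd.foreachPartitionAsync(iterator -> {
  // DBConn is cached inside each executor JVM: the connection is created on the
  // first call and reused from then on. Very useful for streaming apps.
  DBConn conn = DBConn.getConnection();
  while (iterator.hasNext()) {
    iterator.next(); // advance the iterator, otherwise the loop never terminates
    conn.read();
  }
});

public class DBConn {
  private static DBConn dbObj = null;

  // Singleton accessor: returns the same instance to every caller in this JVM.
  public static synchronized DBConn getConnection() {
    if (dbObj == null) {
      dbObj = new DBConn();
    }
    return dbObj;
  }

  // Connection setup and read() go here.
}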
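
For a Scala job, the same per-JVM caching idea can be sketched with a Scala object, which is initialized at most once per class loader. The names PooledConnection, ConnectionHolder and timesCreated below are assumptions made for illustration; a real implementation would open an actual database connection in place of the stub.

```scala
// Hypothetical stand-in for a real database connection.
final class PooledConnection(val id: Int) {
  def read(key: String): String = s"value-for-$key"
}

// A Scala `object` is a singleton, so all tasks running in the same
// executor JVM share this holder.
object ConnectionHolder {
  private val created = new java.util.concurrent.atomic.AtomicInteger(0)

  // `lazy val` initialization is thread-safe; the connection is built on first use.
  lazy val connection: PooledConnection =
    new PooledConnection(created.incrementAndGet())

  // Exposed only so the example can show how many connections were built.
  def timesCreated: Int = created.get()
}

object ConnectionHolderDemo {
  def main(args: Array[String]): Unit = {
    // Simulate two partitions processed in the same JVM.
    val partitions = Seq(Iterator("a", "b"), Iterator("c"))
    val results = partitions.flatMap { partition =>
      partition.map(key => ConnectionHolder.connection.read(key)).toList
    }
    println(results)
    println(ConnectionHolder.timesCreated) // prints 1: the connection was built once
  }
}
```

Because the holder is referenced from inside the closure by name and not captured as a value, nothing has to be serialized from the driver; each executor builds its own connection the first time a task touches ConnectionHolder.connection. The shared connection is never closed explicitly in this sketch, which is usually acceptable only for long-running or streaming jobs.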
