Spark: DB connection per Spark RDD partition and do mapPartitions
I want to do a mapPartitions on my Spark RDD:
val newRd = myRdd.mapPartitions(
  partition => {
    val connection = new DbConnection /*creates a db connection per partition*/
    val newPartition = partition.map(
      record => {
        readMatchingFromDB(record, connection)
      })
    connection.close()
    newPartition
  })
But this gives me a "connection already closed" exception, as expected, because my connection is closed before control reaches the .map(). I want to create one connection per RDD partition and close it properly. How can I achieve this?
Thanks!
As mentioned in the discussion here, the issue stems from the laziness of the map operation on the iterator partition. This laziness means that for each partition a connection is created and closed, and only later (when the RDD is acted upon) is readMatchingFromDB called.
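The laziness can be reproduced without Spark. The following is a minimal sketch using a plain Scala iterator; FakeConnection and LazinessDemo are hypothetical names introduced only for this illustration and stand in for the real DbConnection.

```scala
// Hypothetical stand-in for a DB connection; not part of any real library.
final class FakeConnection {
  private var closed = false

  def lookup(record: Int): Int = {
    if (closed) throw new IllegalStateException("connection already closed")
    record * 10
  }

  def close(): Unit = {
    closed = true
  }
}

object LazinessDemo {
  // Returns true if consuming the mapped iterator fails because the
  // connection was closed before any element was actually processed.
  def failsAfterEarlyClose(): Boolean = {
    val connection = new FakeConnection
    // Iterator.map is lazy: no lookup runs on this line.
    val mapped = Iterator(1, 2, 3).map(r => connection.lookup(r))
    connection.close()
    try {
      mapped.toList // the lookups run only here, after close()
      false
    } catch {
      case _: IllegalStateException => true
    }
  }

  def main(args: Array[String]): Unit =
    println(s"fails after early close: ${failsAfterEarlyClose()}")
}
```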
To resolve this, you should force an eager traversal of the iterator before closing the connection, e.g. by converting it into a list (and then back into an iterator):
val newRd = myRdd.mapPartitions(partition => {
  val connection = new DbConnection /*creates a db connection per partition*/
  val newPartition = partition.map(record => {
    readMatchingFromDB(record, connection)
  }).toList // consumes the iterator, thus calls readMatchingFromDB
  connection.close()
  newPartition.iterator // create a new iterator
})
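Converting to a list holds the whole partition's results in memory at once. If that is a concern, one possible alternative (not part of the answer above) is to keep the iterator lazy and close the connection when it is exhausted. The sketch below assumes a hypothetical ClosingIterator wrapper and simulates the connection with a flag so that it runs without Spark.

```scala
// Hypothetical helper: wraps an iterator and runs `onDone` exactly once,
// the first time the underlying iterator reports that it is exhausted.
final class ClosingIterator[A](underlying: Iterator[A], onDone: () => Unit)
    extends Iterator[A] {
  private var done = false

  override def hasNext: Boolean = {
    val more = underlying.hasNext
    if (!more && !done) {
      done = true
      onDone()
    }
    more
  }

  override def next(): A = underlying.next()
}

object ClosingIteratorDemo {
  // Returns the mapped results and whether the simulated connection was closed.
  def run(): (List[Int], Boolean) = {
    var closed = false // simulates the state of a connection
    val lookups = Iterator(1, 2, 3).map { r =>
      require(!closed, "connection already closed")
      r * 10
    }
    val wrapped = new ClosingIterator(lookups, () => { closed = true })
    val results = wrapped.toList // lookups run here; close happens at the end
    (results, closed)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Inside mapPartitions the wrapper would be returned in place of newPartition, with connection.close() passed as the callback. Note that the callback never runs if the consumer stops before exhausting the iterator or if a lookup throws; Spark's TaskContext offers task completion listeners that could serve as a fallback, but check the documentation for your Spark version.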
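Another option (Java) is to keep a single connection per executor JVM and reuse it across partitions:
rdd.foreachPartitionAsync(iterator -> {
    // DBConn is cached inside each executor JVM: the connection is created on
    // first use and reused from then on. Very useful for streaming apps.
    DBConn conn = DBConn.getConnection();
    while (iterator.hasNext()) {
        // Advance the iterator; without next() the loop would never terminate.
        conn.read(iterator.next()); // read(...) taking the record is an assumption
    }
});

public class DBConn {
    private static DBConn dbObj = null;

    private DBConn() { /* open the underlying connection here */ }

    // Singleton accessor: returns the one instance for this JVM.
    public static synchronized DBConn getConnection() {
        if (dbObj == null) {
            dbObj = new DBConn();
        }
        return dbObj;
    }

    // The read(...) method called above is left to the implementer.
}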
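For readers working in Scala, as in the question, the same per-JVM singleton idea can be sketched with an object holding a lazy val. The names PooledConnection, ConnectionHolder and SingletonDemo are hypothetical, and the connection is simulated so the example runs without Spark or a database.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical connection type used only for illustration.
final class PooledConnection(val id: Int) {
  def read(record: String): String = s"$record@conn$id"
}

object ConnectionHolder {
  // Counts how many connections were ever created in this JVM.
  val created = new AtomicInteger(0)

  // A lazy val is initialised once, on first access, in a thread-safe way,
  // so every partition processed in this JVM shares the same instance.
  lazy val connection: PooledConnection =
    new PooledConnection(created.incrementAndGet())
}

object SingletonDemo {
  // Stands in for the body of a foreachPartition / mapPartitions call.
  def processPartition(records: Iterator[String]): List[String] =
    records.map(r => ConnectionHolder.connection.read(r)).toList

  def main(args: Array[String]): Unit = {
    println(processPartition(Iterator("a", "b")))
    println(processPartition(Iterator("c")))
    println(s"connections created: ${ConnectionHolder.created.get}")
  }
}
```

With this design the connection is never closed per partition; it lives as long as the executor JVM, which is the trade-off the answer above accepts in exchange for reuse.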