How to use foreachPartition in Spark 2.2 to avoid Task Serialization error
I have the following working code that uses Structured Streaming (Spark 2.2) to read data from Kafka (0.10). The only issue I cannot solve is a Task serialization problem when using kafkaProducer inside ForeachWriter. In my old version of this code, developed for Spark 1.6, I was using foreachPartition and defining a kafkaProducer for each partition to avoid the Task Serialization problem. How can I do the same in Spark 2.2?

    val df: Dataset[String] = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "test")
      .option("startingOffsets", "latest")
      .option("failOnDataLoss", "true")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").as[(String, String)]
      .map(_._2)

    var mySet = spark.sparkContext.broadcast(Map(
      "metadataBrokerList" -> metadataBrokerList,
      "outputKafkaTopic" -> outputKafkaTopic,
      "batchSize" -> batchSize,
      "lingerMS" -> lingerMS))

    val kafkaProducer = Utils.createProducer(mySet.value("metadataBrokerList"),
      mySet.value("batchSize"),
      mySet.value("lingerMS"))

    val writer = new ForeachWriter[String] {
      override def process(row: String): Unit = {
        // val result = ...
        val record = new ProducerRecord[String, String](mySet.value("outputKafkaTopic"), "1", result);
        kafkaProducer.send(record)
      }
      override def close(errorOrNull: Throwable): Unit = {}
      override def open(partitionId: Long, version: Long): Boolean = {
        true
      }
    }

    val query = df
      .writeStream
      .foreach(writer)
      .start

    query.awaitTermination()
    spark.stop()

Write a named implementation of ForeachWriter and then use it. (Avoid anonymous classes that capture non-serializable objects. In your case the anonymous writer captures kafkaProducer, which is created on the driver and is not serializable.)
Example:

    val writer = new YourForeachWriter[String]
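
A minimal sketch of what such a named writer might look like, assuming the producer is created in open() on the executor, much like the per-partition producer in the foreachPartition approach. The names KafkaForeachWriter, brokers and topic are hypothetical. The sketch builds the producer directly instead of calling the question's Utils.createProducer, and it leaves out the batchSize and lingerMS settings.

```scala
import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.sql.ForeachWriter

// Hypothetical named writer: only two plain strings travel with the task.
// The producer itself is never serialized; it is built on the executor.
class KafkaForeachWriter(brokers: String, topic: String) extends ForeachWriter[String] {

  // Excluded from the serialized state and assigned per partition in open().
  @transient private var producer: KafkaProducer[String, String] = _

  override def open(partitionId: Long, version: Long): Boolean = {
    val props = new Properties()
    props.put("bootstrap.servers", brokers)
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    producer = new KafkaProducer[String, String](props)
    true // returning true tells Spark to process this partition
  }

  override def process(row: String): Unit = {
    // The key "1" mirrors the question's code; `row` stands in for its `result`.
    producer.send(new ProducerRecord[String, String](topic, "1", row))
  }

  override def close(errorOrNull: Throwable): Unit = {
    // close() blocks until previously sent records have completed.
    if (producer != null) producer.close()
  }
}
```

It would then be used in place of the anonymous writer, for example `df.writeStream.foreach(new KafkaForeachWriter(metadataBrokerList, outputKafkaTopic)).start()`. Because the class holds only strings when it is shipped to the executors, the broadcast variable from the question is not needed for these values.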
Here is also a helpful article about Spark serialization problems: https://www.cakesolutions.net/teamblogs/demystifying-spark-serialisation-error
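
To see this kind of failure outside of Spark, one rough approach is to push the object through plain Java serialization, which is what Spark uses when it ships closures and writers to executors. The sketch below uses only the JDK. FakeClient, CapturingWriter and LazyWriter are hypothetical stand-ins, not Spark or Kafka classes.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // Returns true if `obj` and everything it references can be Java-serialized.
  def isJavaSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }
}

// Stand-in for a non-serializable client such as a Kafka producer (hypothetical).
class FakeClient

// Holds the client as a field, so serializing the writer drags the client along.
class CapturingWriter(val client: FakeClient) extends Serializable

// Holds only a String; a real client would be created later, on the executor.
class LazyWriter(val brokers: String) extends Serializable
```

Compiled as an ordinary source file, `SerializationCheck.isJavaSerializable(new CapturingWriter(new FakeClient))` should report a failure, while the same check on `new LazyWriter("localhost:9092")` should pass. Running such a check on a writer before calling `.foreach(writer)` can surface the problem earlier than a failed streaming query would.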