does anyone know to what corresponds LocalTableScan in Spark Structured Streaming?
I'm trying to understand a strange behavior that I observed in my Spark structure streaming application that is running in local[*] mode.
I have 8 core on my machines. While the majority of my Batches have 8 partitions, every once in a while I get 16 or 32 or 56 and so on partitions/Tasks. I notice that it is always a multiple of 8. I have notice in opening the stage tab, that when it happens, it is because there is multiple LocalTableScan.
That is if I have 2 LocalTableScan then the mini-batch job, will have 16 task/partition and so on.
To give a bit of context because I am suspecting that it might come from it, I am using a MemoryStream.
val rows = MemoryStream[Map[String,String]]
val df = rows.toDF()
val rdf = df.mapPartitions{ it => {.....}}(RowEncoder.apply(StructType(List(StructField("blob", StringType, false)))))
I have a future that feeds my memory stream as such right after:
Future {
blocking {
for (i <- 1 to 100000) {
rows.addData(maps)
Thread.sleep(3000)
}
}
}
and then my query:
rdf.writeStream.
trigger(Trigger.ProcessingTime("1 seconds"))
.format("console").outputMode("append")
.queryName("SourceConvertor1").start().awaitTermination()
Please, any suggestions? Hints ?
It indicates in memory on the Driver. As your code shows.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.