
What is LocalTableScan in Spark Structured Streaming for?

Does anyone know what LocalTableScan corresponds to in Spark Structured Streaming?

I'm trying to understand a strange behavior that I observed in my Spark Structured Streaming application, which is running in local[*] mode.

My machine has 8 cores. While the majority of my batches have 8 partitions, every once in a while I get 16, 32, 56, and so on partitions/tasks. I noticed that it is always a multiple of 8. Looking at the Stages tab, I noticed that when this happens, it is because there are multiple LocalTableScan operators.

That is, if I have 2 LocalTableScan operators, the mini-batch job will have 16 tasks/partitions, and so on.

To give a bit of context, because I suspect it might be the cause: I am using a MemoryStream.

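import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.execution.streaming.MemoryStream
import org.apache.spark.sql.types.{StringType, StructField, StructType}
val rows = MemoryStream[Map[String,String]]
val df = rows.toDF()
val rdf = df.mapPartitions{ it => {.....}}(RowEncoder.apply(StructType(List(StructField("blob", StringType, false)))))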

Right after that, I have a Future that feeds my memory stream:

Future {
  blocking {
    for (i <- 1 to 100000) {
      rows.addData(maps)
      Thread.sleep(3000)
    }
  }
}

and then my query:

rdf.writeStream
    .trigger(Trigger.ProcessingTime("1 seconds"))
    .format("console").outputMode("append")
    .queryName("SourceConvertor1").start().awaitTermination()

Any suggestions or hints, please?

It indicates that the data is being read from memory on the driver, as your code shows: a MemoryStream keeps the rows you add in driver memory.
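To see the operator outside of streaming, here is a minimal batch sketch. It assumes a local session with 8 threads; the object name `LocalTableScanDemo`, the helper `buildUnion`, and the sample rows are made up for illustration. A DataFrame built from a local `Seq` should show up as `LocalTableScan` in the physical plan, and a union of two such DataFrames should show two of them, with the partitions of both sides added together. One plausible reading of your observation is that a micro-batch sometimes covers several `addData` calls, each of which may contribute its own LocalTableScan, but that depends on the Spark version and is not something this sketch proves.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object LocalTableScanDemo {
  // Hypothetical helper: unions two DataFrames built from driver-side collections.
  def buildUnion(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val a = Seq("x", "y", "z").toDF("blob") // rows live on the driver
    val b = Seq("u", "v").toDF("blob")      // a second, separate local relation
    a.union(b)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[8]") // assumption: 8 cores, as in the question
      .appName("LocalTableScanDemo")
      .getOrCreate()

    val u = buildUnion(spark)
    // Print the physical plan; look for LocalTableScan nodes under the Union.
    u.explain()
    // Number of partitions (and therefore tasks) of the unioned data.
    println(s"partitions = ${u.rdd.getNumPartitions}")

    spark.stop()
  }
}
```

The partition count that each LocalTableScan produces depends on the Spark version and on how many rows it holds, so compare the printed number with what you see in the Stages tab rather than expecting a fixed value.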
