
Spark structured streaming asynchronous batch blocking

I'm using Apache Spark Structured Streaming to read from Kafka. Sometimes my micro-batches take longer to process than the specified interval because of heavy write I/O. I was wondering whether there is an option to start the next batch before the first one has finished, but have the second batch blocked by the first?

What I mean is: if the first batch takes 7 seconds and the trigger interval is set to 5 seconds, then the second batch would start at the fifth second. But if the second batch finishes first, it would be blocked so that it doesn't write before the previous batch (in order to preserve the correct message order).

No. The next batch only starts once the previous one has completed. I think you mean the trigger interval. It would become a mess otherwise.

See https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers
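
To illustrate the trigger behaviour the answer describes, here is a minimal sketch in Scala (assuming the spark-sql-kafka-0-10 connector is on the classpath; the broker address, topic name, and checkpoint path are placeholders, not from the original question). With Trigger.ProcessingTime("5 seconds"), Spark attempts to start a micro-batch every 5 seconds, but a new batch only begins after the previous one has completed, so a 7-second batch simply delays the next trigger rather than running concurrently with it.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object KafkaTriggerExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-trigger-example")
      .getOrCreate()

    // Read from Kafka (broker and topic are placeholders)
    val kafkaStream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "input-topic")
      .load()

    val query = kafkaStream
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      // Trigger interval: a new micro-batch is attempted every 5 seconds,
      // but never before the previous batch has finished.
      .trigger(Trigger.ProcessingTime("5 seconds"))
      .option("checkpointLocation", "/tmp/checkpoints/kafka-trigger-example")
      .start()

    query.awaitTermination()
  }
}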
