Do Spark Streaming and Spark Structured Streaming use the same micro-batch engine?

Do Spark Streaming and Spark Structured Streaming use the same micro-batch scheduler engine? Does Spark Structured Streaming have lower latency than Spark Streaming?

Do Spark Streaming and Spark Structured Streaming use the same micro-batch scheduler engine?

Certainly not. They're different internally, but share the same high-level concepts of a stream and a record.

That said, in Spark Structured Streaming you can get close to how things worked in Spark Streaming by using the DataStreamWriter.foreach or DataStreamWriter.foreachBatch methods.
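As a minimal sketch of the foreachBatch approach, assuming PySpark 2.4 or later and the built-in "rate" test source: each micro-batch is handed to a plain function as a regular DataFrame, which is roughly how DStream.foreachRDD felt in Spark Streaming. The names summarize_batch, process_batch and run_sketch are hypothetical and only serve the illustration.

```python
def summarize_batch(batch_id, row_count):
    # Pure helper (hypothetical name): formats a one-line summary of a micro-batch.
    return "batch {}: {} rows".format(batch_id, row_count)


def process_batch(batch_df, batch_id):
    # foreachBatch passes a regular (non-streaming) DataFrame plus the batch id,
    # so ordinary batch actions such as count() are allowed here.
    print(summarize_batch(batch_id, batch_df.count()))


def run_sketch(seconds=10):
    # PySpark is imported here so the helpers above can be loaded without Spark.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[2]")
             .appName("foreach-batch-sketch")
             .getOrCreate())

    # The "rate" source generates rows with (timestamp, value) columns for testing.
    stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

    # Each micro-batch is passed to process_batch as it completes.
    query = stream.writeStream.foreachBatch(process_batch).start()
    query.awaitTermination(seconds)  # wait up to `seconds` seconds, then shut down
    query.stop()
    spark.stop()
```

Calling run_sketch() on a machine with PySpark installed should print one summary line per micro-batch; the exact row counts depend on timing.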

The main difference is how you describe a streaming pipeline. In Spark Structured Streaming you use Spark SQL's Dataset API, while Spark Streaming relies on Spark Core's RDD API. Both end up as an RDD-based computation, but Spark SQL uses higher-level abstractions (e.g. the Dataset API).
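To make the difference in pipeline description concrete, here is a hedged sketch of the same socket word count written against both APIs in PySpark. It assumes a text server on the given host and port, a Spark version that still ships the pyspark.streaming (DStream) module, and caller-supplied StreamingContext and SparkSession objects; the function names are hypothetical.

```python
def split_words(line):
    # Pure helper (hypothetical name) used by the RDD-style pipeline.
    return line.split(" ")


def dstream_word_count(ssc, host, port):
    # Spark Streaming (DStream API): RDD-style transformations built from Python functions.
    # `ssc` is assumed to be a pyspark.streaming.StreamingContext.
    lines = ssc.socketTextStream(host, port)
    counts = (lines.flatMap(split_words)
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()  # print the first elements of every batch
    return counts


def structured_word_count(spark, host, port):
    # Structured Streaming: declarative DataFrame operations planned by Spark SQL.
    # `spark` is assumed to be a pyspark.sql.SparkSession.
    from pyspark.sql.functions import explode, split

    lines = (spark.readStream.format("socket")
             .option("host", host)
             .option("port", port)
             .load())
    words = lines.select(explode(split(lines.value, " ")).alias("word"))
    counts = words.groupBy("word").count()
    # "complete" output mode re-emits the full running counts on every trigger.
    return counts.writeStream.outputMode("complete").format("console").start()
```

The DStream version only starts processing once the caller invokes ssc.start(), and it counts words per batch, whereas the Structured Streaming version keeps a running total across batches.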

Do they both use a "micro-batch scheduler engine"? Yes, but Spark Structured Streaming is also trying to leverage data sources that can be queried continuously (with no micro-batching).

Does Spark Structured Streaming have lower latency than Spark Streaming?

That'd be hard to answer. The creators of Spark Streaming decided to develop Spark Structured Streaming in the hope of improving query performance and expressiveness. Spark Streaming is no longer recommended.

Structured Streaming is mostly a higher-level abstraction that lets you define your streaming logic; it then uses the Spark SQL engine to execute that logic on the same micro-batch engine.

By default Structured Streaming uses the micro-batch engine. However, if you are using Spark 2.3+, you can switch to continuous mode, where latency can get down to about 1 millisecond.
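Switching between the two modes comes down to the trigger set on the writer. The following is a minimal sketch, assuming PySpark 2.3 or later, the built-in "rate" source and console sink, and a caller-supplied SparkSession; trigger_options and start_console_query are hypothetical helper names.

```python
def trigger_options(mode, interval="1 second"):
    # Hypothetical helper: maps a mode name to keyword arguments
    # accepted by DataStreamWriter.trigger().
    if mode == "micro-batch":
        return {"processingTime": interval}  # start a micro-batch every `interval`
    if mode == "continuous":
        return {"continuous": interval}      # here `interval` is the checkpoint interval
    raise ValueError("unknown mode: {}".format(mode))


def start_console_query(spark, mode):
    # `spark` is assumed to be a pyspark.sql.SparkSession.
    stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()
    # Continuous mode only supports map-like operations (select, filter, ...),
    # so the query is kept to a simple projection.
    doubled = stream.selectExpr("value", "value * 2 AS doubled")
    return (doubled.writeStream
            .format("console")
            .trigger(**trigger_options(mode))
            .start())
```

With mode set to "continuous", the interval controls how often progress is checkpointed rather than how often a batch is launched, which is why latency is no longer bounded by a batch interval.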

