Do Spark Streaming and Spark Structured Streaming use the same micro-batch engine?
Do Spark Streaming and Spark Structured Streaming use the same micro-batch scheduler engine? Does Spark Structured Streaming have lower latency than Spark Streaming?
Do Spark Streaming and Spark Structured Streaming use the same micro-batch scheduler engine?
Certainly not. They're different internally, but they share the same high-level concepts of a stream and a record.
In Spark Structured Streaming, you can get close to the way things worked in Spark Streaming by using the DataStreamWriter.foreach or DataStreamWriter.foreachBatch methods.
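As a rough illustration of the foreachBatch route, here is a minimal PySpark sketch. It assumes a local Spark installation (2.4 or later, where foreachBatch is available in Python) and uses the built-in "rate" test source; the names process_batch, batch_log and run_rate_query are made up for this example.

```python
from typing import Any, List, Tuple

# Collected (batch_id, row_count) pairs; a stand-in for a real sink.
batch_log: List[Tuple[int, int]] = []


def process_batch(batch_df: Any, batch_id: int) -> None:
    """Called once per micro-batch with a regular (non-streaming) DataFrame."""
    batch_log.append((batch_id, batch_df.count()))


def run_rate_query(seconds: int = 10) -> None:
    """Run a short streaming query against the built-in "rate" test source."""
    # Imported here so process_batch can be exercised without a Spark install.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[2]")
        .appName("foreachBatch-sketch")  # hypothetical application name
        .getOrCreate()
    )
    stream_df = spark.readStream.format("rate").option("rowsPerSecond", 5).load()
    query = stream_df.writeStream.foreachBatch(process_batch).start()
    query.awaitTermination(seconds)  # wait up to `seconds`, then stop the query
    query.stop()
    spark.stop()


if __name__ == "__main__":
    run_rate_query()
    print(batch_log)
```

Inside process_batch you work with an ordinary DataFrame per batch, which is about as close as Structured Streaming gets to the per-batch RDD handling of Spark Streaming.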
The main difference is how you describe a streaming pipeline. In Spark Structured Streaming you use Spark SQL's Dataset API, while Spark Streaming bet on Spark Core's RDD API. Both end up as an RDD-based computation, but Spark SQL uses higher-level abstractions (e.g. the Dataset API).
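To make the difference in pipeline description concrete, the sketch below writes a word count both ways in PySpark. It is only an illustration: the function names are invented here, the socket source at host/port is an assumption, and the DStream API it shows is the legacy one.

```python
from typing import Any, List


def tokenize(line: str) -> List[str]:
    """Split a line into words on single spaces."""
    return line.split(" ")


def dstream_word_count(sc: Any, host: str, port: int) -> Any:
    """Spark Streaming (DStream) style: RDD-like transformations per batch."""
    from pyspark.streaming import StreamingContext

    ssc = StreamingContext(sc, 1)  # 1-second batch interval
    lines = ssc.socketTextStream(host, port)
    counts = (
        lines.flatMap(tokenize)
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)  # counts within each batch only
    )
    counts.pprint()
    return ssc  # caller runs ssc.start() and ssc.awaitTermination()


def structured_word_count(spark: Any, host: str, port: int) -> Any:
    """Structured Streaming style: a DataFrame query on an unbounded table."""
    from pyspark.sql.functions import explode, split

    lines = (
        spark.readStream.format("socket")
        .option("host", host)
        .option("port", port)
        .load()
    )
    words = lines.select(explode(split(lines.value, " ")).alias("word"))
    counts = words.groupBy("word").count()  # running counts across batches
    # Caller invokes .start() on the returned DataStreamWriter.
    return counts.writeStream.outputMode("complete").format("console")
```

Note that the two are not strictly equivalent: the DStream version counts words per batch, while the Structured Streaming aggregation keeps a running total.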
Do they both use a "micro-batch scheduler engine"? Yes, each has one, but Spark Structured Streaming is also trying to leverage data sources that can be queried continuously (with no micro-batching).
Does Spark Structured Streaming have lower latency than Spark Streaming?
That'd be hard to answer. The creators of Spark Streaming decided to develop Spark Structured Streaming, hoping to improve query performance and expressiveness. Spark Streaming is no longer recommended.
Structured Streaming is mostly a higher-level abstraction that lets you define your streaming logic; it then uses the Spark SQL engine to execute that logic on the same micro-batch engine.
By default, Structured Streaming uses the micro-batch engine. However, if you are using Spark 2.3+, you can use continuous mode, where latency can get down to about 1 millisecond.
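Switching between the two modes comes down to the trigger set on the writer. The PySpark sketch below is a hedged example: trigger_kwargs and start_console_query are hypothetical helpers, and the "rate" source with the console sink is assumed here because both support continuous mode. Continuous processing is marked experimental in the Spark documentation, and it only supports map-like operations such as select and filter.

```python
from typing import Any, Dict


def trigger_kwargs(continuous: bool, interval: str = "1 second") -> Dict[str, str]:
    """Build keyword arguments for DataStreamWriter.trigger().

    In continuous mode the interval is the checkpoint interval,
    not a latency target.
    """
    if continuous:
        return {"continuous": interval}
    return {"processingTime": interval}


def start_console_query(spark: Any, continuous: bool) -> Any:
    """Start a query on the "rate" test source in the chosen mode."""
    stream_df = spark.readStream.format("rate").option("rowsPerSecond", 5).load()
    # Projection only: aggregations are not supported in continuous mode.
    projected = stream_df.select("timestamp", "value")
    return (
        projected.writeStream.format("console")
        .trigger(**trigger_kwargs(continuous))
        .start()
    )
```

Calling start_console_query(spark, continuous=True) on an existing SparkSession would request continuous execution, while continuous=False falls back to micro-batches fired every second.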