Spark Streaming vs Structured Streaming

Over the last few months I've been using Structured Streaming quite a lot to implement streaming jobs (after having used Kafka a lot). After reading the book Stream Processing with Apache Spark, I had this question: is there any situation or use case where I would use Spark Streaming instead of Structured Streaming? Should I invest some time learning it, or, since I'm already using Spark Structured Streaming, should I stick with it because the older API offers no benefit? I'd appreciate any opinions or insights.

Hi, sharing my personal experience.

Structured Streaming is the future of Spark-based streaming. It provides a higher level of abstraction and other great features. However, it has a few restrictions.

I have had to switch to Spark Streaming on a few occasions because of the flexibility it offers. One recent example: we had to perform joins with static reference data, but Structured Streaming does not support every kind of outer join between a stream and a static dataset (a full outer join, for instance, is not supported). This can be accomplished with Spark Streaming, where each micro-batch is an ordinary RDD that can be joined with the reference data in any way.

With the newer Spark version 2.4, Structured Streaming is much improved: it supports the foreachBatch sink, which exposes the output of each micro-batch as a regular DataFrame and so gives flexibility similar to that of Spark Streaming.

My personal view is that knowing Spark Streaming is helpful, and you might have to use it depending on your use case.
