
Spark Streaming Kafka Stream batch execution

I'm new to Spark Streaming and I have a general question about its usage. I'm currently implementing an application that streams data from a Kafka topic.

Is it a common scenario to use the application to run a batch only once, for example at the end of the day, collecting all the data from the topic and doing some aggregation, transformation, and so on?

That means that after starting the app with spark-submit, all of this work would be performed in one batch and then the application would shut down. Or is Spark Streaming built to run endlessly, processing stream data in continuous batches? To illustrate the first option, see the sketch below.
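Here is a minimal sketch of the one-shot scenario I have in mind, using Spark's batch read support for Kafka (the broker address `localhost:9092`, the topic name `events`, and the output path are assumptions for illustration):

```scala
import org.apache.spark.sql.SparkSession

object DailyKafkaBatch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-kafka-batch")
      .getOrCreate()

    // Batch read: consumes everything currently in the topic, then the job ends.
    val raw = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address
      .option("subscribe", "events")                       // hypothetical topic name
      .option("startingOffsets", "earliest")
      .option("endingOffsets", "latest")
      .load()

    // Kafka records arrive as binary key/value columns; cast the value to a string.
    val events = raw.selectExpr("CAST(value AS STRING) AS value")

    // Example aggregation: count occurrences of each distinct value.
    val counts = events.groupBy("value").count()

    counts.write.mode("overwrite").parquet("/tmp/daily-counts") // hypothetical output path

    spark.stop() // the application shuts down after the single batch completes
  }
}
```

Submitted via spark-submit, this runs once and exits. In practice each run would need to track consumed offsets (or rely on topic retention) so that it only processes the new day's data rather than re-reading from the earliest offset.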

You can use the Kafka Streams API and set a window time to perform aggregation and transformation over the events in your topic one batch at a time. For more information about windowing, check https://kafka.apache.org/21/documentation/streams/developer-guide/dsl-api.html#windowing
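As a rough sketch of that approach, assuming a broker at `localhost:9092`, an input topic named `events`, and the 2.1 DSL that the linked docs describe, a tumbling 24-hour window groups each day's events into one aggregation bucket:

```scala
import java.time.Duration
import java.util.Properties

import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.kstream.TimeWindows
import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder, StreamsConfig}

object WindowedCounts {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "windowed-counts")   // hypothetical app id
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // assumed broker
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass)
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass)

    val builder = new StreamsBuilder()

    builder
      .stream[String, String]("events")                    // hypothetical input topic
      .groupByKey()
      .windowedBy(TimeWindows.of(Duration.ofHours(24)))    // tumbling one-day window
      .count()                                             // events per key per day
      .toStream()
      .foreach((windowedKey, count) => println(s"$windowedKey -> $count"))

    val streams = new KafkaStreams(builder.build(), props)
    streams.start()
    sys.addShutdownHook(streams.close())                   // clean shutdown on SIGTERM
  }
}
```

Note that, unlike the one-shot spark-submit scenario, a Kafka Streams application keeps running; the windowing only bounds which events are aggregated together, so you would still stop the process yourself if you truly want it to run once a day.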
