
Spark Streaming Kafka Stream batch execution

I'm new to Spark Streaming and I have a general question about its usage. I'm currently implementing an application that streams data from a Kafka topic.

Is it a common scenario to use the application to run a batch only once, for example at the end of the day, collecting all the data from the topic and doing some aggregation, transformation, and so on?

That means that after starting the app with spark-submit, all of this work would be performed in one batch and then the application would shut down. Or is Spark Streaming built to run endlessly, processing stream data in continuous batches? To illustrate the first option, see the sketch below.
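Here is a minimal sketch of the one-shot scenario I have in mind, using Spark's batch read support for Kafka (the broker address `localhost:9092`, the topic name `events`, and the output path are assumptions for illustration):

```scala
import org.apache.spark.sql.SparkSession

object DailyKafkaBatch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-kafka-batch")
      .getOrCreate()

    // Batch read: consumes everything currently in the topic, then the job ends.
    val raw = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address
      .option("subscribe", "events")                       // hypothetical topic name
      .option("startingOffsets", "earliest")
      .option("endingOffsets", "latest")
      .load()

    // Kafka records arrive as binary key/value columns; cast the value to a string.
    val events = raw.selectExpr("CAST(value AS STRING) AS value")

    // Example aggregation: count occurrences of each distinct value.
    val counts = events.groupBy("value").count()

    counts.write.mode("overwrite").parquet("/tmp/daily-counts") // hypothetical output path

    spark.stop() // the application shuts down after the single batch completes
  }
}
```

Submitted via spark-submit, this runs once and exits. In practice each run would need to track consumed offsets (or rely on topic retention) so that it only processes the new day's data rather than re-reading from the earliest offset.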

You can use the Kafka Streams API and set a window time to perform aggregation and transformation over the events in your topic one batch at a time. For more information about windowing, check https://kafka.apache.org/21/documentation/streams/developer-guide/dsl-api.html#windowing
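As a rough sketch of that approach, assuming a broker at `localhost:9092`, an input topic named `events`, and the 2.1 DSL that the linked docs describe, a tumbling 24-hour window groups each day's events into one aggregation bucket:

```scala
import java.time.Duration
import java.util.Properties

import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.kstream.TimeWindows
import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder, StreamsConfig}

object WindowedCounts {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "windowed-counts")   // hypothetical app id
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // assumed broker
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass)
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass)

    val builder = new StreamsBuilder()

    builder
      .stream[String, String]("events")                    // hypothetical input topic
      .groupByKey()
      .windowedBy(TimeWindows.of(Duration.ofHours(24)))    // tumbling one-day window
      .count()                                             // events per key per day
      .toStream()
      .foreach((windowedKey, count) => println(s"$windowedKey -> $count"))

    val streams = new KafkaStreams(builder.build(), props)
    streams.start()
    sys.addShutdownHook(streams.close())                   // clean shutdown on SIGTERM
  }
}
```

Note that, unlike the one-shot spark-submit scenario, a Kafka Streams application keeps running; the windowing only bounds which events are aggregated together, so you would still stop the process yourself if you truly want it to run once a day.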
