
Kafka integration with Apache Spark

I'm learning Apache Spark integration with Kafka so that my code runs automatically whenever a new message arrives in a Kafka topic.

I've also read the official documentation:

https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html

But I'm still confused about how it works. I have a word count program written in Java Spark, and on the other side Kafka is running.

Is Structured Streaming a bridge between Kafka and the Spark Java code? Does it keep listening to Kafka, and whenever a message arrives, pull it from Kafka and pass it over to the Spark Java code? Is that correct?

If not, can anyone explain in simple words how it works? Any other references would be appreciated.

How should I integrate my Java Spark code with Kafka so that it triggers automatically whenever a new message arrives in Kafka?

Thanks

Spark delegates to the basic Kafka consumer APIs, which poll messages in batches as they arrive in the topic.

Structured Streaming and regular Spark Streaming work the same in this regard.
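To make this concrete, here is a minimal sketch of a Structured Streaming word count in Java that reads from a Kafka topic. The broker address localhost:9092 and the topic name my-topic are placeholders for your own setup, and it assumes the spark-sql-kafka-0-10 connector is on the classpath. Each micro-batch of newly polled messages re-runs the aggregation, which is the automatic "triggering" you are asking about.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

import static org.apache.spark.sql.functions.*;

public class KafkaWordCount {
  public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession.builder()
        .appName("KafkaWordCount")
        .master("local[*]")   // placeholder; use your real cluster master
        .getOrCreate();

    // Subscribe to the topic; Spark polls new records in micro-batches.
    Dataset<Row> lines = spark.readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
        .option("subscribe", "my-topic")                     // placeholder
        .load()
        .selectExpr("CAST(value AS STRING) AS line");

    // Split each message value into words and count occurrences.
    Dataset<Row> counts = lines
        .select(explode(split(col("line"), "\\s+")).as("word"))
        .groupBy("word")
        .count();

    // Every batch of arriving messages updates the running counts;
    // no extra wiring is needed to "trigger" the code.
    StreamingQuery query = counts.writeStream()
        .outputMode("complete")
        .format("console")
        .start();

    query.awaitTermination();
  }
}

You start the query once; from then on, Spark keeps polling Kafka and reprocessing as messages arrive, so there is no separate listener to write.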

You may want to start with the Kafka basic consumer or Kafka Streams if you are interested in learning how Kafka record delivery works, as Spark might be overkill depending on the task; see the sketch below.
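For comparison, here is a minimal sketch of that basic Kafka consumer loop in Java, using the standard kafka-clients API. The broker address, group id, and topic name are placeholders. This is essentially what Spark does under the hood: it polls, and whatever records came in since the last poll form the next batch.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class BasicConsumer {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "word-count-demo");         // placeholder
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
      while (true) {
        // poll() blocks up to the timeout and returns whatever records
        // arrived in the meantime -- possibly an empty batch.
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        for (ConsumerRecord<String, String> record : records) {
          System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
        }
      }
    }
  }
}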

