How to read messages from Kafka using Spark Core for batch processing
Can I read messages from Kafka without Spark Streaming? I mean, using only the Spark Core library, for batch processing purposes. If yes, can you please show some examples of how to do it? I am using HDP 2.4, Kafka 0.9 and Spark 1.6.
There is a class called KafkaUtils in the Spark Streaming Kafka API:

https://github.com/apache/spark/blob/master/external/kafka-0-8/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
From this class you can use the createRDD method, which expects explicit offset ranges and is intended precisely for non-streaming (batch) applications.
Dependency jar:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.10</artifactId>
<version>1.6.0</version>
</dependency>
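A minimal batch read with createRDD might look like the sketch below. The broker address, topic name, and offset values are placeholders you would replace with your own; in a real job you would typically fetch the current offsets from Kafka or from your own checkpoint store rather than hard-coding them.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

object KafkaBatchRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("kafka-batch-read"))

    // Broker list is a placeholder -- point it at your Kafka cluster
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")

    // Explicit offset ranges: topic, partition, fromOffset (inclusive), untilOffset (exclusive).
    // One OffsetRange per partition you want to read.
    val offsetRanges = Array(
      OffsetRange("my-topic", 0, 0L, 100L)
    )

    // Returns an RDD[(key, value)] covering exactly the requested offsets
    val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
      sc, kafkaParams, offsetRanges
    )

    // Work with the message values as a plain batch RDD
    rdd.map(_._2).take(10).foreach(println)

    sc.stop()
  }
}
```

Because the offsets are fixed up front, the resulting RDD is deterministic: rerunning the job reads exactly the same messages, which is what makes this suitable for batch processing.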
Also, check out Kafka Connect. For example, if you want to read data from a Kafka topic and write it to HDFS, that is very simple to do with Kafka Connect.
http://docs.confluent.io/3.0.0/connect/
http://www.confluent.io/product/connectors/
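As a sketch, a standalone configuration for Confluent's HDFS sink connector could look like the following; the topic name and HDFS URL are placeholders, and exact property names should be checked against the connector documentation for your Confluent version:

```shell
# hdfs-sink.properties -- Kafka Connect HDFS sink (values are examples)
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=my-topic
hdfs.url=hdfs://namenode:8020
flush.size=1000
```

You would then start it with the Connect standalone runner, e.g. `connect-standalone worker.properties hdfs-sink.properties`.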