
How to read messages from Kafka using Spark Core for batch processing

Can I read messages from Kafka without Spark Streaming? I mean, only with the Spark Core library, for batch-processing purposes. If so, can you please show some examples of how to do it? I am using HDP 2.4, Kafka 0.9 and Spark 1.6.

There is a class called KafkaUtils in the Spark Streaming Kafka API:

https://github.com/apache/spark/blob/master/external/kafka-0-8/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala

From this class you can use the method createRDD, which expects the offset ranges to read up front and is intended precisely for non-streaming (batch) applications.

Dependency jar:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.6.0</version>
</dependency>
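
For reference, here is a minimal sketch of a batch read with createRDD against the Kafka direct API shipped with Spark 1.6. The broker address, topic name, partition and offset values below are placeholders you would replace with your own:

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}
import org.apache.spark.{SparkConf, SparkContext}

object KafkaBatchRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("kafka-batch-read"))

    // Broker list for the Kafka direct API; host and port are placeholders
    // (6667 is the default Kafka port on HDP clusters)
    val kafkaParams = Map("metadata.broker.list" -> "broker1:6667")

    // Read offsets [0, 100) of partition 0 of topic "test" -- illustrative values.
    // createRDD needs the offsets up front; there is no consumer-group tracking here.
    val offsetRanges = Array(OffsetRange("test", 0, 0L, 100L))

    // The resulting RDD of (key, value) pairs behaves like any other batch RDD
    val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
      sc, kafkaParams, offsetRanges)

    rdd.map(_._2).take(10).foreach(println)
    sc.stop()
  }
}

Note that createRDD reads exactly the ranges you pass in, so for repeated batch jobs you would typically persist the last consumed offsets yourself and use them as the starting point of the next run.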

Also, check out Kafka Connect. For example, if you want to read data from a Kafka topic and load it into HDFS, it is very simple with Kafka Connect; see the sketch after the links below.

http://docs.confluent.io/3.0.0/connect/
http://www.confluent.io/product/connectors/
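
As a sketch, a standalone HDFS sink using Confluent's connector is driven by a small properties file; the topic, HDFS URL and flush size below are illustrative values, not defaults:

name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
# topic(s) to drain into HDFS -- placeholder
topics=test
# NameNode URL of the target cluster -- placeholder
hdfs.url=hdfs://namenode:8020
# number of records to accumulate before committing a file
flush.size=1000

You would then launch it with the standalone Connect worker, e.g. connect-standalone worker.properties hdfs-sink.properties, and the connector takes care of offset tracking and writing files to HDFS for you.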
