简体   繁体   中英

How to read json data using scala from kafka topic in apache spark

I am new spark, Could you please let me know how to read json data using scala from kafka topic in apache spark.

Thanks.

The simplest method would be to make use of the DataFrame abstraction shipped with Spark.

val sqlContext = new SQLContext(sc)
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
                  ssc, kafkaParams, Set("myTopicName"))

stream.foreachRDD(
  rdd => {
     val dataFrame = sqlContext.read.json(rdd.map(_._2)) //converts json to DF
     //do your operations on this DF. You won't even require a model class.
        })

I use Play Framework's library for Json . You can add it to your project as a standalone module . Usage is as follows:

import play.api.libs.json._
import org.apache.spark.streaming.kafka.KafkaUtils

case class MyClass(field1: String,
                   field2: Int)

implicit val myClassFormat = Json.format[MyClass]

val kafkaParams = Map[String, String](...here are your params...)    
KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("myTopicName"))
  .map(m => Json.parse(m._2).as[MyClass])

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM