
Write into Kafka topic using Spark and Scala

I am reading data from a Kafka topic and writing the received data back into another Kafka topic.

Below is my code:

    import org.apache.spark.sql.types._
    import org.apache.spark.sql.functions._
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
    import org.apache.spark.sql.ForeachWriter

    // Loading data from Kafka
    val data = spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "*******:9092")
      .option("subscribe", "PARAMTABLE")
      .option("startingOffsets", "latest")
      .load()

    // Extracting the value from JSON
    val schema = new StructType()
      .add("PARAM_INSTANCE_ID", IntegerType)
      .add("ENTITY_ID", IntegerType)
      .add("PARAM_NAME", StringType)
      .add("VALUE", StringType)
    val df1 = data.selectExpr("CAST(value AS STRING)")
    val dataDF = df1.select(from_json(col("value"), schema).as("data")).select("data.*")

    // Insert into another Kafka topic
    val topic = "SparkParamValues"
    val brokers = "********:9092"
    val writer = new KafkaSink(topic, brokers)
    val query = dataDF.writeStream
      .foreach(writer)
      .outputMode("update")
      .start()
      .awaitTermination()

I am getting the below error:

    <console>:47: error: not found: type KafkaSink
           val writer = new KafkaSink(topic, brokers)

I am very new to Spark. Could someone suggest how to resolve this, or verify whether the above code is correct? Thanks in advance.

In Spark Structured Streaming, you can write to a Kafka topic after reading from another topic either by using the existing DataStreamWriter for Kafka, or by creating your own sink that extends the ForeachWriter class.

Without using a custom sink:

You can use the code below to write a dataframe to Kafka, assuming df is the dataframe generated by reading from a Kafka topic. The dataframe should have at least one column named value. If you have multiple columns, you should merge them into one column named value (see the sketch after the code below). If no key column is specified, the key will be null in the destination topic.

  df.select("key", "value")
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("topic", "<topicName>")
  .start()
  .awaitTermination()
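
If your dataframe has multiple columns, as dataDF in the question does, one way to merge them into a single value column is to serialize each row back to JSON with to_json and struct. A minimal sketch, assuming the dataDF from the question and a hypothetical checkpoint directory:

    import org.apache.spark.sql.functions.{col, struct, to_json}

    // Pack all columns of dataDF into one JSON string column named "value"
    dataDF.select(to_json(struct(dataDF.columns.map(col): _*)).alias("value"))
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "*******:9092")
      .option("topic", "SparkParamValues")
      .option("checkpointLocation", "/tmp/spark-checkpoint") // hypothetical path
      .start()
      .awaitTermination()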

Using a custom sink:

If you want to implement your own Kafka sink, you need to create a class that extends ForeachWriter. You need to override some methods and pass an object of this class to the foreach() method.

    // Using an anonymous class to extend ForeachWriter
    df.writeStream.foreach(new ForeachWriter[Row] {
      // If you are writing a Dataset[String], use new ForeachWriter[String]
      // and change the record type in process() to String

      def open(partitionId: Long, version: Long): Boolean = {
        // Open the connection; return true to process the rows of this partition
        true
      }

      def process(record: Row): Unit = {
        // Write the row to the connection
      }

      def close(errorOrNull: Throwable): Unit = {
        // Close the connection
      }
    }).start()

You can check this Databricks notebook for the implemented code (scroll down and check the code under the Kafka Sink heading). I think you are referring to that page. To solve the issue, you need to make sure that the KafkaSink class is available to your Spark code. You can keep both the Spark code file and the class file in the same package. If you are running in spark-shell, paste the KafkaSink class before pasting the Spark code.
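
For reference, such a KafkaSink might look like the sketch below. This is not the exact class from the notebook, just a minimal version that extends ForeachWriter[Row], opens one KafkaProducer per partition, and sends each row as a plain string:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
    import org.apache.spark.sql.{ForeachWriter, Row}

    class KafkaSink(topic: String, servers: String) extends ForeachWriter[Row] {

      private var producer: KafkaProducer[String, String] = _

      // Called once per partition: create the producer
      def open(partitionId: Long, version: Long): Boolean = {
        val props = new Properties()
        props.put("bootstrap.servers", servers)
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        producer = new KafkaProducer[String, String](props)
        true
      }

      // Called for every row of the partition: send it to the topic
      def process(row: Row): Unit = {
        // For illustration the row is sent as a comma-separated string;
        // real code would serialize it to JSON or another agreed format
        producer.send(new ProducerRecord(topic, row.mkString(",")))
      }

      // Called when the partition is finished (or has failed): release the producer
      def close(errorOrNull: Throwable): Unit = {
        producer.close()
      }
    }

With this class pasted into spark-shell first (or compiled into the same package as your job), the new KafkaSink(topic, brokers) line from the question compiles.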

Read the Structured Streaming Kafka integration guide to explore more.
