
Spark Streaming Writing Data To Kafka Topic

I am trying to write a DataFrame to a Kafka topic inside foreachRDD, once for each RDD. I am using the code below:

mesg.foreachRDD(rdd => {
  Dataframe.write.format("kafka")
    .option("kafka.bootstrap.servers", "host")
    .option("subscribe", "topic")
    .option("principal", "Kerberos-principal")
    .option("keytab", "kerberos-keytab")
    .save()
})
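As a point of comparison: Spark's Kafka sink reads the destination from the `topic` option (`subscribe` is only understood by the Kafka source), and it forwards only options prefixed with `kafka.` to the underlying Kafka producer, so `principal` and `keytab` as written above are not Kafka client settings. The sketch below builds Kerberos-enabled sink options through the standard Kafka property `sasl.jaas.config`. It assumes a `SASL_PLAINTEXT` listener, the broker service name `kafka`, and a keytab present at the same path on the driver and every executor; all three are assumptions about your cluster, and `KafkaKerberosOptions`/`kerberosSinkOptions` are hypothetical names.

```scala
object KafkaKerberosOptions {
  // Hypothetical helper: collects writer options for a Kerberos-secured
  // Kafka cluster. Every key except "topic" starts with "kafka.", so Spark
  // passes it to the Kafka producer with the prefix stripped.
  def kerberosSinkOptions(
      bootstrapServers: String,
      topic: String,
      principal: String,
      keytabPath: String,
      securityProtocol: String = "SASL_PLAINTEXT", // assumption: use SASL_SSL if the brokers require TLS
      serviceName: String = "kafka"                // assumption: the brokers' Kerberos service name
  ): Map[String, String] = {
    // JAAS entry for the JDK Kerberos login module, authenticating from a keytab.
    val jaasConfig =
      s"""com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true storeKey=true keyTab="$keytabPath" principal="$principal";"""
    Map(
      "kafka.bootstrap.servers" -> bootstrapServers,
      "topic" -> topic, // sink option; "subscribe" belongs to the source
      "kafka.security.protocol" -> securityProtocol,
      "kafka.sasl.kerberos.service.name" -> serviceName,
      "kafka.sasl.jaas.config" -> jaasConfig
    )
  }
}
```

The resulting map can be handed to the writer in one call, e.g. `df.write.format("kafka").options(KafkaKerberosOptions.kerberosSinkOptions(servers, topic, principal, keytab)).save()`, which keeps the security settings out of the streaming logic.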


I am getting a NullPointerException. Specifically, I need to write a DataFrame to a Kafka topic. Can anyone help with this? Note: Dataframe here is obtained by converting the RDD to a DataFrame and removing some fields from the input JSON sent to the Kafka topic.

Exception in thread "main" java.lang.NullPointerException
    at java.util.regex.Matcher.getTextLength(Matcher.java:1283)
    at java.util.regex.Matcher.reset(Matcher.java:309)
    at java.util.regex.Matcher.<init>(Matcher.java:229)
    at java.util.regex.Pattern.matcher(Pattern.java:1093)
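The setup described in the question (a DataFrame built from each micro-batch's JSON, with some fields removed, then written to Kafka) could look roughly like the sketch below. It assumes mesg is a `DStream[String]` of JSON documents (if it comes from `KafkaUtils.createDirectStream`, map it with `_.value` first); `JsonBatchToKafka`, `toKafkaRecords`, `wire` and `dropFields` are hypothetical names. Obtaining the SparkSession from the RDD's configuration inside `foreachRDD` follows the pattern shown in the Spark Streaming programming guide.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct, to_json}
import org.apache.spark.streaming.dstream.DStream

object JsonBatchToKafka {
  // Parses one micro-batch of JSON strings, drops the unwanted top-level
  // fields and re-serialises each remaining row into a string column "value".
  def toKafkaRecords(spark: SparkSession, rdd: RDD[String], dropFields: Seq[String]): DataFrame = {
    import spark.implicits._
    val parsed  = spark.read.json(rdd.toDS()) // schema is inferred per batch
    val trimmed = parsed.drop(dropFields: _*) // names that do not exist are ignored
    trimmed.select(to_json(struct(col("*"))).as("value"))
  }

  // Hypothetical wiring for the streaming job: one batch write per RDD.
  def wire(mesg: DStream[String], dropFields: Seq[String],
           bootstrapServers: String, topic: String): Unit =
    mesg.foreachRDD { rdd =>
      if (!rdd.isEmpty()) { // an empty batch would infer an empty schema
        // Reuse (or lazily create) the SparkSession from the RDD's configuration.
        val spark = SparkSession.builder().config(rdd.sparkContext.getConf).getOrCreate()
        toKafkaRecords(spark, rdd, dropFields).write
          .format("kafka")
          .option("kafka.bootstrap.servers", bootstrapServers)
          .option("topic", topic)
          .save()
      }
    }
}
```

Keeping the parsing and projection in `toKafkaRecords` makes it possible to check the produced records on a local SparkSession without a running broker.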

The NullPointerException was caused by a configuration error, which has since been resolved. To write a DataFrame built from an RDD to a Kafka topic, use the following approach:

import org.apache.spark.sql.functions.{col, struct, to_json}
import sparkSession.implicits._

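// originalDf stands for the DataFrame built from the RDD; "keyColumn" is a placeholder for the column used as the Kafka key
val df = originalDf.select(col("keyColumn").cast("string"), to_json(struct($"*"))).toDF("key", "value")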

// bootstrapServers and topic are the values read from your configuration
df.write.format("kafka").option("kafka.bootstrap.servers", bootstrapServers).option("topic", topic).save()
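A self-contained version of the key/value approach, as a sketch: `KeyedKafkaWrite`, `toKeyValue` and `keyColumn` are hypothetical names. The key is cast to string because the Kafka sink only accepts string or binary `key` and `value` columns. Separating the projection from the write lets you inspect the records locally before anything is sent to a broker.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, struct, to_json}

object KeyedKafkaWrite {
  // Builds the two columns the Kafka sink reads: "key" from one chosen column
  // (cast to string) and "value" holding the whole row serialised as JSON.
  def toKeyValue(df: DataFrame, keyColumn: String): DataFrame =
    df.select(
      col(keyColumn).cast("string").as("key"),
      to_json(struct(col("*"))).as("value")
    )

  // Batch write of the projected records to the given topic.
  def write(df: DataFrame, keyColumn: String, bootstrapServers: String, topic: String): Unit =
    toKeyValue(df, keyColumn).write
      .format("kafka")
      .option("kafka.bootstrap.servers", bootstrapServers)
      .option("topic", topic)
      .save()
}
```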

Note: If you want to avoid hard-coding a column name in the select statement, use this approach instead:

// Alias the column (not the Dataset) as "value"; to_json already returns a string, so no CAST is needed
val df = originalDf.select(to_json(struct($"*")).as("value"))

// bootstrapServers and topic are the values read from your configuration
df.write.format("kafka").option("kafka.bootstrap.servers", bootstrapServers).option("topic", topic).save()
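One detail in the column-agnostic variant is where the alias goes: `Dataset.as(alias)` names the whole Dataset, whereas `Column.as(alias)` names the column, and only the latter yields a column called `value` for the sink. A sketch with hypothetical names (`ValueOnlyKafkaWrite`, `toValueOnly`); because no `key` column is produced, the sink writes each record with a null key.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, struct, to_json}

object ValueOnlyKafkaWrite {
  // Serialises all columns into one JSON string column named "value".
  // The alias is applied to the Column, not to the Dataset.
  def toValueOnly(df: DataFrame): DataFrame =
    df.select(to_json(struct(col("*"))).as("value"))

  // No "key" column is written, so each Kafka record gets a null key.
  def write(df: DataFrame, bootstrapServers: String, topic: String): Unit =
    toValueOnly(df).write
      .format("kafka")
      .option("kafka.bootstrap.servers", bootstrapServers)
      .option("topic", topic)
      .save()
}
```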
