
PySpark: Write Spark Dataframe to Kafka Topic

I am trying to load a dataframe into a Kafka topic, but I get an error when selecting the key and value. Any suggestion would be helpful.

Below is my code:

data = spark.sql('select * from job')

kafka = data.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")\
    .writeStream.outputMode('append').format('kafka')\
    .option("kafka.bootstrap.servers", "localhost:9092")\
    .option("topic", "Jim_Topic")\
    .option("checkpointLocation", "C:/Hadoop/Data/CheckPointLocation/")\
    .start()

kafka.awaitTermination()

Below is the error:

pyspark.sql.utils.AnalysisException: cannot resolve '`key`' given input columns: [job.JOB_ID, 
job.JOB_TITLE, job.MAX_SALARY, job.MIN_SALARY]; line 1 pos 5;
'Project [unresolvedalias(cast('key as string), None), unresolvedalias(cast('value as string), None)]
+- Project [JOB_ID#0, JOB_TITLE#1, MIN_SALARY#2, MAX_SALARY#3]
   +- SubqueryAlias `job`
      +- StreamingRelation

DataSource(org.apache.spark.sql.SparkSession@1f3fc47a,csv,List(),Some(StructType(StructField(JOB_ID,StringType,true), StructField(JOB_TITLE,StringType,true), StructField(MIN_SALARY,StringType,true), StructField(MAX_SALARY,StringType,true))),List(),None,Map(sep ->,, header -> false, path -> C:/Hadoop/Data/Job*.csv),None), FileSource[C:/Hadoop/Data/Job*.csv], [JOB_ID#0, JOB_TITLE#1, MIN_SALARY#2, MAX_SALARY#3]

I tried converting the values into JSON, and it worked perfectly. Now I am able to send messages from the Spark stream to Kafka:

kafka = data.selectExpr("CAST(JOB_ID AS STRING) AS key", "to_json(struct(*)) AS value")
