PySpark: Write Spark Dataframe to Kafka Topic
I am trying to load a dataframe into a Kafka topic, and I get an error when selecting the key and value. Any suggestions would be helpful.
Here is my code:
data = spark.sql('select * from job')
kafka = data.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")\
.writeStream.outputMode('append').format('kafka')\
.option("kafka.bootstrap.servers", "localhost:9092")\
.option("topic", "Jim_Topic")\
.option("checkpointLocation", "C:/Hadoop/Data/CheckPointLocation/")\
.start()
kafka.awaitTermination()
Here is the error:
pyspark.sql.utils.AnalysisException: cannot resolve '`key`' given input columns: [job.JOB_ID,
job.JOB_TITLE, job.MAX_SALARY, job.MIN_SALARY]; line 1 pos 5;
'Project [unresolvedalias(cast('key as string), None), unresolvedalias(cast('value as string), None)]
+- Project [JOB_ID#0, JOB_TITLE#1, MIN_SALARY#2, MAX_SALARY#3]
+- SubqueryAlias `job`
+- StreamingRelation
DataSource(org.apache.spark.sql.SparkSession@1f3fc47a,csv,List(),Some(StructType(StructField(JOB_ID,StringType,true), StructField(JOB_TITLE,StringType,true), StructField(MIN_SALARY,StringType,true), StructField(MAX_SALARY,StringType,true))),List(),None,Map(sep ->,, header -> false, path -> C:/Hadoop/Data/Job*.csv),None), FileSource[C:/Hadoop/Data/Job*.csv], [JOB_ID#0, JOB_TITLE#1, MIN_SALARY#2, MAX_SALARY#3]
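The root cause is that the Kafka sink reads from a fixed set of columns (`value` is required; `key` and `topic` are optional), while the `job` relation only has the four CSV columns, so `CAST(key AS STRING)` has nothing to resolve against. Conceptually, you have to build the `key` and `value` columns yourself. A plain-Python sketch of the record shape the sink ends up publishing (the sample row is made up for illustration; in Spark this is done by `selectExpr`, not Python):

```python
import json

# A made-up sample row matching the CSV schema from the error trace above.
row = {"JOB_ID": "AD_PRES", "JOB_TITLE": "President",
       "MIN_SALARY": "20000", "MAX_SALARY": "40000"}

# Equivalent of: CAST(JOB_ID AS STRING) AS key, to_json(struct(*)) AS value
key = str(row["JOB_ID"])    # one column becomes the Kafka message key
value = json.dumps(row)     # all columns folded into one JSON string value

print(key)    # AD_PRES
print(value)  # {"JOB_ID": "AD_PRES", "JOB_TITLE": "President", ...}
```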
Casting the values to JSON fixed it. Messages now flow from the Spark stream to Kafka:
kafka = data.selectExpr("CAST(JOB_ID AS STRING) AS key", "to_json(struct(*)) AS value")\
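For completeness, a runnable sketch of the whole job under the same assumptions as the question (a local broker on localhost:9092, and the Windows paths, schema, and topic name from the original post):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-to-kafka").getOrCreate()

# Same streaming CSV source as in the error trace above.
data = (spark.readStream
        .schema("JOB_ID STRING, JOB_TITLE STRING, "
                "MIN_SALARY STRING, MAX_SALARY STRING")
        .csv("C:/Hadoop/Data/Job*.csv"))

# Fold every column into a JSON `value` and use JOB_ID as the message `key`,
# since those are the columns the Kafka sink expects.
query = (data.selectExpr("CAST(JOB_ID AS STRING) AS key",
                         "to_json(struct(*)) AS value")
         .writeStream
         .outputMode("append")
         .format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("topic", "Jim_Topic")
         .option("checkpointLocation", "C:/Hadoop/Data/CheckPointLocation/")
         .start())

query.awaitTermination()
```

Note that running this also requires the Kafka connector package on the classpath (the `spark-sql-kafka-0-10` artifact matching your Spark version), e.g. via `spark-submit --packages`.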