
Writing JSON records from a dataframe column to S3 in Spark Streaming

I have a dataframe, shown below, whose records are JSON data (as strings) read from a Kafka topic.

[screenshot of the dataframe]

I need to write just the JSON records present in the dataframe to S3.

Is there a way to parse the records, convert the JSON to a dataframe, and write it to S3?

Any other solution would also be helpful.

I have tried to use foreach, but could not convert the rows to a dataframe to write to S3:

def foreach_function(row):
    # Print a separator, then the first column (the JSON string from Kafka)
    print("*" * 100)
    print(row[0])

query = df.writeStream.foreach(foreach_function).start()
query.awaitTermination()
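
Something like the sketch below is what I'm aiming for, though I haven't got it working. The schema, bucket, and checkpoint path here are placeholders, not my actual values:

from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

# Placeholder schema -- replace with the actual fields of the JSON records
schema = StructType([
    StructField("id", StringType()),
    StructField("payload", StringType()),
])

# Kafka delivers the message value as binary; cast it to a string,
# then parse the JSON string into typed columns
parsed = (df
    .selectExpr("CAST(value AS STRING) AS json_str")
    .select(from_json(col("json_str"), schema).alias("data"))
    .select("data.*"))

# Write each micro-batch out as JSON files under the S3 prefix
query = (parsed.writeStream
    .format("json")
    .option("path", "s3a://my-bucket/output/")                      # placeholder bucket
    .option("checkpointLocation", "s3a://my-bucket/checkpoints/")   # placeholder path
    .start())
query.awaitTermination()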

Unclear why you need Spark for this.

Kafka Connect is part of Kafka, so you only need to configure it with the S3 Kafka Connect sink (which is open source), and it supports writing JSON files.
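
For example, the connector can be created through the Kafka Connect REST API. The sketch below assumes the Confluent S3 sink connector is installed on your Connect workers; the endpoint, topic, bucket, and region are placeholders:

import requests

# Placeholder connector definition for the Confluent S3 sink
connector = {
    "name": "s3-sink",
    "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "topics": "my-topic",                 # placeholder topic
        "s3.bucket.name": "my-bucket",        # placeholder bucket
        "s3.region": "us-east-1",             # placeholder region
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        "flush.size": "1000",                 # records per S3 object
    },
}

# POST the config to the Connect REST API (placeholder host/port)
resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()

This skips Spark entirely: Connect consumes from the topic and flushes batches of records to S3 as JSON files.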
