
Writing JSON records from a dataframe column to S3 in Spark Streaming

I have a dataframe, shown below, whose records are JSON data (as strings) read from a Kafka topic.

[screenshot of the dataframe]

I need to write just the JSON records present in the dataframe to S3.

Is there a way to parse the records, convert the JSON to a dataframe, and write it to S3?

Any other solution would also be helpful.

I have tried to use foreach, but could not convert the rows to a dataframe to write to S3:

def foreach_function(row):
    # Print a separator, then the first column (the JSON string from Kafka)
    print("*" * 100)
    print(row[0])

query = df.writeStream.foreach(foreach_function).start()
query.awaitTermination()
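
Something like the sketch below is what I'm aiming for, though I haven't got it working. The schema, bucket, and checkpoint path here are placeholders, not my actual values:

from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

# Placeholder schema -- replace with the actual fields of the JSON records
schema = StructType([
    StructField("id", StringType()),
    StructField("payload", StringType()),
])

# Kafka delivers the message value as binary; cast it to a string,
# then parse the JSON string into typed columns
parsed = (df
    .selectExpr("CAST(value AS STRING) AS json_str")
    .select(from_json(col("json_str"), schema).alias("data"))
    .select("data.*"))

# Write each micro-batch out as JSON files under the S3 prefix
query = (parsed.writeStream
    .format("json")
    .option("path", "s3a://my-bucket/output/")                      # placeholder bucket
    .option("checkpointLocation", "s3a://my-bucket/checkpoints/")   # placeholder path
    .start())
query.awaitTermination()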

Unclear why you need Spark for this.

Kafka Connect is part of Kafka, so you only need to configure it with the S3 Kafka Connect sink (which is open source), and it supports writing JSON files.
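
For example, the connector can be created through the Kafka Connect REST API. The sketch below assumes the Confluent S3 sink connector is installed on your Connect workers; the endpoint, topic, bucket, and region are placeholders:

import requests

# Placeholder connector definition for the Confluent S3 sink
connector = {
    "name": "s3-sink",
    "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "topics": "my-topic",                 # placeholder topic
        "s3.bucket.name": "my-bucket",        # placeholder bucket
        "s3.region": "us-east-1",             # placeholder region
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        "flush.size": "1000",                 # records per S3 object
    },
}

# POST the config to the Connect REST API (placeholder host/port)
resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()

This skips Spark entirely: Connect consumes from the topic and flushes batches of records to S3 as JSON files.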
