Extract a few fields from JSON and return the rest as a map in a PySpark DataFrame

Question

I am reading streaming data into a PySpark DataFrame. The data contains a few fields that are present in every record/request. I want to extract those fields into their own DataFrame columns and store the rest of the fields as a map in another DataFrame column. I have not been able to achieve this.

Can someone help with this?

Example:

Sample Values :

{"event1":"Value","event2":"Value","event3":"Value","event4":"Value","event5":"Value","event6":"Value"}
{"event1":"Value","event2":"Value","event3":"Value","data1":"Value","data2":"Value","data3":"Value"}

Now suppose event1, event2 and event3 are present in every row. I want to extract each of them into its own DataFrame column and keep the rest of the fields as a map of key-value pairs in another DataFrame column.

Answer 1

You need to define a schema for your DataFrame and use from_json to parse the JSON string into a StructType column in Spark. You can then select the specific events you need and create another DataFrame for the other events.
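The following is a minimal sketch of one way to do this, not a definitive implementation. It swaps the fixed StructType schema for a MapType(StringType(), StringType()) schema in from_json, so that keys not known in advance are kept, then reads the common keys out of the map and filters them out of the remainder. Several things here are assumptions: the JSON arrives in a column named "value" (as it would from a Kafka source), all values are strings as in the sample, map_filter is available (Spark 3.1 or later), and the names FIXED_FIELDS, split_record and split_columns are hypothetical. The pure-Python helper only illustrates the expected result for a single record.

```python
import json

# Fields assumed to be present in every record (taken from the question's sample).
FIXED_FIELDS = ["event1", "event2", "event3"]


def split_record(raw, fixed=FIXED_FIELDS):
    """Pure-Python illustration of the target shape for one JSON record."""
    record = json.loads(raw)
    common = {name: record.get(name) for name in fixed}
    rest = {k: v for k, v in record.items() if k not in fixed}
    return common, rest


def split_columns(df, json_col="value", fixed=FIXED_FIELDS):
    """Return one column per fixed field plus a 'rest' map column.

    Assumes `json_col` holds one JSON object per row and Spark >= 3.1.
    """
    # Imported here so split_record stays usable without a Spark installation.
    from pyspark.sql import functions as F
    from pyspark.sql.types import MapType, StringType

    # Parse the whole object as map<string,string> so unknown keys are kept.
    parsed = df.withColumn(
        "_m",
        F.from_json(F.col(json_col).cast("string"),
                    MapType(StringType(), StringType())),
    )
    # One output column per common field, looked up by key in the map.
    fixed_cols = [F.col("_m")[name].alias(name) for name in fixed]
    # Everything that is not a common field stays in the map.
    rest = F.map_filter("_m", lambda k, _v: ~k.isin(*fixed)).alias("rest")
    return parsed.select(*fixed_cols, rest)
```

For a streaming DataFrame read from Kafka, this would be called as split_columns(stream_df), which casts the binary "value" column to a string before parsing. If the values are not all strings, the MapType value type would need to change accordingly.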
