
Extract a few fields from JSON and return the rest as a map in a PySpark DataFrame

I am reading streaming data into a PySpark DataFrame. The data contains a few fields that are present in every record/request. I want to extract those fields into their own DataFrame columns and store the rest of the fields as a map in another DataFrame column. I have not been able to achieve this.

Can someone help with this?

Example:

Sample Values :

{"event1":"Value","event2":"Value","event3":"Value","event4":"Value","event5":"Value","event6":"Value"}
{"event1":"Value","event2":"Value","event3":"Value","data1":"Value","data2":"Value","data3":"Value"}

Now suppose event1, event2 and event3 are present in every row. I want to extract them into separate DataFrame columns and keep the rest of the fields as a map of key-value pairs in another DataFrame column.

You need to define a schema for your DataFrame and use from_json to parse the JSON string into a StructType column in Spark. You can then select the specific events you need and create another DataFrame for the other events.

