Extract a few fields from JSON and return the rest as a map in a PySpark DataFrame

I am reading streaming data into a PySpark dataframe. The data contains a few fields that are present in every record/request. I want to extract those fields into their own dataframe columns and store the remaining fields as a map in another dataframe column. I have not been able to achieve this.

Can someone help with this?

Example:

Sample values:

{"event1":"Value","event2":"Value","event3":"Value","event4":"Value","event5":"Value","event6":"Value"}
{"event1":"Value","event2":"Value","event3":"Value","data1":"Value","data2":"Value","data3":"Value"}

Now suppose event1, event2, and event3 are present in every row. I want to extract each of them into a separate dataframe column and keep the rest of the fields as a map of key-value pairs in another dataframe column.

You need to create a schema for your dataframe and use from_json to convert the JSON string to a StructType in Spark. You can then select your specific events and create another dataframe for the other events.
