Convert a Spark DataFrame's JSON column to an Array of objects
I have a DataFrame with a JSON column. The JSON in each row holds a set of key/value pairs, as in the example below.
Col1
=====================================================================
|{"Name":"Ram","Place":"RamGarh"}                                   |
|{"Name":"Lakshman","Place":"LakshManPur","DepartMent":"Operations"}|
|{"Name":"Sita","Place":"SitaPur","Experience":"14"}                |
I need to parse this JSON data. What would be the most efficient way?
I need to present it in the form of
case class dfCol(col:String, valu:String)
So basically I need to parse the JSON in every row of that DataFrame and convert it into the form
| Col
| ==========================================================
| Array(dfCol(Name,Ram),dfCol(Place,RamGarh))
| Array(dfCol(Name,Lakshman),dfCol(Place,LakshManPur),dfCol(DepartMent,Operations))
| Array(dfCol(Name,Sita),dfCol(Place,SitaPur),dfCol(Experience,14))
Use this:
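// Assumes a SparkSession named `spark` is in scope (as in spark-shell).
import org.apache.spark.sql.Dataset
import spark.implicits._ // needed for .toDS() and the Dataset encoders
case class dfCol(col:String, valu:String)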
val data =
"""
|{"Name":"Ram","Place":"RamGarh"}
|{"Name":"Lakshman","Place":"LakshManPur","DepartMent":"Operations"}
|{"Name":"Sita","Place":"SitaPur","Experience":14.0}
""".stripMargin
val df = spark.read.json(data.split(System.lineSeparator()).toSeq.toDS())
df.show(false)
df.printSchema()
/**
* +----------+----------+--------+-----------+
* |DepartMent|Experience|Name |Place |
* +----------+----------+--------+-----------+
* |null |null |Ram |RamGarh |
* |Operations|null |Lakshman|LakshManPur|
* |null |14.0 |Sita |SitaPur |
* +----------+----------+--------+-----------+
*
* root
* |-- DepartMent: string (nullable = true)
* |-- Experience: double (nullable = true)
* |-- Name: string (nullable = true)
* |-- Place: string (nullable = true)
*/
Row -> Array[dfCol]
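val ds: Dataset[Array[dfCol]] = df.map(row => {
  // Build a column-name -> value map for this row
  row.getValuesMap[String](row.schema.map(_.name))
    // Drop the keys that were absent from this row's JSON
    .filter(_._2 != null)
    // Wrap each remaining pair in dfCol, rendering the value as a string
    .map{f => dfCol(f._1, String.valueOf(f._2))}
    .toArray
})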
ds.show(false)
ds.printSchema()
//+------------------------------------------------------------------+
//|value |
//+------------------------------------------------------------------+
//|[[Name, Ram], [Place, RamGarh]] |
//|[[DepartMent, Operations], [Name, Lakshman], [Place, LakshManPur]]|
//|[[Experience, 14.0], [Name, Sita], [Place, SitaPur]] |
//+------------------------------------------------------------------+
//
//root
// |-- value: array (nullable = true)
// | |-- element: struct (containsNull = true)
// | | |-- col: string (nullable = true)
// | | |-- valu: string (nullable = true)
Check the code below.
scala> import org.apache.spark.sql.types._
scala> import org.apache.spark.sql.functions._
scala> val schema = MapType(StringType, StringType)
scala> df.show(false)
+-------------------------------------------------------------------+
|col1 |
+-------------------------------------------------------------------+
|{"Name":"Ram","Place":"RamGarh"} |
|{"Name":"Lakshman","Place":"LakshManPur","DepartMent":"Operations"}|
|{"Name":"Sita","Place":"SitaPur","Experience":"14"} |
+-------------------------------------------------------------------+
scala> df
.withColumn("id",monotonically_increasing_id)
.select(from_json($"col1",schema).as("col1"),$"id")
.select(explode($"col1"),$"id")
.groupBy($"id")
.agg(collect_list(struct($"key",$"value")).as("col1"))
.select("col1")
.show(false)
+------------------------------------------------------------------+
|col1 |
+------------------------------------------------------------------+
|[[Name, Ram], [Place, RamGarh]] |
|[[Name, Lakshman], [Place, LakshManPur], [DepartMent, Operations]]|
|[[Name, Sita], [Place, SitaPur], [Experience, 14]] |
+------------------------------------------------------------------+
scala> df.withColumn("id",monotonically_increasing_id).select(from_json($"col1",schema).as("col1"),$"id").select(explode($"col1"),$"id").groupBy($"id").agg(collect_list(struct($"key",$"value")).as("col1")).select("col1").printSchema
root
|-- col1: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- key: string (nullable = false)
| | |-- value: string (nullable = true)
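The explode / groupBy / collect_list round trip above shuffles the data only to rebuild one array per row. As a hedged alternative, and assuming Spark 3.0 or later (where `map_entries` is available in `org.apache.spark.sql.functions`), the parsed map can be turned into an array of key/value structs directly. The object name `JsonColumnToEntries` and the helper `toEntries` are made up for this sketch and are not part of either answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json, map_entries}
import org.apache.spark.sql.types.{MapType, StringType}

object JsonColumnToEntries {

  // Hypothetical helper: parse the JSON string column as map<string,string>,
  // then convert the map into array<struct<key,value>> without a shuffle.
  def toEntries(df: DataFrame, jsonCol: String = "col1"): DataFrame =
    df.select(
      map_entries(from_json(col(jsonCol), MapType(StringType, StringType))).as(jsonCol)
    )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("json-column-to-entries")
      .getOrCreate()
    import spark.implicits._

    // Same sample rows as in the question.
    val df = Seq(
      """{"Name":"Ram","Place":"RamGarh"}""",
      """{"Name":"Lakshman","Place":"LakshManPur","DepartMent":"Operations"}""",
      """{"Name":"Sita","Place":"SitaPur","Experience":"14"}"""
    ).toDF("col1")

    val result = toEntries(df)
    result.show(false)
    result.printSchema()

    spark.stop()
  }
}
```

The struct fields are still named `key` and `value`, so mapping them onto `dfCol(col, valu)` would need an extra rename or a typed `map` step.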