I want to write a nested data structure consisting of a Map inside another Map using an array of a Scala case class.
The result should transform this dataframe:
|Value|Country| Timestamp| Sum|
+-----+-------+----------+----+
| 123| ITA|1475600500|18.0|
| 123| ITA|1475600516|19.0|
+-----+-------+----------+----+
into:
+--------------------------------------------------------------------+
|value |
+--------------------------------------------------------------------+
[{"value":123,"attributes":{"ITA":{"1475600500":18,"1475600516":19}}}]
+--------------------------------------------------------------------+
The actualResult
dataset below gets me close but the structure isn't quite the same as my expected dataframe.
case class Record(value: Integer, attributes: Map[String, Map[String, BigDecimal]])
val actualResult = df
.map(r =>
Array(
Record(
r.getAs[Int]("Value"),
Map(
r.getAs[String]("Country") ->
Map(
r.getAs[String]("Timestamp") -> new BigDecimal(
r.getAs[Double]("Sum").toString
)
)
)
)
)
)
The Timestamp
column in the actualResult
dataset doesn't get combined together into the same Record
row but rather creates two separate rows instead.
+----------------------------------------------------+
|value |
+----------------------------------------------------+
[{"value":123,"attributes":{"ITA":{"1475600516":19}}}]
[{"value":123,"attributes":{"ITA":{"1475600500":18}}}]
+----------------------------------------------------+
With the use of groupBy
and collect_list
by creatng combined column using struct I was able to get single row as below output.
val mycsv =
"""
|Value|Country|Timestamp|Sum
| 123|ITA|1475600500|18.0
| 123|ITA|1475600516|19.0
""".stripMargin('|').lines.toList.toDS()
val df: DataFrame = spark.read.option("header", true)
.option("sep", "|")
.option("inferSchema", true)
.csv(mycsv)
df.show
val df1 = df.
groupBy("Value","Country")
.agg( collect_list(struct(col("Country"), col("Timestamp"), col("Sum"))).alias("attributes")).drop("Country")
val json = df1.toJSON // you can save in to file
json.show(false)
Result combined 2 rows
+-----+-------+----------+----+
|Value|Country| Timestamp| Sum|
+-----+-------+----------+----+
|123.0|ITA |1475600500|18.0|
|123.0|ITA |1475600516|19.0|
+-----+-------+----------+----+
+----------------------------------------------------------------------------------------------------------------------------------------------+
|value |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|{"Value":123.0,"attributes":[{"Country":"ITA","Timestamp":1475600500,"Sum":18.0},{"Country":"ITA","Timestamp":1475600516,"Sum":19.0}]}|
+----------------------------------------------------------------------------------------------------------------------------------------------+
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.