Flatten any nested JSON string and convert it to a DataFrame using Spark/Scala
I have a nested JSON file that I need to convert into a flattened DataFrame, without defining or exploding any column names by hand.
val df = sqlCtx.read.option("multiLine",true).json("test.json")
Here is my data:
[
{
"symbol": "TEST3",
"timestamp": "2019-05-07 16:00:00",
"priceData": {
"open": "1177.2600",
"high": "1179.5500",
"low": "1176.6700",
"close": "1179.5500",
"volume": "49478"
}
},
{
"symbol": "TEST4",
"timestamp": "2019-05-07 16:00:00",
"priceData": {
"open": "189.5660",
"high": "189.9100",
"low": "189.5100",
"close": "189.9100",
"volume": "267986"
}
}
]
Here is one way to do it, using the DataFrameFlattener class implemented by Databricks:
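import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, StructType}

// Paste into spark-shell, or wrap in an object: implicit classes cannot be top-level in Scala 2.
implicit class DataFrameFlattener(df: DataFrame) {
  // Select every leaf field of the schema as a top-level column.
  def flattenSchema: DataFrame = {
    df.select(flatten(Nil, df.schema): _*)
  }

  // Recurse into structs; backtick-quote each path segment and alias the leaf by its dotted path.
  protected def flatten(path: Seq[String], schema: DataType): Seq[Column] = schema match {
    case s: StructType => s.fields.flatMap(f => flatten(path :+ f.name, f.dataType))
    case other => col(path.map(n => s"`$n`").mkString(".")).as(path.mkString(".")) :: Nil
  }
}

df.flattenSchema.show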
Output:
+---------------+--------------+-------------+--------------+----------------+------+-------------------+
|priceData.close|priceData.high|priceData.low|priceData.open|priceData.volume|symbol|          timestamp|
+---------------+--------------+-------------+--------------+----------------+------+-------------------+
|      1179.5500|     1179.5500|    1176.6700|     1177.2600|           49478| TEST3|2019-05-07 16:00:00|
|       189.9100|      189.9100|     189.5100|      189.5660|          267986| TEST4|2019-05-07 16:00:00|
+---------------+--------------+-------------+--------------+----------------+------+-------------------+
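To see how the recursion in flatten arrives at these column names, here is a minimal Spark-free sketch. The Node, Struct and Leaf types and the FlattenModel object are hypothetical stand-ins for Spark's StructType and leaf data types, introduced only for illustration; the traversal itself mirrors the code above.

```scala
// Hypothetical, Spark-free model of a schema: a node is either a struct or a leaf.
sealed trait Node
final case class Struct(fields: Seq[(String, Node)]) extends Node
case object Leaf extends Node

object FlattenModel {
  // Walk the tree depth-first, accumulating the path to each leaf,
  // and emit one dotted name per leaf (the alias used in flattenSchema).
  def flatten(path: Seq[String], node: Node): Seq[String] = node match {
    case Struct(fields) => fields.flatMap { case (name, child) => flatten(path :+ name, child) }
    case Leaf           => Seq(path.mkString("."))
  }

  def main(args: Array[String]): Unit = {
    // Same shape as the sample data, with fields in the order Spark reports them.
    val schema = Struct(Seq(
      "priceData" -> Struct(Seq(
        "close" -> Leaf, "high" -> Leaf, "low" -> Leaf, "open" -> Leaf, "volume" -> Leaf)),
      "symbol"    -> Leaf,
      "timestamp" -> Leaf
    ))
    flatten(Nil, schema).foreach(println)
  }
}
```

Because the resulting names contain dots, later references to them in Spark need backticks, for example col("`priceData.close`").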
Alternatively, you can do a regular select:
df.select(
"priceData.close",
"priceData.high",
"priceData.low",
"priceData.open",
"priceData.volume",
"symbol",
"timestamp").show
Output:
+---------+---------+---------+---------+------+------+-------------------+
|    close|     high|      low|     open|volume|symbol|          timestamp|
+---------+---------+---------+---------+------+------+-------------------+
|1179.5500|1179.5500|1176.6700|1177.2600| 49478| TEST3|2019-05-07 16:00:00|
| 189.9100| 189.9100| 189.5100| 189.5660|267986| TEST4|2019-05-07 16:00:00|
+---------+---------+---------+---------+------+------+-------------------+
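Note that the regular select names each result column after the leaf field only, so the priceData prefix is lost. If the prefix matters (for instance, when two structs share a leaf name), each column can be aliased explicitly. The following is a minimal sketch, not part of the original answer: the SelectWithAliases object, the selectFlat helper, the underscore-separated names and the inline JSON sample are all assumptions made for illustration, and it assumes Spark 2.2 or later for reading JSON from a Dataset[String].

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SelectWithAliases {
  // Hypothetical helper: select nested fields and keep the parent name in the alias.
  def selectFlat(df: DataFrame): DataFrame =
    df.select(
      col("priceData.close").as("priceData_close"),
      col("priceData.open").as("priceData_open"),
      col("symbol"),
      col("timestamp"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("select-aliases").getOrCreate()
    import spark.implicits._

    // Inline one-record sample standing in for test.json (assumption, trimmed to two price fields).
    val df = spark.read.json(Seq(
      """{"symbol":"TEST3","timestamp":"2019-05-07 16:00:00","priceData":{"open":"1177.2600","close":"1179.5500"}}"""
    ).toDS())

    selectFlat(df).show()
    spark.stop()
  }
}
```

Underscore-separated aliases avoid the backtick quoting that dotted column names require in later expressions.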