How to query on data frame where 1 field of StringType has json value in Spark SQL
Extract json data from StringType Spark.SQL
I have a Hive table with a single column of string type.
hive> desc logical_control.test1;
OK
test_field_1    string    test field 1
val df2 = spark.sql("select * from logical_control.test1")
df2.printSchema()
root
|-- test_field_1: string (nullable = true)
df2.show(false)
+------------------------+
|test_field_1            |
+------------------------+
|[[str0], [str1], [str2]]|
+------------------------+
How can I convert it into a structured column like the following?
root
|-- A: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- S: string (nullable = true)
I tried to recover it with the original schema, the one the data had before it was written to HDFS, but json_data comes back null.
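import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

val schema = StructType(
  Seq(
    StructField("A", ArrayType(
      StructType(
        Seq(
          StructField("S", StringType, nullable = true))
      )
    ), nullable = true)
  )
)
val df3 = df2.withColumn("json_data", from_json(col("test_field_1"), schema))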
df3.printSchema()
root
|-- test_field_1: string (nullable = true)
|-- json_data: struct (nullable = true)
| |-- A: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- S: string (nullable = true)
df3.show(false)
+------------------------+---------+
|test_field_1            |json_data|
+------------------------+---------+
|[[str0], [str1], [str2]]|null     |
+------------------------+---------+
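A likely reason for the null is that `[[str0], [str1], [str2]]` is not JSON. It looks like the text rendering of the nested value (the same form `show` prints), with unquoted tokens and no field names, so `from_json` has nothing to match against the schema. As a minimal sketch, assuming a local SparkSession and a hypothetical sample string `validJson` (not taken from the original table), the same schema does parse a string that really is JSON:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Assumption: a local session for illustration; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("from-json-sketch")
  .getOrCreate()
import spark.implicits._

// Same schema as in the question: a struct with field A, an array of struct<S: string>
val schema = StructType(Seq(
  StructField("A", ArrayType(StructType(Seq(
    StructField("S", StringType, nullable = true)
  ))), nullable = true)
))

// Hypothetical sample: what the column would have to contain for from_json to apply
val validJson = """{"A":[{"S":"str0"},{"S":"str1"},{"S":"str2"}]}"""

val parsed = Seq(validJson).toDF("test_field_1")
  .withColumn("json_data", from_json(col("test_field_1"), schema))

// Selecting a field through an array of structs yields an array of that field
val values = parsed.selectExpr("json_data.A.S AS s").head.getSeq[String](0)
```

If the stored strings cannot be rewritten as JSON at the source, the column has to be parsed by hand instead of with `from_json`.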
If the structure of test_field_1 is fixed and you don't mind "parsing" the field yourself, you can use a UDF to do the conversion:
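import org.apache.spark.sql.functions.{col, udf}

// One array element; the case class gives the struct its field name S
case class S(S: String)
// Strip all square brackets, split on commas, and wrap each trimmed token in S
def toArray: String => Array[S] = _.replaceAll("[\\[\\]]", "").split(",").map(s => S(s.trim))
val toArrayUdf = udf(toArray)
val df3 = df2.withColumn("json_data", toArrayUdf(col("test_field_1")))
df3.printSchema()
df3.show(false)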
This prints:
root
|-- test_field_1: string (nullable = true)
|-- json_data: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- S: string (nullable = true)
+------------------------+------------------------+
|test_field_1 |json_data |
+------------------------+------------------------+
|[[str0], [str1], [str2]]|[[str0], [str1], [str2]]|
+------------------------+------------------------+
The tricky part is creating the second level of the structure (element: struct). I used the case class S to create it.
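The parsing step can be checked outside Spark, because the function passed to `udf` is an ordinary Scala function. Below is a small sketch with the same logic. The sample inputs are illustrative, and the comma caveat is an observation about this approach, not part of the original answer:

```scala
case class S(S: String)

// Same logic as the UDF body: drop every '[' and ']', split on commas, trim each piece
val toArray: String => Array[S] =
  _.replaceAll("[\\[\\]]", "").split(",").map(s => S(s.trim))

// The sample row from the question
val parsed = toArray("[[str0], [str1], [str2]]").toList

// Caveat: a value that itself contains a comma is split into separate elements,
// because the bracket structure is discarded before splitting
val broken = toArray("[[a,b], [c]]").toList
```

The function also assumes a non-null input, since calling `replaceAll` on a null string throws. If the column is nullable, a guard such as wrapping the input in `Option` would be needed before the parsing.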