Create JSON column in Spark Scala
I have some data that needs to be written as a JSON string after some transformations in a Spark (Scala) job. I'm using the to_json function together with the struct and/or array functions to build the final JSON that is requested.
One piece of the JSON looks like this:
"field":[
"foo",
{
"inner_field":"bar"
}
]
I'm not an expert in JSON, so I don't know whether this structure is common; all I know is that it is valid JSON. I'm having trouble creating a DataFrame column with this format, and I would like to know the best way to create this type of column.
Thanks in advance.
If you have a DataFrame with a bunch of columns that you want to turn into a single JSON string column, you can use the to_json and struct functions. Something like this:
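import org.apache.spark.sql.functions.{struct, to_json}
import spark.implicits._ // needed for toDF and $"..."; already in scope in spark-shell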
val df = Seq(
(1, "string1", Seq("string2", "string3")),
(2, "string4", Seq("string5", "string6"))
).toDF("colA", "colB", "colC")
df.show
+----+-------+------------------+
|colA|   colB|              colC|
+----+-------+------------------+
|   1|string1|[string2, string3]|
|   2|string4|[string5, string6]|
+----+-------+------------------+
val newDf = df.withColumn("jsonString", to_json(struct($"colA", $"colB", $"colC")))
newDf.show(false)
+----+-------+------------------+--------------------------------------------------------+
|colA|colB   |colC              |jsonString                                              |
+----+-------+------------------+--------------------------------------------------------+
|1   |string1|[string2, string3]|{"colA":1,"colB":"string1","colC":["string2","string3"]}|
|2   |string4|[string5, string6]|{"colA":2,"colB":"string4","colC":["string5","string6"]}|
+----+-------+------------------+--------------------------------------------------------+
struct combines multiple columns into a single StructType column, and to_json turns that column into a JSON string.
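The same two functions also nest, which may be closer to the shape in the question. Below is a minimal sketch, assuming a local SparkSession and made-up column names (id, first, second): a struct inside array becomes a JSON array of objects. Keep in mind that a Spark array column holds elements of a single type, so a mixed array like the one in the question (a plain string next to an object) cannot be expressed directly this way; here both elements are wrapped as objects instead.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, struct, to_json}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nested-json-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical input columns, chosen only for this example.
val df = Seq((1, "foo", "bar")).toDF("id", "first", "second")

// struct(...) becomes a JSON object; array(...) of structs becomes a JSON array of objects.
val nestedDf = df.withColumn(
  "jsonString",
  to_json(struct(
    $"id",
    array(
      struct($"first".as("inner_field")),
      struct($"second".as("inner_field"))
    ).as("field")
  ))
)

nestedDf.select("jsonString").show(false)
```

For the single input row, the jsonString column should contain {"id":1,"field":[{"inner_field":"foo"},{"inner_field":"bar"}]}.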
Hope this helps!