Create JSON column in Spark Scala

I have some data that needs to be written as a JSON string after some transformations in a Spark (Scala) job. I'm using the to_json function together with the struct and/or array functions to build the final JSON that is required.

One piece of the JSON looks like this:

"field":[
    "foo",
    {
        "inner_field":"bar"
    }
]

I'm not a JSON expert, so I don't know whether this structure is common; all I know is that it is valid JSON. I'm having trouble creating a DataFrame column in this format, and I'd like to know the best way to build this kind of column.

Thanks in advance.

If you have a DataFrame with several columns that you want to turn into a single JSON string column, you can use the to_json and struct functions. Something like this:

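import org.apache.spark.sql.functions.{struct, to_json}
import spark.implicits._  // enables toDF and $"col"; already imported in spark-shell

val df = Seq(
  (1, "string1", Seq("string2", "string3")),
  (2, "string4", Seq("string5", "string6"))
).toDF("colA", "colB", "colC")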

df.show
+----+-------+------------------+
|colA|   colB|              colC|
+----+-------+------------------+
|   1|string1|[string2, string3]|
|   2|string4|[string5, string6]|
+----+-------+------------------+

val newDf = df.withColumn("jsonString", to_json(struct($"colA", $"colB", $"colC")))

newDf.show(false)
+----+-------+------------------+--------------------------------------------------------+
|colA|colB   |colC              |jsonString                                              |
+----+-------+------------------+--------------------------------------------------------+
|1   |string1|[string2, string3]|{"colA":1,"colB":"string1","colC":["string2","string3"]}|
|2   |string4|[string5, string6]|{"colA":2,"colB":"string4","colC":["string5","string6"]}|
+----+-------+------------------+--------------------------------------------------------+

struct combines multiple columns into a single StructType column, and to_json then serializes that column into a JSON string.
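The array in the question mixes a plain string with an object. Spark's array function cannot express that directly, because all elements of an array column must share one type. One possible workaround, offered only as a rough sketch, is to serialize each element with to_json and join the pieces as text. The column names name and inner, the object MixedArrayJson, and the helper withMixedArray are hypothetical, and the sketch assumes a Spark version in which to_json accepts an array of strings (Spark 3.x does).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, concat, lit, regexp_replace, struct, to_json}

object MixedArrayJson {
  // Hypothetical helper: adds a JSON-text column "field" shaped like
  // ["<name>",{"inner_field":"<inner>"}] from the assumed columns "name" and "inner".
  def withMixedArray(df: DataFrame): DataFrame = {
    // to_json(array(name)) yields ["foo"]; replace the closing bracket with a
    // comma so a further element can be appended. to_json handles the escaping.
    val head = regexp_replace(to_json(array(col("name"))), "\\]$", ",")
    // to_json(struct(...)) yields {"inner_field":"bar"}.
    val tail = to_json(struct(col("inner").as("inner_field")))
    df.withColumn("field", concat(head, tail, lit("]")))
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("mixed-array-json")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("foo", "bar")).toDF("name", "inner")
    withMixedArray(df).show(false)

    spark.stop()
  }
}
```

Note that the resulting column holds JSON text, not a structured value. Passing it through another struct and to_json would escape it as an ordinary string, so this approach fits best when the mixed array is assembled as the last step.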

Hope this helps!
