Scala Spark: aggregate a set of DataFrame columns into a JSON string

Question

Given a data frame,

+-----------------------------+
| id|  name| payable| strategy|
+-----------------------------+
|  0|   Joe|     100|     st-1|
|  1|   Tom|     200|     st-2|
|  2|  John|     300|     st-1|
+-----------------------------+

What would be the most efficient way to convert each row to a JSON string like the following?

{
  "payload": {
     "name": "Joe",
     "payments": [
         {
            "strategy": "st-1",
            "payable": 100
         }
     ]
  }
}

Currently I have a UDF that manually stringifies the provided columns, but I'm wondering whether there is a better way to achieve this. The to_json method is the best alternative I have found so far, but it takes only one column as input.

Answer 1

Using to_json() is the correct approach. Since it takes a single column, the contents need to be wrapped in struct (and array) columns that mirror the desired JSON nesting:

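import org.apache.spark.sql.functions.{array, struct, to_json}
import spark.implicits._  // assumes a SparkSession named `spark`, as in spark-shell

val df = Seq((0, "Joe", 100, "st-1"), (1, "Tom", 200, "st-2"))
  .toDF("id", "name", "payable", "strategy")

// Each struct becomes a JSON object; the array becomes a JSON list.
val result = df.select(
  to_json(struct(
    struct(
      $"name",
      array(struct($"strategy", $"payable")) as "payments"
    ) as "payload"
  )) as "jsonValue"
)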

result.show(false)
+-------------------------------------------------------------------------+
|jsonValue                                                                |
+-------------------------------------------------------------------------+
|{"payload":{"name":"Joe","payments":[{"strategy":"st-1","payable":100}]}}|
|{"payload":{"name":"Tom","payments":[{"strategy":"st-2","payable":200}]}}|
+-------------------------------------------------------------------------+
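
If the original columns are needed next to the JSON string (for example to join back on id), one option is withColumn instead of select. The following is only a sketch under a few assumptions: it starts its own local SparkSession, and the app name and the value name withJson are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, struct, to_json}

// Local session for experimentation; inside spark-shell, getOrCreate()
// simply returns the session that already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("to-json-sketch") // hypothetical name
  .getOrCreate()
import spark.implicits._

val df = Seq((0, "Joe", 100, "st-1"), (1, "Tom", 200, "st-2"), (2, "John", 300, "st-1"))
  .toDF("id", "name", "payable", "strategy")

// withColumn appends the JSON column to the existing ones;
// the final select keeps only id and the JSON string.
val withJson = df
  .withColumn("jsonValue", to_json(struct(
    struct(
      col("name"),
      array(struct(col("strategy"), col("payable"))).as("payments")
    ).as("payload")
  )))
  .select("id", "jsonValue")

withJson.show(false)
```

Because to_json is a built-in column function, the JSON is built without a user-defined function, which is the point of preferring it over the hand-written UDF from the question.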

Solution 1
3 · accepted · 2020-02-20 09:59:55
