
Spark - convert JSON array object to concatenated string

How can I do this with my PySpark DataFrame?

From an input JSON like this:

  {  "obj":[ 
          { 
             "a":"val1",
             "b":"val1"
          },
          { 
             "a":"val2",
             "b":"val2"
          }
          ]
 }

To a DataFrame like this:

+----------+----------+
|     a    |     b    |
+----------+----------+
|val1, val2|val1, val2|
+----------+----------+

Assuming that the content of your JSON file has already been parsed into a Python dictionary, and that it contains only one "obj" key, you can regroup the data into a list of (key, values) pairs, one pair per column, which can then be converted into whatever dataframe format you like:

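# Named "data" rather than "json" to avoid shadowing the standard json module.
data = {"obj": [{"a": "val1", "b": "val1"}, {"a": "val2", "b": "val2"}]}

# Collect the values of each key across all objects in "obj".
dic = {}
for row in data["obj"]:
    for key, val in row.items():
        # setdefault creates the list the first time a key is seen
        dic.setdefault(key, []).append(val)

table = list(dic.items())

# result:
# [('a', ['val1', 'val2']), ('b', ['val1', 'val2'])]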
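
The grouped lists above are not yet the concatenated strings the question asks for. A minimal sketch of that last step, assuming the values should be joined with ", " as in the expected output (the names `grouped`, `columns` and `row` are illustrative and not from the original answer):

```python
# Grouped values, in the same shape the loop above produces.
grouped = {"a": ["val1", "val2"], "b": ["val1", "val2"]}

# Column names, in insertion order (dicts preserve order in Python 3.7+).
columns = list(grouped.keys())

# A single row: each column's values joined into one string.
# str() is an assumption so that non-string values would not break the join.
row = tuple(", ".join(str(v) for v in values) for values in grouped.values())

print(columns)  # ['a', 'b']
print(row)      # ('val1, val2', 'val1, val2')
```

If a SparkSession is available (assumed here to be named `spark`), passing these to `spark.createDataFrame([row], columns)` should give a one-row DataFrame with columns `a` and `b`. This approach builds the row on the driver in plain Python, so it is only reasonable for small inputs.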
