
Spark - convert JSON array object to concatenated string

How can I achieve this with my PySpark DataFrame?

from an input JSON like this:

  {  "obj":[ 
          { 
             "a":"val1",
             "b":"val1"
          },
          { 
             "a":"val2",
             "b":"val2"
          }
          ]
 }

to a DataFrame like this:

  +----------+----------+
  |         a|         b|
  +----------+----------+
  |val1, val2|val1, val2|
  +----------+----------+

Assuming that the content of your JSON file has been parsed into a Python dictionary, and that there is only one "obj" key, you can easily convert your data structure into a standard 2D list, which can then be converted into any DataFrame format you like:

# Renamed from `json` to avoid shadowing the standard-library module.
data = {"obj": [{"a": "val1", "b": "val1"}, {"a": "val2", "b": "val2"}]}

# Collect each key's values across all rows of "obj".
dic = {}
for row in data['obj']:
    for key, val in row.items():
        if key in dic:
            dic[key].append(val)
        else:
            dic[key] = [val]

table = list(dic.items())

# result:
# [('a', ['val1', 'val2']), ('b', ['val1', 'val2'])]
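
Note that this gives lists of values, while the question asks for comma-separated strings in a PySpark DataFrame. A minimal follow-up sketch, assuming PySpark is available (the SparkSession setup below is not part of the original answer):

from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

# Join each key's list of values into one comma-separated string,
# then build a one-row DataFrame matching the requested output.
joined = Row(**{key: ", ".join(vals) for key, vals in dic.items()})
sdf = spark.createDataFrame([joined])
sdf.show()
# +----------+----------+
# |         a|         b|
# +----------+----------+
# |val1, val2|val1, val2|
# +----------+----------+

Alternatively, the same result can be obtained without leaving the DataFrame API, assuming Spark reads the JSON file directly ("input.json" is a hypothetical path):

from pyspark.sql import functions as F

# multiLine=True is needed because the JSON object spans several lines.
df = spark.read.json("input.json", multiLine=True)

# Extracting a struct field through an array of structs ("obj.a")
# yields an array of strings; concat_ws joins its elements with ", ".
result = df.select(
    F.concat_ws(", ", F.col("obj.a")).alias("a"),
    F.concat_ws(", ", F.col("obj.b")).alias("b"),
)
result.show()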
