
Spark - convert JSON array object to concatenated string

How can I achieve this with my PySpark DataFrame?

from an input JSON like this:

  {  "obj":[ 
          { 
             "a":"val1",
             "b":"val1"
          },
          { 
             "a":"val2",
             "b":"val2"
          }
          ]
 }

to a DataFrame like this:

  +----------+----------+
  |         a|         b|
  +----------+----------+
  |val1, val2|val1, val2|
  +----------+----------+

Assuming that the content of your JSON file has been parsed into a Python dictionary, and that there is only one "obj" key, you can easily convert your data structure into a standard 2D list, which can then be converted into any DataFrame format you like:

# Renamed from `json` to avoid shadowing the standard-library module.
data = {"obj": [{"a": "val1", "b": "val1"}, {"a": "val2", "b": "val2"}]}

# Collect each key's values across all rows of "obj".
dic = {}
for row in data['obj']:
    for key, val in row.items():
        if key in dic:
            dic[key].append(val)
        else:
            dic[key] = [val]

table = list(dic.items())

# result:
# [('a', ['val1', 'val2']), ('b', ['val1', 'val2'])]
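
Note that this gives lists of values, while the question asks for comma-separated strings in a PySpark DataFrame. A minimal follow-up sketch, assuming PySpark is available (the SparkSession setup below is not part of the original answer):

from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

# Join each key's list of values into one comma-separated string,
# then build a one-row DataFrame matching the requested output.
joined = Row(**{key: ", ".join(vals) for key, vals in dic.items()})
sdf = spark.createDataFrame([joined])
sdf.show()
# +----------+----------+
# |         a|         b|
# +----------+----------+
# |val1, val2|val1, val2|
# +----------+----------+

Alternatively, the same result can be obtained without leaving the DataFrame API, assuming Spark reads the JSON file directly ("input.json" is a hypothetical path):

from pyspark.sql import functions as F

# multiLine=True is needed because the JSON object spans several lines.
df = spark.read.json("input.json", multiLine=True)

# Extracting a struct field through an array of structs ("obj.a")
# yields an array of strings; concat_ws joins its elements with ", ".
result = df.select(
    F.concat_ws(", ", F.col("obj.a")).alias("a"),
    F.concat_ws(", ", F.col("obj.b")).alias("b"),
)
result.show()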
