How can I achieve this with my PySpark DataFrame?
From an input JSON like this:
{
  "obj": [
    {
      "a": "val1",
      "b": "val1"
    },
    {
      "a": "val2",
      "b": "val2"
    }
  ]
}
to a DataFrame like this:
+----------+----------+
|         a|         b|
+----------+----------+
|val1, val2|val1, val2|
+----------+----------+
Assuming the content of your JSON file has already been parsed into a Python dictionary, and that there is only one "obj" key, you can easily convert your data structure into a standard 2D list, which can then be converted into whatever dataframe format you like:
# note: the variable is named `data` rather than `json`
# to avoid shadowing the standard-library `json` module
data = {"obj": [{"a": "val1", "b": "val1"}, {"a": "val2", "b": "val2"}]}

dic = {}
for row in data["obj"]:
    for key, val in row.items():
        if key in dic:
            dic[key].append(val)
        else:
            dic[key] = [val]

table = list(dic.items())
# result:
# [('a', ['val1', 'val2']), ('b', ['val1', 'val2'])]
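If you want the exact single-row layout from the question, where each column holds the collected list of values, you can reshape the same dictionary into a header plus one row of tuples. A minimal sketch (the `spark.createDataFrame` line is left as a comment, assuming a SparkSession named `spark` exists; it is not required to run the snippet):

```python
data = {"obj": [{"a": "val1", "b": "val1"}, {"a": "val2", "b": "val2"}]}

# collect the values per key across all rows
columns = {}
for row in data["obj"]:
    for key, val in row.items():
        columns.setdefault(key, []).append(val)

header = list(columns)                          # ['a', 'b']
single_row = tuple(columns[k] for k in header)  # (['val1', 'val2'], ['val1', 'val2'])

# assuming a SparkSession `spark`, this would build the one-row DataFrame:
# df = spark.createDataFrame([single_row], header)
# df.show()
print(header, single_row)
```

`setdefault` does the same job as the `if key in dic` branch above, just in one line.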