Spark - convert JSON array object to concatenated string
How can I achieve this with my PySpark DataFrame?
from an input JSON like this:
{
  "obj": [
    {
      "a": "val1",
      "b": "val1"
    },
    {
      "a": "val2",
      "b": "val2"
    }
  ]
}
to a DataFrame like this:
+----------+----------+
|         a|         b|
+----------+----------+
|val1, val2|val1, val2|
+----------+----------+
Assuming that the content of your JSON file has been parsed as a Python dictionary, and that there is only one "obj" key, you can easily convert your data structure into a standard 2D list, which can then be converted into any DataFrame format you like:
data = {"obj": [{"a": "val1", "b": "val1"}, {"a": "val2", "b": "val2"}]}

# Collect the values of each key across all elements of the "obj" array.
# (Named `data` rather than `json` to avoid shadowing the json module.)
dic = {}
for row in data["obj"]:
    for key, val in row.items():
        dic.setdefault(key, []).append(val)

table = list(dic.items())
# result:
# [('a', ['val1', 'val2']), ('b', ['val1', 'val2'])]
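The lists above still need to be joined into the comma-separated strings shown in the desired output. A minimal sketch in plain Python, assuming the `dic` built above:

```python
# Join each column's collected values into a single comma-separated string.
dic = {"a": ["val1", "val2"], "b": ["val1", "val2"]}
row = {col: ", ".join(vals) for col, vals in dic.items()}
# row == {'a': 'val1, val2', 'b': 'val1, val2'}
```

From here, `row` can be passed to whatever DataFrame constructor you prefer as a single record.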