Formatting string representation of structure to Python dictionary
I need a little help converting a string to a dict. The string is not in a common format; it is the output of a UDF.
The return from the PySpark UDF looks like the string below:
"{list=[{a=1}, {a=2}, {a=3}]}"
And I need to convert it to a Python dictionary with the structure below:
{
    "list": [
        {"a": 1},
        {"a": 2},
        {"a": 3}
    ]
}
So I can access its values, like:
dict["list"][1]["a"]
I already tried using:
Could someone please help me?
As an example of how this unparsed string is generated:
from pyspark.sql.functions import udf

@udf()
def execute_method():
    return {"list": [{"a": 1}, {"b": 1}, {"c": 1}]}

df_result = df_source.withColumn("result", execute_method())
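If the UDF itself can be changed (an assumption; the question does not say whether it can), one option is to serialize the dictionary with json.dumps before returning it, so the column holds valid JSON text that json.loads can read back directly. The sketch below is plain Python, so it runs without Spark. In a real job the function would still carry the @udf() decorator shown above.

```python
import json

def execute_method():
    # Serialize explicitly so the returned string is valid JSON
    # rather than the {key=value} rendering shown in the question.
    payload = {"list": [{"a": 1}, {"a": 2}, {"a": 3}]}
    return json.dumps(payload)

# Assumption: in Spark this function would be decorated with @udf();
# it is called directly here so the snippet runs on its own.
result = execute_method()
parsed = json.loads(result)
print(parsed["list"][1]["a"])  # prints 2
```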
At the very least, you will need to replace = with : and surround the keys with double quotes:
import json
import re
string = "{list=[{a=1}, {a=2}, {a=3}]}"
fixed_string = re.sub(r'(\w+)=', r'"\1":', string)
print(type(fixed_string), fixed_string)
parsed = json.loads(fixed_string)
print(type(parsed), parsed)
Output:
<class 'str'> {"list":[{"a":1}, {"a":2}, {"a":3}]}
<class 'dict'> {'list': [{'a': 1}, {'a': 2}, {'a': 3}]}
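The substitution above works when every value is a number. If some values are bare words (for example name=John), json.loads will reject them. A slightly more general sketch follows, assuming that keys and values never contain whitespace, commas, brackets, braces, or equals signs, and that no value is empty. The names TOKEN, _fix and parse_struct_string are hypothetical helpers, not part of any library.

```python
import json
import re

# A token is any run of characters that is not whitespace or structural
# punctuation; an optional "=" right after it marks the token as a key.
TOKEN = re.compile(r'([^\s{}\[\],=]+)(\s*=)?')

def _fix(match):
    word, is_key = match.group(1), match.group(2)
    if is_key:
        # Keys are always quoted, and "=" becomes ":".
        return json.dumps(word) + ":"
    try:
        # Leave numbers, true, false and null untouched.
        json.loads(word)
        return word
    except ValueError:
        # Anything else is treated as a string value.
        return json.dumps(word)

def parse_struct_string(text):
    return json.loads(TOKEN.sub(_fix, text))

parsed = parse_struct_string("{list=[{a=1}, {a=2}, {a=3}]}")
print(parsed["list"][1]["a"])  # prints 2
```

Handling keys and values in a single pass avoids quoting the same text twice. Values that contain spaces or commas would need a real parser instead of a regular expression.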
Try this:
import re
import json
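data = "{list=[{a=1}, {a=2}, {a=3}]}"
data = data.replace('=', ':')
# Collect every alphabetic word (here: the keys) and wrap each one in double quotes.
# Note: this only works when no key is a substring of another key.
pattern = [e.group() for e in re.finditer('[a-z]+', data, flags=re.IGNORECASE)]
for e in set(pattern):
    data = data.replace(e, "\"" + e + "\"")
print(json.loads(data))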