Format a string representation of a struct as a Python dictionary

Question

I need some help converting a string to a dict. The string is not in a common format; it is the output of a UDF.

The value returned by the PySpark UDF looks like the string below:

"{list=[{a=1}, {a=2}, {a=3}]}"

I need to convert it to a Python dictionary with the structure below:

{
  "list": [
    {"a": 1},
    {"a": 2},
    {"a": 3}
  ]
}

That way I can access its values, like this:

dict["list"][1]["a"]

I have already tried using:

json.loads()
ast.literal_eval()

Could someone please help me?

For reference, this is an example of how the unparsed string is generated:

from pyspark.sql.functions import udf

@udf()
def execute_method():
  return {"list": [{"a": 1}, {"b": 1}, {"c": 1}]}

df_result = df_source.withColumn("result", execute_method())

Answer 1

At the very least, you will need to replace = with : and surround the keys with double quotes:

import json
import re

string = "{list=[{a=1}, {a=2}, {a=3}]}"
fixed_string = re.sub(r'(\w+)=', r'"\1":', string)
print(type(fixed_string), fixed_string)
parsed = json.loads(fixed_string)
print(type(parsed), parsed)

Output:

<class 'str'> {"list":[{"a":1}, {"a":2}, {"a":3}]}
<class 'dict'> {'list': [{'a': 1}, {'a': 2}, {'a': 3}]}
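
The same substitution can be wrapped in a small helper and used to reach the nested value the question asks about. This is only a minimal sketch: the name `parse_struct_string` is hypothetical, and it assumes that keys consist of word characters and that values are numbers or nested structures (an unquoted string value would still be rejected by `json.loads`).

```python
import json
import re


def parse_struct_string(text):
    """Convert a string such as '{list=[{a=1}]}' into Python objects.

    Hypothetical helper: assumes keys are made of word characters and
    values are numbers or nested structures.
    """
    # Quote each key and turn the '=' after it into ':' so the text is valid JSON.
    fixed = re.sub(r'(\w+)=', r'"\1":', text)
    return json.loads(fixed)


result = parse_struct_string("{list=[{a=1}, {a=2}, {a=3}]}")
print(result["list"][1]["a"])  # prints 2
```

Anchoring the pattern on the `=` sign means only tokens that sit directly before `=` are quoted, so numeric values are left untouched.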

Answer 2

Try this:

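import json
import re

data = "{list=[{a=1}, {a=2}, {a=3}]}"
# Turn every '=' into the ':' separator that JSON expects.
data = data.replace('=', ':')
# Collect every run of letters; these are the keys that need quoting.
keys = [m.group() for m in re.finditer('[a-z]+', data, flags=re.IGNORECASE)]
for key in set(keys):
    data = data.replace(key, '"' + key + '"')
print(json.loads(data))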
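
One caveat with plain `str.replace`: it also rewrites text that has already been quoted, so a key that is a substring of another key ends up quoted twice. The sketch below shows the problem and a whole-word substitution that avoids it. The input with the keys `a` and `data` is made up for illustration, and the key order is fixed in a list only to make the result reproducible.

```python
import json
import re

# Made-up input: the key "a" is a substring of the key "data".
data = "{data=[{a=1}, {a=2}]}".replace("=", ":")

# Plain replace, as in the loop above, with the longer key processed first.
broken = data
for key in ["data", "a"]:
    broken = broken.replace(key, '"' + key + '"')
print(broken)  # the letters inside "data" were quoted a second time

# Quoting each whole run of letters in a single regex pass avoids that.
fixed = re.sub(r"[a-z]+", lambda m: '"' + m.group() + '"', data, flags=re.IGNORECASE)
print(json.loads(fixed))
```

The single-pass version still assumes that values contain no letters; a value such as `true` or an unquoted string would be quoted as well.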


Answer 1: score 2, accepted, 2021-03-19 20:59:52

Answer 2: score 0, 2021-03-19 21:32:19
