Formatting string representation of structure to Python dictionary

I need a little help converting a string to a dict. The string is not in a common format; it is the output of a UDF.

The value returned by the PySpark UDF looks like the string below:

"{list=[{a=1}, {a=2}, {a=3}]}"

I need to convert it to a Python dictionary with the structure below:

{
  "list": [
    {"a": 1},
    {"a": 2},
    {"a": 3}
  ]
}

That way I can access its values, like this:

dict["list"][1]["a"]

I already tried using:

  • json.loads
  • ast.literal_eval()

Could someone please help me?

Here is an example of how this unparsed string is generated:

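from pyspark.sql.functions import udf

@udf()  # no return type given, so the result column is a string
def execute_method():
  return {"list": [{"a": 1}, {"b": 1}, {"c": 1}]}

df_result = df_source.withColumn("result", execute_method())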

At the very least you will need to replace = with : and surround the keys with double quotes:

import json
import re

string = "{list=[{a=1}, {a=2}, {a=3}]}"
fixed_string = re.sub(r'(\w+)=', r'"\1":', string)
print(type(fixed_string), fixed_string)
parsed = json.loads(fixed_string)
print(type(parsed), parsed)

Output:

<class 'str'> {"list":[{"a":1}, {"a":2}, {"a":3}]}
<class 'dict'> {'list': [{'a': 1}, {'a': 2}, {'a': 3}]}
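
The same substitution can be wrapped in a small helper so the value from the question can be looked up directly. This is only a sketch: the name parse_udf_string is hypothetical, and it assumes keys consist of word characters and values are numbers, lists or nested maps. A bare string value such as {name=foo} would still fail in json.loads.

```python
import json
import re


def parse_udf_string(text):
    """Turn a string like "{list=[{a=1}]}" into Python objects.

    Hypothetical helper: assumes word-character keys and values that
    are numbers, lists or nested maps (no unquoted string values).
    """
    # Quote each key and swap "=" for ":" so the text becomes valid JSON.
    fixed = re.sub(r'(\w+)=', r'"\1":', text)
    return json.loads(fixed)


result = parse_udf_string("{list=[{a=1}, {a=2}, {a=3}]}")
print(result["list"][1]["a"])  # 2
```

Storing the parsed value under a name other than dict also avoids shadowing the built-in type.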

Try this:

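import json
import re

data = "{list=[{a=1}, {a=2}, {a=3}]}"
# Turn the key/value separator into JSON's colon.
data = data.replace('=', ':')
# Quote every alphabetic key in a single pass, so a key that is a
# substring of another key (e.g. "a" inside "data") is not quoted twice.
data = re.sub(r'[a-z]+', lambda m: '"' + m.group() + '"', data, flags=re.IGNORECASE)
print(json.loads(data))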
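
If the UDF itself can be changed, another option is to have it return JSON text, so that no repair step is needed afterwards. The sketch below is an assumption about how that could look, not something taken from the question: the function body is plain Python so it can be run without Spark, and the PySpark wiring is only shown in comments.

```python
import json


def execute_method():
    # Serialize the dict to JSON text instead of returning the dict itself,
    # so the string column would hold valid JSON rather than "{list=[...]}".
    return json.dumps({"list": [{"a": 1}, {"b": 1}, {"c": 1}]})


# With PySpark (assumed, not run here) the function would be wrapped like:
#   from pyspark.sql.functions import udf
#   execute_method_udf = udf(execute_method)
#   df_result = df_source.withColumn("result", execute_method_udf())

value = execute_method()      # what one row of the column would contain
parsed = json.loads(value)    # plain json.loads is now enough
print(parsed["list"][1]["b"])
```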
