How to parse a string that looks like JSON with lots of embedded classes in Python?
I have a string that lists the properties of a request event.
My string looks like:
requestBody: {
    propertyA = 1
    propertyB = 2
    propertyC = {
        propertyC1 = 1
        propertyC2 = 2
    }
    propertyD = [
        { propertyD1 = { propertyD11 = 1}},
        { propertyD1 = [ {propertyD21 = 1, propertyD22 = 2},
                         {propertyD21 = 3, propertyD22 = 4}]}
    ]
}
I have tried replacing "=" with ":" so that I can feed the string to a JSON reader in Python, but JSON also requires keys to be double-quoted strings and a "," between key-value pairs. That became a little complicated to implement. What are some better approaches to parsing this into a Python dictionary with exactly the same structure (e.g. embedded dictionaries are also preserved)?
Question: If I were to write a full parser, what is the main pattern I should tackle? Pushing opening brackets onto a stack until each one is closed?
This is a nice use case for pyparsing, especially since the input is recursively structured.
The short answer is the following parser (it processes everything after the leading "requestBody :"):
import pprint
from pyparsing import (Suppress, LineEnd, Word, nums, alphas, alphanums,
                       quotedString, Forward, Group, Dict, Optional,
                       delimitedList)

# `sample` is assumed to hold the input text that follows "requestBody :"

LBRACE, RBRACE, LBRACK, RBRACK, EQ = map(Suppress, "{}[]=")
NL = LineEnd().setName("NL")

# define special delimiter for lists and objects, since they can be
# comma-separated or just newline-separated
list_delim = NL | ','
list_delim.leaveWhitespace()

# use a parse action to convert numeric values to ints or floats at parse time
def convert_number(t):
    try:
        return int(t[0])
    except ValueError:
        return float(t[0])

number = Word(nums, nums + '.').addParseAction(convert_number)
qs = quotedString

# forward-declare value, since it will be defined recursively
obj_value = Forward()
ident = Word(alphas, alphanums + '_')
obj_property = Group(ident + EQ + obj_value)

# use Dict wrapper to auto-define nested properties as key-values
obj = Group(LBRACE + Dict(Optional(delimitedList(obj_property, delim=list_delim))) + RBRACE)
obj_array = Group(LBRACK + Optional(delimitedList(obj, delim=list_delim)) + RBRACK)

# now assign to previously-declared obj_value, using '<<=' operator
obj_value <<= obj_array | obj | number | qs

# parse the data
res = obj.parseString(sample)[0]

# convert the result to a dict
pprint.pprint(res.asDict())
gives
{'propertyA': 1,
 'propertyB': 2,
 'propertyC': {'propertyC1': 1, 'propertyC2': 2},
 'propertyD': {'propertyD1': {'propertyD11': 1},
               'propertyD2': {'propertyD21': 3, 'propertyD22': 4}}}
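On the second part of the question (what pattern a hand-written parser would follow): one common approach is recursive descent, where each nested "{" or "[" is handled by a recursive call, so the call stack plays the role of the explicit bracket stack. The following is only a minimal standard-library sketch, separate from the pyparsing answer. The names tokenize, Parser, and parse_request are made up for illustration, and it assumes that commas and newlines are interchangeable separators and that values are numbers, double-quoted strings, objects, or arrays.

```python
import re

# Token kinds: punctuation, numbers, double-quoted strings, identifiers.
TOKEN_RE = re.compile(r"""
    (?P<punct>[{}\[\]=])
  | (?P<number>-?\d+(?:\.\d+)?)
  | (?P<string>"[^"]*")
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<skip>[\s,]+)        # assumption: whitespace and commas only separate items
""", re.VERBOSE)


def tokenize(text):
    """Return a list of (kind, text) tokens, dropping separators."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "skip":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


class Parser:
    """Hypothetical recursive-descent parser over the token list."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def expect(self, kind, text=None):
        tok_kind, tok_text = self.peek()
        if tok_kind != kind or (text is not None and tok_text != text):
            raise ValueError(f"expected {text or kind}, got {tok_text!r}")
        self.pos += 1
        return tok_text

    def parse_value(self):
        kind, text = self.peek()
        if (kind, text) == ("punct", "{"):
            return self.parse_object()
        if (kind, text) == ("punct", "["):
            return self.parse_array()
        if kind == "number":
            self.pos += 1
            return float(text) if "." in text else int(text)
        if kind == "string":
            self.pos += 1
            return text[1:-1]  # strip the surrounding quotes
        raise ValueError(f"unexpected token {text!r}")

    def parse_object(self):
        # object := "{" (ident "=" value)* "}"
        self.expect("punct", "{")
        result = {}
        while self.peek() != ("punct", "}"):
            key = self.expect("ident")
            self.expect("punct", "=")
            result[key] = self.parse_value()  # recursion handles nesting
        self.expect("punct", "}")
        return result

    def parse_array(self):
        # array := "[" value* "]"
        self.expect("punct", "[")
        items = []
        while self.peek() != ("punct", "]"):
            items.append(self.parse_value())
        self.expect("punct", "]")
        return items


def parse_request(text):
    """Strip an optional leading label such as 'requestBody:' and parse the rest."""
    body = re.sub(r"^\s*\w+\s*:", "", text, count=1)
    parser = Parser(tokenize(body))
    result = parser.parse_value()
    if parser.pos != len(parser.tokens):
        raise ValueError("trailing input after the top-level value")
    return result


SAMPLE = """
requestBody: {
    propertyA = 1
    propertyC = { propertyC1 = 1
                  propertyC2 = 2 }
    propertyD = [ { propertyD1 = { propertyD11 = 1}},
                  { propertyD1 = [ {propertyD21 = 1, propertyD22 = 2}] } ]
}
"""
result = parse_request(SAMPLE)
```

Unlike the Dict-based pyparsing result, this sketch keeps each "[...]" as a Python list, so repeated keys in different list items do not overwrite each other.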