
How to parse a string that looks like JSON with lots of embedded classes in Python?

I have a string that lists the properties of a request event.

My string looks like:

requestBody: {
    propertyA = 1
    propertyB = 2
    propertyC = {
        propertyC1 = 1
        propertyC2 = 2
    }
    propertyD = [
        { propertyD1 = { propertyD11 = 1}},
        { propertyD1 = [ {propertyD21 = 1, propertyD22 = 2}, 
                         {propertyD21 = 3, propertyD22 = 4}]}
    ]
}

I have tried replacing "=" with ":" so that I can feed the string to a JSON reader in Python, but JSON also requires keys to be double-quoted strings and a "," between key-value pairs. This became a little complicated to implement. What are some better approaches to parsing this into a Python dictionary with exactly the same structure (e.g. with the embedded dictionaries preserved)?

Question: If I were to write a full parser, what is the main pattern I should tackle? Pushing each opening bracket onto a stack until its matching closing bracket appears?

This is a nice case for using pyparsing, especially since the data is recursively structured.

The short answer is the following parser (it processes everything after the leading "requestBody:" label):

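from pyparsing import (Suppress, LineEnd, Word, nums, alphas, alphanums,
                       quotedString, Forward, Group, Dict, Optional,
                       delimitedList)

LBRACE,RBRACE,LBRACK,RBRACK,EQ = map(Suppress, "{}[]=")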
NL = LineEnd().setName("NL")

# define special delimiter for lists and objects, since they can be
# comma-separated or just newline-separated
list_delim = NL | ','
list_delim.leaveWhitespace()

# use a parse action to convert numeric values to ints or floats at parse time
def convert_number(t):
    try:
        return int(t[0])
    except ValueError:
        return float(t[0])
number = Word(nums, nums+'.').addParseAction(convert_number)

qs = quotedString

# forward-declare value, since it will be defined recursively
obj_value = Forward()

ident = Word(alphas, alphanums+'_')
obj_property = Group(ident + EQ + obj_value)

# use Dict wrapper to auto-define nested properties as key-values
obj = Group(LBRACE + Dict(Optional(delimitedList(obj_property, delim=list_delim))) + RBRACE)

obj_array = Group(LBRACK + Optional(delimitedList(obj, delim=list_delim)) + RBRACK)

# now assign to previously-declared obj_value, using '<<=' operator
obj_value <<= obj_array | obj | number | qs

# parse the data; `sample` is assumed to hold the input text that follows
# the leading "requestBody:" label
res = obj.parseString(sample)[0]

# convert the result to a dict
import pprint
pprint.pprint(res.asDict())

which gives:

{'propertyA': 1,
 'propertyB': 2,
 'propertyC': {'propertyC1': 1, 'propertyC2': 2},
 'propertyD': {'propertyD1': {'propertyD11': 1},
               'propertyD2': {'propertyD21': 3, 'propertyD22': 4}}}
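
On the "full parser" part of the question: a common pattern is recursive descent, where the Python call stack plays the role of the explicit bracket stack. The following is a minimal hand-written sketch using only the standard library; it is not the pyparsing approach above. The names tokenize, Parser and parse_request are made up for illustration, and the sketch assumes that commas are optional separators, that values are only numbers, quoted strings, objects or arrays, and that an optional leading "name:" label can be skipped.

```python
import re
from pprint import pprint

# One token per match: punctuation, number, quoted string, or identifier.
TOKEN_RE = re.compile(r"""\s*(?:
      (?P<punct>[{}\[\]=,:])
    | (?P<number>-?\d+(?:\.\d+)?)
    | (?P<string>"[^"]*"|'[^']*')
    | (?P<ident>[A-Za-z_]\w*)
)""", re.VERBOSE)


def tokenize(text):
    """Return a list of (kind, value) pairs; whitespace and newlines are skipped."""
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise ValueError(f"cannot tokenize input at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class Parser:
    """Hypothetical recursive-descent parser: one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def expect(self, value):
        kind, got = self.peek()
        if (kind, got) != ("punct", value):
            raise ValueError(f"expected {value!r}, got {got!r}")
        self.pos += 1

    def skip_commas(self):
        # Assumption: commas are optional, since "name = value" pairs are
        # unambiguous without them.
        while self.peek() == ("punct", ","):
            self.pos += 1

    def parse_value(self):
        kind, value = self.peek()
        if (kind, value) == ("punct", "{"):
            return self.parse_object()
        if (kind, value) == ("punct", "["):
            return self.parse_array()
        self.pos += 1
        if kind == "number":
            return float(value) if "." in value else int(value)
        if kind == "string":
            return value[1:-1]  # strip the surrounding quotes
        raise ValueError(f"unexpected token {value!r}")

    def parse_object(self):
        self.expect("{")
        result = {}
        self.skip_commas()
        while self.peek() != ("punct", "}"):
            kind, key = self.peek()
            if kind != "ident":
                raise ValueError(f"expected a property name, got {key!r}")
            self.pos += 1
            self.expect("=")
            result[key] = self.parse_value()  # recursion handles nesting
            self.skip_commas()
        self.expect("}")
        return result

    def parse_array(self):
        self.expect("[")
        items = []
        self.skip_commas()
        while self.peek() != ("punct", "]"):
            items.append(self.parse_value())
            self.skip_commas()
        self.expect("]")
        return items


def parse_request(text):
    """Parse the whole string, skipping an optional leading 'name:' label."""
    tokens = tokenize(text)
    parser = Parser(tokens)
    if len(tokens) >= 2 and tokens[0][0] == "ident" and tokens[1] == ("punct", ":"):
        parser.pos = 2
    result = parser.parse_value()
    if parser.pos != len(tokens):
        raise ValueError("trailing input after the top-level value")
    return result


SAMPLE = """requestBody: {
    propertyA = 1
    propertyB = 2
    propertyC = {
        propertyC1 = 1
        propertyC2 = 2
    }
    propertyD = [
        { propertyD1 = { propertyD11 = 1}},
        { propertyD1 = [ {propertyD21 = 1, propertyD22 = 2},
                         {propertyD21 = 3, propertyD22 = 4}]}
    ]
}"""

result = parse_request(SAMPLE)
pprint(result)
```

In this sketch, braces become Python dicts and square brackets become Python lists, so an array of objects stays a list of dicts. An unbalanced bracket surfaces as a ValueError when the token list runs out, so no separate bracket-matching pass is needed.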
