Parsing a string containing code into a list / tree in Python
As the title suggests, I'm trying to parse a piece of code into a tree or a list. First off, thank you for any contribution and time spent on this. So far my code does what I expect, but I am not sure it is the optimal or most generic way to do this.
My first attempt parsed the text character by character, but the code became messy and barely readable, so I assumed I was doing something wrong (I no longer have that code to share). I then looked at how other people approach this and found some solutions, but they did not meet my requirements of being simple and generic. I would link to those sites, but I didn't keep track of them.
db('1', '2', if(ATTRS('Dim 1', !Element Structure, 'ID') = '3','4','5'), 6)
For the input above, my output is only partially correct: I still cannot separate the "= '3'" part, which has to be split because here "=" is a comparison operator and not part of a string. The first list below is what I currently get and the second is what I expect:
[{'db': ["'1'", "'2'", {'if': [{'ATTRS': ["'Dim 1'", '!Element Structure', "'ID'"]}, "= '3'", "'4'", "'5'"]}, '6']}]
[{'db': ["'1'", "'2'", {'if': [{'ATTRS': ["'Dim 1'", '!Element Structure', "'ID'"]}, "=", "'3'", "'4'", "'5'"]}, '6']}]
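One possible way to split a leftover fragment such as "= '3'" is to tokenize it with a single regular expression. This is only a minimal sketch: the names OPERATORS, TOKEN_RE and split_operators are made up for illustration, and it assumes that strings are single-quoted without escaped quotes and that operators only matter outside of them. Unquoted text containing spaces would be split into several tokens, so it is not a drop-in replacement for findOperator.

```python
import re

# Longest operators first, so '<>' and '>=' are tried before '<', '>' and '='.
OPERATORS = ['@<>', '@=', '<>', '>=', '<=', '=', '>', '<']

# A token is a single-quoted string, an operator, or a run of other characters.
TOKEN_RE = re.compile(
    r"'[^']*'"
    + "|" + "|".join(re.escape(op) for op in OPERATORS)
    + r"|[^'\s=<>@]+"
)

def split_operators(fragment):
    """Split a fragment such as "= '3'" into operator and operand tokens."""
    return TOKEN_RE.findall(fragment)

print(split_operators("= '3'"))        # ['=', "'3'"]
print(split_operators("'a=b' <> 10"))  # ["'a=b'", '<>', '10']
```

Because the quoted-string alternative comes first in the pattern, an "=" inside quotes stays part of the string token.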
The parseRecursive method is the entry point.
import re


class FileParser:

    # order is important to avoid wrong splits, so this is a tuple and not a set
    COMPARATOR_SIGN = ('@=', '@<>', '<>', '>=', '<=', '=', '>', '<')

    def __init__(self):
        pass

    def __charExistsInOccurences(self, current_needle, needles, text):
        """
        check if other needles are present in text
        current_needle: string -> the current needle being evaluated
        needles: list -> list of needles
        text: string/list<string> -> a string or a list of strings to evaluate
        """
        # if text is a string convert it to a list of strings
        text = text if isinstance(text, list) else [text]
        exists = False
        for t in text:
            # check if a needle is inside the text value
            for needle in needles:
                # don't check the same key
                if needle != current_needle:
                    regex_search_needle = r'\s*' + r'\s*'.join(needle) + r'\s*'
                    # list of 1's and 0's: 1 if another character is found in the string
                    found = [1 if re.search(regex_search_needle, x) else 0 for x in t]
                    if sum(found) > 0:
                        exists = True
                        break
        return exists

    def findOperator(self, needles, haystack):
        """
        split parameters from operators
        needles: list -> list of operators
        haystack: string
        """
        string_open = haystack.find("'")
        # if no string has been found set the index to 0
        if string_open < 0:
            string_open = 0
        occurences = []
        string_closure = haystack.rfind("'")
        operator = ''
        for needle in needles:
            # regex to ignore the possible spaces between characters of the needle
            split_regex = r'\s*' + r'\s*'.join(needle) + r'\s*'
            # parse parameters before and after the string
            before_string = re.split(split_regex, haystack[0:string_open])
            after_string = re.split(split_regex, haystack[string_closure+1:])
            # check if any other needle exists in the results found
            before_string_exists = self.__charExistsInOccurences(needle, needles, before_string)
            after_string_exists = self.__charExistsInOccurences(needle, needles, after_string)
            # if the operator has been found, merge the results with the occurences and assign the operator
            if not before_string_exists and not after_string_exists:
                occurences.extend(before_string)
                occurences.extend([haystack[string_open:string_closure+1]])
                occurences.extend(after_string)
                operator = needle
        # filter the blank strings generated
        occurences = list(filter(lambda x: len(x.strip()) > 0, occurences))
        result_check = [1 if x == haystack else 0 for x in occurences]
        # if the haystack was originally a simple string like '1', the occurences list is
        # filled with the same value over and over by the before/after string parts
        if len(result_check) == sum(result_check):
            occurences = [haystack]
            operator = ''
        return operator, occurences

    def parseRecursive(self, text):
        """
        parse a block of text
        text: string
        """
        # the original assert(len(text) < 1, "text is empty") asserted a tuple and never fired;
        # blank text is valid here because the recursion passes '' for whatever follows the last ')'
        if not text.strip():
            return []
        function_open = text.find('(')
        accumulated_params = []
        if function_open > -1:
            # there is another function nested
            text_prev_function = text[0:function_open]
            # find the last space, comma or equal sign to retrieve the function name
            last_space = -1
            for j in range(len(text_prev_function)-1, 0, -1):
                if text_prev_function[j] == ' ' or text_prev_function[j] == ',' or text_prev_function[j] == '=':
                    last_space = j
                    break
            func_name = ''
            if last_space > -1:
                # there is something else before the function name
                func_name = text_prev_function[last_space+1:]
                # no parenthesis before, so the characters preceding the function name are parameters
                text_prev_func_params = list(filter(lambda x: len(x.strip()) > 0, text_prev_function[:last_space+1].split(',')))
                text_prev_func_params = [x.strip() for x in text_prev_func_params]
                for itext_prev in text_prev_func_params:
                    operator, text_prev_operator = self.findOperator(self.COMPARATOR_SIGN, itext_prev)
                    if operator == '':
                        accumulated_params.extend(text_prev_operator)
                    else:
                        text_prev_operator.append(operator)
                        accumulated_params.extend(text_prev_operator)
            else:
                # the function name is the start of the string
                func_name = text_prev_function[0:].strip()
            # find the closing parenthesis
            function_close = text.rfind(')')
            # parse the nested function and extend the current list of parameters
            next_func = text[function_open+1:function_close]
            func_params = {func_name: self.parseRecursive(next_func)}
            accumulated_params.append(func_params)
            # parameters after the function
            new_text = text[function_close+1:]
            accumulated_params.extend(self.parseRecursive(new_text))
        else:
            # there is no other function nested
            split_text = text.split(',')
            current_func_params = list(filter(lambda x: len(x.strip()) > 0, split_text))
            current_func_params = [x.strip() for x in current_func_params]
            accumulated_params.extend(current_func_params)
        return accumulated_params


text = "db('1', '2', if(ATTRS('Dim 1', !Element Structure, 'ID') = '3','4','5'), 6)"
obj = FileParser()
print(obj.parseRecursive(text))
You can use pyparsing to deal with such a case.
* pyparsing can be installed with pip install pyparsing
import pyparsing as pp
# A parsing pattern
w = pp.Regex(r'(?:![^(),]+)|[^(), ]+') ^ pp.Suppress(',')
pattern = w + pp.nested_expr('(', ')', content=w)
# A recursive function to transform a pyparsing result into your desirable format
def transform(elements):
    stack = []
    for e in elements:
        if isinstance(e, list):
            key = stack.pop()
            stack.append({key: transform(e)})
        else:
            stack.append(e)
    return stack
# A sample
string = "db('1', '2', if(ATTRS('Dim 1', !Element Structure, 'ID') = '3','4','5'), 6)"
# Operations to parse the sample string
elements = pattern.parse_string(string).as_list()
result = transform(elements)
# Assertion
assert result == [{'db': ["'1'", "'2'", {'if': [{'ATTRS': ["'Dim 1'", '!Element Structure', "'ID'"]}, '=', "'3'", "'4'", "'5'"]}, '6']}]
# Show the result
print(result)
[{'db': ["'1'", "'2'", {'if': [{'ATTRS': ["'Dim 1'", '!Element Structure', "'ID'"]}, '=', "'3'", "'4'", "'5'"]}, '6']}]
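To see what transform does on its own, here is a small sketch that runs the same stack logic on a hand-written nested list. The list is only an assumed stand-in for the kind of structure that as_list() returns (plain tokens, with each parenthesised group as a nested list); it was not produced by pyparsing.

```python
def transform(elements):
    """Fold each nested list into a dict keyed by the token just before it."""
    stack = []
    for e in elements:
        if isinstance(e, list):
            key = stack.pop()   # the name preceding the parenthesised group
            stack.append({key: transform(e)})
        else:
            stack.append(e)
    return stack

# Hand-written stand-in for a parsed, nested token list (assumed shape).
elements = ['f', ["'1'", 'g', ['x', '=', "'3'"], '6']]
print(transform(elements))
# [{'f': ["'1'", {'g': ['x', '=', "'3'"]}, '6']}]
```

Note that a nested list with no token before it, such as [['a']], makes stack.pop() raise an IndexError because the stack is still empty at that point.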
If there are unbalanced parentheses inside () (for example a(b(c), a(b)c), etc.), an unexpected result is obtained or an IndexError is raised. So be careful in such cases.
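One way to guard against this is to check the input before parsing it. The following is a minimal sketch, not part of pyparsing: parens_balanced is a hypothetical helper, and it assumes strings are single-quoted and contain no escaped quotes.

```python
def parens_balanced(text):
    """Return True if every '(' outside single quotes has a matching ')'."""
    depth = 0
    in_quote = False
    for ch in text:
        if ch == "'":
            in_quote = not in_quote
        elif not in_quote:
            if ch == '(':
                depth += 1
            elif ch == ')':
                depth -= 1
                if depth < 0:      # a ')' with no '(' before it
                    return False
    # also reject an unterminated quoted string
    return depth == 0 and not in_quote

print(parens_balanced("a(b(c))"))  # True
print(parens_balanced("a(b(c)"))   # False
print(parens_balanced("a(b)c)"))   # False
```

Calling such a check first lets you reject or report bad input instead of getting a confusing result from the parser.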