Python regular expression splitting some strings only in certain orders
I have the following tokenizeAndParse(s) function, which takes a string and attempts to tokenize it into a list of strings:
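import re

def tokenizeAndParse(s):
    # Split on every delimiter; the capturing group keeps the delimiters in the result.
    tokens = re.split(r"(\s+|assign|:=|print|\+|if|while|{|}|;|[|]|,|@|for|true|false|call|procedure|not|and|or|\(|\))", s)
    # Drop whitespace-only and empty pieces.
    tokens = [t for t in tokens if not t.isspace() and not t == ""]
    print("hello",tokens)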
Some examples of the function in use:
tokenizeAndParse("assign abc := [true, true, true];")
hello ['assign', 'abc', ':=', '[', 'true', ',', 'true', ',', 'true', ']', ';']
tokenizeAndParse("print 5+5;")
hello ['print', '5', '+', '5', ';']
I am running into an interesting problem: if I call the following, the 4 and the ] aren't parsed as separate tokens, and I have no idea why. As demonstrated above, if true comes before the ], the function works fine.
tokenizeAndParse("assign abc := [true, true, 4];")
hello ['assign', 'abc', ':=', '[', 'true', ',', 'true', ',', '4]', ';']
Further experimenting with the function shows that whenever a number comes before the ], it is not tokenized correctly. What is going on here?
The reason is that you are not splitting on numbers, so the 4 is never separated from the ] that follows it. Replace the following line:
tokens = re.split(r"(\s+|assign|:=|print|\+|if|while|{|}|;|[|]|,|@|for|true|false|call|procedure|not|and|or|\(|\))", s)
with the version shown below, which adds [0-9]+ to the alternation:
>>> def tokenizeAndParse(s):
...     tokens = re.split(r"(\s+|assign|:=|print|\+|if|while|{|}|;|[|]|,|@|for|true|false|call|procedure|not|and|or|\(|\)|[0-9]+)", s)
...     tokens = [t for t in tokens if not t.isspace() and not t == ""]
...     print("hello",tokens)
...
>>> tokenizeAndParse("assign abc := [true, true, 4];")
('hello', ['assign', 'abc', ':=', '[', 'true', ',', 'true', ',', '4', ']', ';'])
This fixes the example above, where a number comes directly before the ].
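One further point worth checking: in the pattern as posted, [|] is not two alternatives for [ and ]. It is a character class that matches a literal | character, so neither bracket is actually a delimiter. The brackets only show up as separate tokens when they happen to sit between two other delimiters (for example between true and ;). Adding [0-9]+ covers the numeric case, but an identifier before the bracket, as in [true, x], would still come out as 'x]'. A minimal sketch of an alternative, assuming the intent was to split on the brackets themselves, is to escape them. The names TOKEN_SPLIT and tokenize are made up for this illustration; the rest of the pattern is unchanged from the question.

```python
import re

# Hypothetical names for illustration. The pattern is the one from the
# question, with only the square brackets escaped (\[ and \]) so that
# they are treated as literal delimiters.
TOKEN_SPLIT = re.compile(
    r"(\s+|assign|:=|print|\+|if|while|{|}|;|\[|\]|,|@|for|true|false"
    r"|call|procedure|not|and|or|\(|\))"
)

def tokenize(s):
    """Split s on the delimiters above, dropping whitespace and empty pieces."""
    parts = TOKEN_SPLIT.split(s)
    return [t for t in parts if t and not t.isspace()]

# Unescaped, [|] is a character class that only matches the pipe character.
print(re.findall(r"[|]", "a|b]c["))

# With the brackets escaped, both a number and an identifier before ] split cleanly.
print(tokenize("assign abc := [true, true, 4];"))
print(tokenize("assign abc := [true, x];"))
```

With this change the [0-9]+ alternative is no longer needed for the bracket case, although keeping it does no harm if you also want digits separated from adjacent letters.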