Python regex only splits some strings in certain orders

Question

I have the following tokenizeAndParse(s) function, which takes a string and tries to tokenize it into a list of strings:

import re

def tokenizeAndParse(s):
    tokens = re.split(r"(\s+|assign|:=|print|\+|if|while|{|}|;|[|]|,|@|for|true|false|call|procedure|not|and|or|\(|\))", s)
    tokens = [t for t in tokens if not t.isspace() and not t == ""]
    print("hello",tokens)

Some examples of the function:

tokenizeAndParse("assign abc := [true, true, true];")
hello ['assign', 'abc', ':=', '[', 'true', ',', 'true', ',', 'true', ']', ';']

tokenizeAndParse("print 5+5;")
hello ['print', '5', '+', '5', ';']

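I've run into an interesting problem: if I make the following call, 4 and ] are not parsed as separate tokens, and I don't know why. As shown above, the function works fine when true comes right before the ].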

 tokenizeAndParse("assign abc := [true, true, 4];")
 hello ['assign', 'abc', ':=', '[', 'true', ',', 'true', ',', '4]', ';']

Playing with the function further shows that whenever a number comes right before the ], it is not parsed correctly. What is going on here?

Answer 1

The reason is that you are not splitting on numbers. Replace the following line of code:

tokens = re.split(r"(\s+|assign|:=|print|\+|if|while|{|}|;|[|]|,|@|for|true|false|call|procedure|not|and|or|\(|\))", s)

with the version shown below, which adds [0-9]+ to the alternation:

>>> def tokenizeAndParse(s):
...     tokens = re.split(r"(\s+|assign|:=|print|\+|if|while|{|}|;|[|]|,|@|for|true|false|call|procedure|not|and|or|\(|\)|[0-9]+)", s)
...     tokens = [t for t in tokens if not t.isspace() and not t == ""]
...     print("hello",tokens)

>>> tokenizeAndParse("assign abc := [true, true, 4];")
('hello', ['assign', 'abc', ':=', '[', 'true', ',', 'true', ',', '4', ']', ';'])

This fixes the problem for this input.
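One detail in the pattern is worth checking: the unescaped `[|]` is read by `re` as a character class containing a literal `|`, so `[` and `]` are never delimiters themselves. They only show up as separate tokens when everything around them (such as `true`, `,`, or a digit run) is split off. The sketch below compares the question's pattern with a variant that escapes the brackets and braces. The names `ORIGINAL`, `ESCAPED`, and `tokenize` are made up for illustration and are not from the original post.

```python
import re

# Pattern from the question: "[|]" is a character class matching "|",
# so "[" and "]" are not delimiters here.
ORIGINAL = r"(\s+|assign|:=|print|\+|if|while|{|}|;|[|]|,|@|for|true|false|call|procedure|not|and|or|\(|\))"

# Hypothetical variant with brackets and braces escaped so they are
# matched literally.
ESCAPED = r"(\s+|assign|:=|print|\+|if|while|\{|\}|;|\[|\]|,|@|for|true|false|call|procedure|not|and|or|\(|\))"

def tokenize(pattern, s):
    """Split s on pattern, keeping delimiters and dropping blanks."""
    parts = re.split(pattern, s)
    return [t for t in parts if t and not t.isspace()]

# With the original pattern, "4]" stays glued together.
print(tokenize(ORIGINAL, "assign abc := [true, true, 4];"))

# With escaped brackets, "4" and "]" come out as separate tokens,
# and identifiers next to brackets are separated as well.
print(tokenize(ESCAPED, "assign abc := [true, true, 4];"))
print(tokenize(ESCAPED, "assign abc := [x, y];"))
```

Under this reading, adding `[0-9]+` handles digits next to a bracket, while an identifier next to a bracket (as in `[x, y]`) would still stick to it unless the brackets are escaped.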

Solution 1
1 accepted 2014-10-26 07:54:53
