Parsing a line using re with special symbols in Python

I am trying to parse a file like this:

while (true){
    print("hello world")
}

While this is not Python syntax, I am using Python for the parsing. My code is:

import re

words = []
for line in lines:  # lines holds the lines of the file shown above
    words += re.sub(r"[\s]", " ", line).split()

My result is:

['while', '(true){', 'print("hello', 'world")', '}']

That is fine, since I only used re with a [\s] regex, but how would I get a result like this:

['while', '(', 'true', ')', '{'....]

That is, a result where every symbol becomes its own token (let's assume I have a string that contains the symbols one after the other, for example symbols = '(){}:,=+-')?

You can use re.split with a capturing group to get both the split text and the delimiters it was split on.

For instance, a run of non-word characters (symbols and whitespace) can be matched with the regex r'\W+'.

Here is an example:

import re

code = """\
while (true){
    print("hello world")
}
"""

for line in code.splitlines():
    print(re.split(r"(\W+)", line))

You'll get:

['while', ' (', 'true', '){', '']
['', '    ', 'print', '("', 'hello', ' ', 'world', '")', '']
['', '}', '']

With some filtering, you can drop the empty strings…
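As a minimal sketch of that filtering step, assuming you want whitespace-only pieces removed and the remaining pieces trimmed (the variable names here are illustrative only):

```python
import re

line = '    print("hello world")'

# Split on runs of non-word characters; the capturing group keeps the delimiters.
parts = re.split(r"(\W+)", line)

# Trim each piece and drop the ones that are empty or whitespace-only.
tokens = [part.strip() for part in parts if part.strip()]
print(tokens)
# ['print', '("', 'hello', 'world', '")']
```

Note that adjacent symbols such as `("` still come out as a single token, because `\W+` matches the whole run.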

Or, if you need each symbol as a separate one-character token, you can try:

for line in code.splitlines():
    tokens = [token for token in re.split(r"(\W)", line) if token.strip()]
    print(tokens)

You get:

['while', '(', 'true', ')', '{']
['print', '(', '"', 'hello', 'world', '"', ')']
['}']

Try this:

import re

re1 = r'(.?)([(){}:,=+-]{1})(.?)'

lines = '''
while (true){
    print("hello world")
}
'''

words = []
for line in lines.split('\n'):  # iterate over the lines of the text above
    cleanLine = re.sub(re1, r'\g<1> \g<2> \g<3>', line)
    words += re.sub(r"[\s]", " ", cleanLine).split()

print(words)
# ['while', '(', 'true', ')', '{', 'print', '(', '"hello', 'world"', ')', '}']
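If the symbols really are available as a single string, as the question assumes, another option is to build the character class from that string and collect tokens with re.findall. This is only a sketch: `tokenize` and `token_pattern` are hypothetical names, and any character that is neither a word character nor in `symbols` (the double quotes, for example) is silently skipped.

```python
import re

# Symbol set taken from the question; each character becomes its own token.
symbols = "(){}:,=+-"

# re.escape makes the characters safe to place inside a [...] character class.
# token_pattern and tokenize are illustrative names, not part of any library.
token_pattern = re.compile(rf"\w+|[{re.escape(symbols)}]")


def tokenize(text):
    """Return words and listed symbols in order; other characters are skipped."""
    return token_pattern.findall(text)


code = 'while (true){\n    print("hello world")\n}\n'
print(tokenize(code))
# ['while', '(', 'true', ')', '{', 'print', '(', 'hello', 'world', ')', '}']
```

To keep the quotes as tokens as well, one could add `"` to `symbols`.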
