
Regular expression to parse an option string in Python

I can't seem to write a regular expression that extracts the correct tokens from my string. Padding the beginning of the string with a space generates the correct output, but seems less than optimal:

>>> import re
>>> s = '-edge_0triggered a-b | -level_Sensitive c-d | a-b-c'
>>> re.findall(r'\W(-[\w_]+)',' '+s)
['-edge_0triggered', '-level_Sensitive'] # correct output

Here are some of the regular expressions I've tried. Does anyone have a regex suggestion that generates the correct output without changing the original string?

>>> re.findall(r'(-[\w_]+)',s)
['-edge_0triggered', '-b', '-level_Sensitive', '-d', '-b', '-c']
>>> re.findall(r'\W(-[\w_]+)',s)
['-level_Sensitive']

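\W has to consume a character, so it can never match before an option at the very start of the string. Change the first part of the pattern to accept either a start-of-string anchor or a non-word character, instead of only a non-word character: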

>>> re.findall(r'(?:^|\W)(-[\w_]+)', s)
['-edge_0triggered', '-level_Sensitive']

The ?: at the beginning of the group makes it non-capturing: the regex engine still uses it for matching, but re.findall does not include it in the results.
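To see what the ?: changes, the sketch below runs the same pattern with and without it on the string from the question. The variable names are only illustrative.

```python
import re

s = '-edge_0triggered a-b | -level_Sensitive c-d | a-b-c'

# With a capturing prefix group, findall returns one tuple per match,
# containing the prefix (empty at the start of the string) and the option.
with_capture = re.findall(r'(^|\W)(-\w+)', s)

# With a non-capturing prefix group, only the option group is returned.
without_capture = re.findall(r'(?:^|\W)(-\w+)', s)

print(with_capture)
print(without_capture)
```

With only one capturing group left, re.findall returns a flat list of strings rather than a list of tuples.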

r'(?:^|\W)(-\w+)'

\w already includes the underscore, so [\w_] can be shortened to \w.
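A quick check of that claim, as a small sketch using the string from the question:

```python
import re

s = '-edge_0triggered a-b | -level_Sensitive c-d | a-b-c'

# \w matches the underscore on its own, but not the hyphen.
underscore_is_word = re.fullmatch(r'\w', '_') is not None
hyphen_is_word = re.fullmatch(r'\w', '-') is not None

# So the character class [\w_] and plain \w give identical results here.
long_form = re.findall(r'(?:^|\W)(-[\w_]+)', s)
short_form = re.findall(r'(?:^|\W)(-\w+)', s)

print(underscore_is_word, hyphen_is_word, long_form == short_form)
```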

You could use a negative lookbehind:

re.findall(r'(?<!\w)(-\w+)', s)

The (?<!\w) part means "match only if not preceded by a word character". It also succeeds at the start of the string, where there is no preceding character at all.
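Because a lookbehind is zero-width, it checks the preceding character without making it part of the match. The sketch below compares the full match text (group 0) of the two approaches on the string from the question. Dropping the capturing parentheses in the lookbehind version is a simplification made here, not part of the answer above.

```python
import re

s = '-edge_0triggered a-b | -level_Sensitive c-d | a-b-c'

# The lookbehind consumes nothing, so no capturing group is needed.
options = re.findall(r'(?<!\w)-\w+', s)

# Full match text for each approach: the (?:^|\W) prefix consumes the
# preceding space, while the lookbehind leaves it out of the match.
prefix_full = [m.group(0) for m in re.finditer(r'(?:^|\W)(-\w+)', s)]
lookbehind_full = [m.group(0) for m in re.finditer(r'(?<!\w)-\w+', s)]

print(options)
print(prefix_full)
print(lookbehind_full)
```

This matters mostly when the match positions are used, for example with re.sub or m.span(), since the prefix version would include the leading separator.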

