I can't seem to create the correct regular expression to extract the correct tokens from my string. Padding the beginning of the string with a space generates the correct output, but seems less than optimal:
>>> import re
>>> s = '-edge_0triggered a-b | -level_Sensitive c-d | a-b-c'
>>> re.findall(r'\W(-[\w_]+)',' '+s)
['-edge_0triggered', '-level_Sensitive'] # correct output
Here are some of the regular expressions I've tried, does anyone have a regex suggestion that doesn't involve changing the original string and generates the correct output
>>> re.findall(r'(-[\w_]+)',s)
['-edge_0triggered', '-b', '-level_Sensitive', '-d', '-b', '-c']
>>> re.findall(r'\W(-[\w_]+)',s)
['-level_Sensitive']
Change the first qualifier to accept either a beginning anchor or a not-word, instead of only a not-word:
>>> re.findall(r'(?:^|\W)(-[\w_]+)', s)
['-edge_0triggered', '-level_Sensitive']
The ?:
at the beginning of the group simply tells the regex engine to not treat that as a group for purposes of results.
r'(?:^|\W)(-\w+)'
\\w
已经包含下划线。
You could use a negative-lookbehind:
re.findall(r'(?<!\w)(-\w+)', s)
the (?<!\\w)
part means "match only if not preceded by a word-character".
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.