So, I've been cooking up some regex, and it seems the regex library is capturing an extra newline when I use ((.|\s)*) to capture multi-line text, whereas [\S\s]* works for some reason.
As you can see below, the first regex produces an additional '\n' group. Why?
>>> import re
>>> s = """
... #pragma whatever
... #pr
... asdfsadf
... #pragma START-SomeThing-USERCODE
... this is the code
... this is more
... #pragma END-SomeThing-USERCODE
... asd
... asdf
... sadf
... sdaf
... """
>>> r = r"(#pragma START-(.*)-USERCODE\s*\n)((.|\s)*)(#pragma END-(.*)-USERCODE)"
>>> re.findall(r, s)
[('#pragma START-SomeThing-USERCODE\n', 'SomeThing', 'this is the code\nthis is more\n', '\n', '#pragma END-SomeThing-USERCODE', 'SomeThing')]
>>> r = r"(#pragma START-(.*)-USERCODE\s*\n)([\S\s]*)(#pragma END-(.*)-USERCODE)"
>>> re.findall(r, s)
[('#pragma START-SomeThing-USERCODE\n', 'SomeThing', 'this is the code\nthis is more\n', '#pragma END-SomeThing-USERCODE', 'SomeThing')]
The subexpression ((.|\s)*) matches "this is the code\nthis is more\n". The outer parentheses capture this entire string.
The inner parentheses capture one character at a time (either any character other than a newline, or any whitespace character, including a newline). Since that group is repeated, its contents are overwritten with each repetition. At the end of the match, the last character that was matched (\n) is what remains in that group.
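A minimal sketch of the "last repetition wins" behaviour described above, using Python's standard re module. The input strings are made up for illustration and are not from the question.

```python
import re

# Hypothetical short input spanning two lines.
text = "ab\ncd\n"

m = re.match(r"((.|\s)*)", text)
outer = m.group(1)  # everything consumed by the whole repetition
inner = m.group(2)  # only what the LAST repetition of (.|\s) matched

print(repr(outer))  # 'ab\ncd\n'
print(repr(inner))  # '\n'

# Same effect with plain letters: the repeated group keeps only the final 'y'.
print(re.fullmatch(r"(x|y)*", "xyxy").group(1))  # y
```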
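So, if you want to avoid that, either make the inner group non-capturing:
((?:.|\s)*)
or use the ([\s\S]*) idiom for matching truly any character. It might be a good idea to use the lazy version ([\s\S]*?), though, so that the smallest possible number of characters is matched: with the greedy version, a string containing several START/END blocks would be matched as one big block running from the first START to the last END.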
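The two fixes, and the greedy versus lazy difference, can be sketched roughly as follows. The input with two blocks named A and B is a hypothetical example, not the asker's data; only the patterns come from this thread.

```python
import re

# Hypothetical input with TWO user-code blocks (the question's input has only one).
s = (
    "#pragma START-A-USERCODE\n"
    "first\n"
    "#pragma END-A-USERCODE\n"
    "#pragma START-B-USERCODE\n"
    "second\n"
    "#pragma END-B-USERCODE\n"
)

patterns = {
    # Fix 1 (greedy, as written above): (?:...) is non-capturing, so the
    # stray single-character group disappears and each match has 5 groups.
    "noncap": r"(#pragma START-(.*)-USERCODE\s*\n)((?:.|\s)*)(#pragma END-(.*)-USERCODE)",
    # Fix 2: [\s\S] matches any character, newline included; * is greedy.
    "greedy": r"(#pragma START-(.*)-USERCODE\s*\n)([\s\S]*)(#pragma END-(.*)-USERCODE)",
    # Same class with the lazy *? quantifier: stops at the nearest END marker.
    "lazy": r"(#pragma START-(.*)-USERCODE\s*\n)([\s\S]*?)(#pragma END-(.*)-USERCODE)",
}

results = {name: re.findall(p, s) for name, p in patterns.items()}

for name, matches in results.items():
    # Index 2 of each tuple is the user-code body group.
    print(name, len(matches), [m[2] for m in matches])
```

Both greedy variants swallow everything between the first START and the last END in a single match, while the lazy one returns each block separately.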
This expression produces a nested group,
((.|\s)*)
because you use nested parentheses. To choose between single characters, square brackets (a character class such as [\s\S]) are the proper choice; alternation with | inside parentheses is suitable when you want to choose between two words, e.g.
(treat|trick)
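A small illustration of that difference, assuming Python's re module, where a compiled pattern's groups attribute reports how many capturing groups it has. The sample strings are made up.

```python
import re

# Alternation in parentheses creates a capturing group; a character class does not.
alternation = re.compile(r"(a|b)*")
char_class = re.compile(r"[ab]*")
print(alternation.groups, char_class.groups)  # 1 0

# Both match the same single-character text...
print(alternation.fullmatch("abba").group(0), char_class.fullmatch("abba").group(0))

# ...but alternation is what you need to choose between whole words.
words = re.findall(r"(treat|trick)", "trick or treat")
print(words)  # ['trick', 'treat']
```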