
matching whitespace seems to be non-greedy

I'm writing a small helper script to analyse C code, in particular how structs are used. I'm having trouble detecting when a struct is used by value as opposed to through a pointer. In other words, I want to detect whether the text struct foo is followed by an arbitrary amount of whitespace and then a character that is not *.

I boiled my problem down to this MWE:

>>> import re
>>> there = re.compile('struct foo(\\s*)[^*]')
>>> match = there.search('struct foo *bar')

Note: I need the double backslashes because I cannot use raw strings in my application; I actually need an f-string.

In my book, the MWE should not produce a match. However, it does, and if I look at match.groups(), I get

>>> match.groups()
('',)

meaning that \\s* matched zero whitespace characters. From the documentation I would have expected it to match the single space before *bar in my string, since the * quantifier matches zero or more characters greedily.

Replacing \\s with [ \\t], or even with a literal space (so the group becomes ( *), note the space), does not make a difference either.

Why does \\s* seem to match zero characters when a space is present?

I think you just want to make sure that the final character class doesn't match whitespace either. So you want:

struct foo(\\s*)[^*\\s]
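A quick way to check this suggestion is to run it against both the pointer and the value case. The sketch below builds the pattern with an f-string, as the question requires; value_use_pattern is a hypothetical helper name, and the re.escape call on the struct name is an addition of mine, not part of the answer.

```python
import re

def value_use_pattern(struct_name: str) -> re.Pattern:
    # Hypothetical helper: builds the answer's pattern with an f-string,
    # doubling the backslashes instead of using a raw string.
    # re.escape guards against regex metacharacters in the struct name.
    return re.compile(f'struct {re.escape(struct_name)}(\\s*)[^*\\s]')

pattern = value_use_pattern('foo')

# Pointer use: \s* may still back off, but [^*\s] now refuses the space too.
print(pattern.search('struct foo *bar'))      # None

# Value use: the space goes to \s*, and [^*\s] matches 'b'.
m = pattern.search('struct foo bar')
print(repr(m.group(0)), repr(m.group(1)))     # 'struct foo b' ' '
```

One caveat worth checking: because \\s* may match nothing, struct foobar x also matches; putting \\b right after the name would rule that out. Also, if raw strings are excluded only because an f-string is needed, Python does accept the combined rf'...' prefix.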

I would use this regular expression:

(?:struct foo\s*)([^*\s]+)

This captures whatever comes after the spaces, provided it does not start with an asterisk.

Example: struct foo *bar returns nothing.
struct foo bar returns bar.

Test and explanation here: https://regex101.com/r/dVeHc3/1
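For use from Python without a raw string, the backslashes in this pattern have to be doubled. The sketch below assumes that form and shows the two examples above, plus a findall over a few made-up declarations; note that findall returns only the capturing group.

```python
import re

# The answer's pattern written without a raw string (backslashes doubled).
# (?:...) is non-capturing, so group 1 is the token after the spaces.
after_value = re.compile('(?:struct foo\\s*)([^*\\s]+)')

print(after_value.search('struct foo *bar'))            # None
print(after_value.search('struct foo bar').group(1))    # bar

# Scanning several declarations at once: findall returns group 1 only.
# ';' is neither '*' nor whitespace, so it is captured as well.
code = 'struct foo a;\nstruct foo *b;\nstruct foo c;\n'
print(after_value.findall(code))                        # ['a;', 'c;']
```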

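(\\s*) is correctly matching zero spaces. The [^*] cannot match the * in the text, so the regex engine backtracks: \\s* gives up the space it had greedily taken, and [^*] matches that space instead. Greedy only means the quantifier tries the longest match first; it still gives characters back when that is the only way for the overall pattern to succeed.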
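The backtracking becomes visible if you look at the whole match rather than only match.groups(). This reruns the question's MWE unchanged and prints the full matched text and its span:

```python
import re

there = re.compile('struct foo(\\s*)[^*]')
match = there.search('struct foo *bar')

# The whole match ends with the space: \s* first took the space, [^*] then
# failed on '*', so the engine backtracked and \s* gave the space back.
print(repr(match.group(0)))   # 'struct foo '
print(match.span())           # (0, 11)
print(repr(match.group(1)))   # ''
```

If you want the "never give back" behaviour the question expected, Python 3.11 and later support possessive quantifiers and atomic groups in re, so struct foo(\\s*+)[^*] (or struct foo((?>\\s*))[^*]) should not match this input; on older versions the [^*\\s] approach avoids the issue.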
