I'm stumped. I'm coding in Python 3.6.2, using PyCharm as my IDE. The following script fragment illustrates my problem:
import re

def dosubst(m):
    # Append "X" to whatever the pattern matched
    return m.group() + "X"

line = r"set @message = formatmessage('%s %s', @arg1, @arg2);"
m = re.findall(r"@\w+\b", line, re.IGNORECASE)
print(m[0]) # prints "@message"
print(m[1]) # prints "@arg1"
print(m[2]) # prints "@arg2"
foo = re.sub(r"@\w+\b", dosubst, line, re.IGNORECASE)
print(foo) # prints "set @messageX = formatmessage('%s %s', @arg1X, @arg2);"
You can see that re.findall finds three matches. However, re.sub only calls the dosubst function twice. If I change @message to message, then re.sub still calls dosubst twice, but this time picks up @arg1 and @arg2. Baffled. I thought it might be greedy vs. possessive matching, etc., but the behavior after changing @message to message rules that out. Can anyone explain? I'm trying to do some basic text parsing of SQL to refactor message formatting for a large number of files. I use regexr.com to prototype most of the regex stuff I do, and it also finds three occurrences of the pattern in the line. Thanks.
See the documentation. The fourth argument to re.sub is count, not flags. Since re.IGNORECASE happens to be 2, you are telling it to only do two substitutions. Instead, pass flags by keyword:
>>> re.sub(r"@\w+\b", dosubst, line, flags=re.IGNORECASE)
"set @messageX = formatmessage('%s %s', @arg1X, @arg2X);"
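To see the mix-up directly, here is a minimal sketch that reuses the question's dosubst and line. The variable names (flag_value, as_count, as_flags) are made up for illustration. It passes the flag's integer value as count explicitly, which is what the positional call in the question amounts to:

```python
import re

def dosubst(m):
    # Append "X" to whatever the pattern matched
    return m.group() + "X"

line = r"set @message = formatmessage('%s %s', @arg1, @arg2);"
pattern = r"@\w+\b"

# re.IGNORECASE is an integer-valued flag; its value is 2
flag_value = int(re.IGNORECASE)

# Equivalent to passing re.IGNORECASE in the count slot: only two replacements
as_count = re.sub(pattern, dosubst, line, count=flag_value)

# Passing the flag by keyword leaves count at its default of 0 (replace all)
as_flags = re.sub(pattern, dosubst, line, flags=re.IGNORECASE)

print(flag_value)
print(as_count)
print(as_flags)
```

The second print shows @arg2 left untouched, matching the output in the question, while the third shows all three variables suffixed.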
Alternatively, pass 0 as the fourth argument (count=0), which replaces every match, and pass the flags as the fifth argument. If you pass a positive number instead of 0, re.sub makes at most that many replacements.
foo = re.sub(r"@\w+\b", dosubst, line, 0, re.IGNORECASE)
print(foo)
output:
"set @messageX = formatmessage('%s %s', @arg1X, @arg2X);"
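As a rough illustration of how count behaves, the sketch below compares count=1 with count=0 on the question's input, and uses re.subn to report how many replacements were actually made. The names first_only, all_matches, new_line and n are hypothetical, chosen just for this example:

```python
import re

def dosubst(m):
    # Append "X" to whatever the pattern matched
    return m.group() + "X"

line = r"set @message = formatmessage('%s %s', @arg1, @arg2);"
pattern = r"@\w+\b"

# A positive count caps the number of replacements
first_only = re.sub(pattern, dosubst, line, count=1)

# count=0 (the default) means "replace every match"
all_matches = re.sub(pattern, dosubst, line, count=0)

# re.subn returns the new string together with the number of replacements
new_line, n = re.subn(pattern, dosubst, line, flags=re.IGNORECASE)

print(first_only)
print(all_matches)
print(n)
```

Checking the count returned by re.subn is a cheap way to catch this kind of silent truncation when refactoring many files.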