Python re.sub behaves differently than re.findall

I'm stumped. I'm coding Python 3.6.2, using PyCharm as my IDE. The following script fragment illustrates my problem:

import re

def dosubst(m):
    # Append "X" to each matched @identifier
    return m.group() + "X"

line = r"set @message = formatmessage('%s %s', @arg1, @arg2);"
m = re.findall(r"@\w+\b", line, re.IGNORECASE)
print(m[0])  # prints "@message"
print(m[1])  # prints "@arg1"
print(m[2])  # prints "@arg2"

foo = re.sub(r"@\w+\b", dosubst, line, re.IGNORECASE)
print(foo)  # prints "set @messageX = formatmessage('%s %s', @arg1X, @arg2);"

You can see that re.findall finds three matches. However, re.sub only calls the dosubst function twice. If I change @message to message, re.sub still calls dosubst twice, but now picks up @arg1 and @arg2. Baffled. I thought it might be greedy vs. possessive matching, etc., but changing @message to message and the resulting behavior rules that out. Can anyone explain? I'm trying to do some basic text parsing of SQL to refactor message formatting across a large number of files. I use regexr.com to prototype most of my regexes, and it also finds three occurrences of the pattern in the line. Thanks.

See the documentation. The fourth positional argument to re.sub is count, not flags. Since re.IGNORECASE happens to have the integer value 2, you are telling it to perform at most two substitutions. Instead, pass the flags by keyword:

>>> re.sub(r"@\w+\b", dosubst, line, flags=re.IGNORECASE)
"set @messageX = formatmessage('%s %s', @arg1X, @arg2X);"
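A minimal sketch that makes the cause visible, using re.subn, which returns the new string together with the number of substitutions it made. The names pattern, limited, full, var_re and compiled are illustrative only. Compiling the pattern with re.compile is an extra suggestion, not part of the answer above: Pattern.sub takes no flags parameter, so the flag cannot slip into the count slot.

```python
import re

def dosubst(m):
    # Append "X" to each matched @identifier
    return m.group() + "X"

line = r"set @message = formatmessage('%s %s', @arg1, @arg2);"
pattern = r"@\w+\b"

# re.IGNORECASE is a flag whose integer value is 2.
print(int(re.IGNORECASE))  # 2

# The question's call re.sub(pattern, dosubst, line, re.IGNORECASE)
# binds the flag to count, so it behaves like count=2.
limited, n_limited = re.subn(pattern, dosubst, line, count=int(re.IGNORECASE))

# Passing the flag by keyword leaves count at its default of 0 (no limit).
full, n_full = re.subn(pattern, dosubst, line, flags=re.IGNORECASE)

# Hypothetical alternative: fix the flags at compile time.
var_re = re.compile(pattern, re.IGNORECASE)
compiled = var_re.sub(dosubst, line)

print(n_limited, limited)  # only two substitutions
print(n_full, full)        # all three substitutions
print(compiled == full)    # True
```

Passing flags by keyword (or compiling once) keeps the intent explicit, which matters when the same pattern is applied across many files.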

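Alternatively, keep the positional form but supply count=0 explicitly as the fourth argument, so that re.IGNORECASE moves into the fifth position, flags. A count of 0 means "replace every match"; any positive number instead limits the substitutions to at most that many.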

foo = re.sub(r"@\w+\b", dosubst, line, 0, re.IGNORECASE)
print(foo)

output:

set @messageX = formatmessage('%s %s', @arg1X, @arg2X);
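A short sketch of the count semantics described above, run against the same line. It uses keyword arguments because Python 3.13 and later deprecate passing count and flags to re.sub positionally. The helper name add_x is hypothetical and mirrors dosubst from the question.

```python
import re

line = r"set @message = formatmessage('%s %s', @arg1, @arg2);"
pattern = r"@\w+\b"

def add_x(m):
    # Append "X" to each matched @identifier (same idea as dosubst)
    return m.group() + "X"

# count=0 (the default) replaces every match; a positive count replaces
# at most that many matches, working left to right.
results = {n: re.sub(pattern, add_x, line, count=n, flags=re.IGNORECASE)
           for n in (0, 1, 2, 5)}

for n, text in results.items():
    print(n, text)
```

Note that count=5 behaves like count=0 here, because the line only contains three matches.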
