Difference between re.sub and re.findall

I have strings that look like "Billboard Bill SpA". I want a regular expression that removes SpA, but only if a capitalised word precedes it. The regular expression I use is "[A-Z][a-z]*\s(SpA)". With re.sub, both SpA and the capitalised word before it are removed, which is expected.

re.sub(r"[A-Z][a-z]*\s(SpA)", "", "Billboard Bill SpA")
'Billboard '

However, if I use re.findall I get the functionality I need:

re.findall(r"[A-Z][a-z]*\s(SpA)", "Billboard Bill SpA")
['SpA']

I know I can write a lookbehind assertion with "(?<=...)", which doesn't consume the preceding text, but that works only for fixed-length expressions. Does anybody know how I can remove only "SpA" with re.sub, or make it behave like re.findall?
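For reference, here is a minimal sketch of that limitation, assuming Python's built-in re module: a lookbehind whose body can match text of varying length is rejected when the pattern is compiled. The variable names are only illustrative.

```python
import re

# The lookbehind body "[A-Z][a-z]*\s" can match strings of different
# lengths, so the built-in re module refuses to compile the pattern.
try:
    re.compile(r"(?<=[A-Z][a-z]*\s)SpA")
    compiled = True
except re.error as exc:
    compiled = False
    print("re.error:", exc)
```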

To be clearer, I want a regular expression that removes SpA, but only if there is a capitalised word before it:

re.sub(regular_expression, "", "Billboard Bill SpA") -> Billboard Bill
re.sub(regular_expression, "", "to SpA") -> to SpA

Your re.sub is replacing the entire match, not just the group (SpA). That's why it's also removing Bill. findall, on the other hand, returns only the captured group.
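A small sketch of that difference, using the pattern from the question (the variable names are only for illustration): group(0) is the whole match, which is what re.sub replaces, while group(1) is the captured group, which is what re.findall returns when the pattern contains exactly one group.

```python
import re

pattern = r"[A-Z][a-z]*\s(SpA)"
text = "Billboard Bill SpA"

m = re.search(pattern, text)
whole_match = m.group(0)  # entire match: this is what re.sub replaces
group_one = m.group(1)    # captured group: this is what re.findall returns
found = re.findall(pattern, text)

print(repr(whole_match))  # 'Bill SpA'
print(repr(group_one))    # 'SpA'
print(found)              # ['SpA']
```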

In re.sub you can keep the part of the match that you don't want to delete: capture it in a group and refer to that group in the replacement string.

re.sub(r"([A-Z][a-z]*\s)SpA", r"\1", "Billboard Bill SpA")
'Billboard Bill '

If you want to delete the space as well, move \s outside of the parentheses.
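As a rough sketch of that variant, checked against both examples from the question (strip_spa and PATTERN are hypothetical names, not part of the original answer):

```python
import re

# \s sits outside the capturing group, so the space is removed with SpA.
PATTERN = re.compile(r"([A-Z][a-z]*)\sSpA")

def strip_spa(text):
    """Remove ' SpA' only when a capitalised word comes right before it."""
    return PATTERN.sub(r"\1", text)

print(repr(strip_spa("Billboard Bill SpA")))  # 'Billboard Bill'
print(repr(strip_spa("to SpA")))              # 'to SpA'
```

Note that the pattern does not anchor the capital letter to the start of a word; if that matters for your data, adding \b before [A-Z] is one option.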

Perform the substitution using groups.

>>> re.sub(r"([A-Z][a-z]*\s)(SpA)", r"\1", "Billboard Bill SpA")
'Billboard Bill '
