
Regular expression to find string with iterating letters on the end

Can someone help me with this kind of regular expression matching?

For example, I'm searching through a list of strings, each ending in a letter that increments:

  • MonsterA
  • MonsterB
  • MonsterC
  • HeroA
  • HeroB
  • HeroC
  • ...

What I need the script to return is only the preceding part of each string, in this example Monster and Hero.

If you absolutely need a regex:

re.match(r"(.*)[A-Z]", word).group(1)

But a regex is not the most efficient approach if you just want to remove the last character.
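Here is a rough sketch of both options applied to the list from the question. The `names` list and the `strip_suffix` helper are made up for illustration, and the added `$` anchor is my assumption that the letter must really be the last character:

```python
import re

# Sample data from the question (hypothetical variable name).
names = ["MonsterA", "MonsterB", "MonsterC", "HeroA", "HeroB", "HeroC"]

def strip_suffix(word):
    """Return word without its trailing uppercase letter, or word unchanged."""
    # The $ anchor requires the uppercase letter to be the final character.
    m = re.match(r"(.*)[A-Z]$", word)
    return m.group(1) if m else word

# Regex version, de-duplicated while keeping first-seen order.
prefixes = list(dict.fromkeys(strip_suffix(n) for n in names))
print(prefixes)  # ['Monster', 'Hero']

# Slicing version: drops the last character unconditionally, no regex needed.
sliced = list(dict.fromkeys(n[:-1] for n in names))
print(sliced)  # ['Monster', 'Hero']
```

Slicing is simpler and faster, but it removes the last character whatever it is, so it only fits if every entry is known to carry the suffix letter.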

You could use a positive lookahead assertion (?=...) to check that the word ends in a single uppercase character, and then use word boundaries \b...\b to ensure it does not match patterns that aren't whole words:

>>> import re
>>> text = "This re will match MonsterA and HeroB but not heroC or MonsterCC"
>>> re.findall(r"\b[A-Z][a-z]+(?=[A-Z]\b)", text)
['Monster', 'Hero']

re.findall returns all such matches in a list.
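Since the original list repeats each prefix, you would probably want to de-duplicate the result. A small sketch, assuming the names are joined into one string; the sample `text` and the variable names are just illustrative. It also shows an equivalent capturing-group pattern, because re.findall returns only the group's contents when the pattern contains exactly one group:

```python
import re

# Lookahead version: the suffix letter is checked but not consumed.
LOOKAHEAD = re.compile(r"\b[A-Z][a-z]+(?=[A-Z]\b)")
# Capturing-group version: findall returns only what the group matched.
GROUPED = re.compile(r"\b([A-Z][a-z]+)[A-Z]\b")

# Illustrative input: three valid names plus two that should not match.
text = "MonsterA MonsterB HeroA heroC MonsterCC"

from_lookahead = LOOKAHEAD.findall(text)
from_group = GROUPED.findall(text)
print(from_lookahead)  # ['Monster', 'Monster', 'Hero']
print(from_group)      # ['Monster', 'Monster', 'Hero']

# Collapse duplicates; sorted() only makes the order deterministic.
unique = sorted(set(from_lookahead))
print(unique)  # ['Hero', 'Monster']
```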
