Regex: match uppercase words with a condition

I'm new to regex and I can't figure out how to do this:

Hello this is JURASSIC WORLD shut up Ok

[REVIEW] The movie BATMAN is awesome lol

What I need is the title of the movie; there will be only one per sentence. I have to ignore words between [], since they are never the title of the movie.

I thought of this:

^\w([A-Z]{2,})+

Any help would be welcome.

Thanks.
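For reference, a quick check of why the attempt `^\w([A-Z]{2,})+` finds nothing. `^` anchors the match to the start of the string, so `\w` must consume the first character and the next two or more characters must be uppercase. Neither sample line satisfies that. The third input below is a made-up string (not from the question) showing that the pattern only works when the title opens the string.

```python
import re

attempt = re.compile(r'^\w([A-Z]{2,})+')

# 'H' is consumed by \w, but 'e' is not uppercase, so the group cannot match.
print(attempt.search("Hello this is JURASSIC WORLD shut up Ok"))   # None
# '[' is not a word character, so \w fails at the very first position.
print(attempt.search("[REVIEW] The movie BATMAN is awesome lol"))  # None

# Hypothetical input where the title comes first: \w takes 'B', the group takes 'ATMAN'.
m = attempt.search("BATMAN is awesome lol")
print(m.group(0), m.group(1))  # BATMAN ATMAN
```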

You can use negative lookarounds to ensure that the title is not within [].

\b(?<!\[)[A-Z ]{2,}(?!\])\b
  • \b Matches a word boundary (used at both ends of the pattern).

  • (?<!\[) Negative lookbehind. Checks that the match is not preceded by [

  • [A-Z ]{2,} Matches 2 or more characters, each an uppercase letter or a space.

  • (?!\]) Negative lookahead. Ensures that the match is not followed by ]
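The same pattern can be written with `re.VERBOSE` so that each piece carries its own comment. This is only a restatement of the regex above, not a different approach. In verbose mode whitespace is ignored everywhere except inside a character class, so the space in `[A-Z ]` still counts.

```python
import re

TITLE_RE = re.compile(r"""
    \b          # word boundary
    (?<!\[)     # not immediately preceded by '['
    [A-Z ]{2,}  # two or more uppercase letters or spaces
    (?!\])      # not immediately followed by ']'
    \b          # word boundary
""", re.VERBOSE)

text = """Hello this is JURASSIC WORLD shut up Ok
[REVIEW] The movie BATMAN is awesome lol"""

print(TITLE_RE.findall(text))  # [' JURASSIC WORLD ', ' BATMAN ']
```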

Example

>>> import re
>>> string = """Hello this is JURASSIC WORLD shut up Ok
... [REVIEW] The movie BATMAN is awesome lol"""
>>> re.findall(r'\b(?<!\[)[A-Z ]{2,}(?!\])\b', string)
[' JURASSIC WORLD ', ' BATMAN ']
>>>
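As the output shows, each match keeps the spaces that `[A-Z ]` consumed on either side. Since the question says there is only one title per sentence, one option is to take the first match per line and strip it. A minimal sketch; `movie_title` is a hypothetical helper name, and it returns `None` when a line contains no match.

```python
import re

TITLE_RE = re.compile(r'\b(?<!\[)[A-Z ]{2,}(?!\])\b')

def movie_title(sentence):
    """Return the first uppercase run outside [...] with surrounding spaces removed, or None."""
    match = TITLE_RE.search(sentence)
    return match.group().strip() if match else None

lines = [
    "Hello this is JURASSIC WORLD shut up Ok",
    "[REVIEW] The movie BATMAN is awesome lol",
]
print([movie_title(line) for line in lines])  # ['JURASSIC WORLD', 'BATMAN']
```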

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM