I'm new to regex and I can't figure out how to do this:
Hello this is JURASSIC WORLD shut up Ok
[REVIEW] The movie BATMAN is awesome lol
What I need is the title of the movie. There will be only one per sentence. I have to ignore the words between [], as they will never be the title of the movie.
I thought of this:
^\w([A-Z]{2,})+
Any help would be welcome.
Thanks.
You can use negative lookarounds to ensure that the title is not enclosed in []:
\b(?<!\[)[A-Z ]{2,}(?!\])\b
\b
Matches a word boundary.
(?<!\[)
Negative lookbehind. Checks that the matched string is not preceded by [
[A-Z ]{2,}
Matches 2 or more characters, each an uppercase letter or a space.
(?!\])
Negative lookahead. Ensures that the matched string is not followed by ]
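To see what the lookarounds contribute, here is a small sketch that writes the same pattern with re.VERBOSE, so each part carries its own comment, and compares it with a version that has no lookarounds. The names TITLE and NO_LOOKAROUNDS are only illustrative.

```python
import re

# Same pattern as above, spread out with re.VERBOSE.
# Whitespace inside a character class is kept in verbose mode,
# so the space in [A-Z ] still matches a literal space.
TITLE = re.compile(r"""
    \b            # word boundary
    (?<!\[)       # negative lookbehind: no opening bracket just before
    [A-Z ]{2,}    # two or more uppercase letters or spaces
    (?!\])        # negative lookahead: no closing bracket just after
    \b            # word boundary
""", re.VERBOSE)

# Illustrative variant with both lookarounds removed.
NO_LOOKAROUNDS = re.compile(r'\b[A-Z ]{2,}\b')

line = "[REVIEW] The movie BATMAN is awesome lol"

print(NO_LOOKAROUNDS.findall(line))  # ['REVIEW', ' BATMAN ']
print(TITLE.findall(line))           # [' BATMAN ']
```

Without the lookarounds the bracketed tag REVIEW is picked up as well; with them only the title remains.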
Example
>>> import re
>>> string = """Hello this is JURASSIC WORLD shut up Ok
... [REVIEW] The movie BATMAN is awesome lol"""
>>> re.findall(r'\b(?<!\[)[A-Z ]{2,}(?!\])\b', string)
[' JURASSIC WORLD ', ' BATMAN ']
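Note that the matches keep the spaces around each title, because the character class [A-Z ] also accepts a space. Since there is only one title per sentence, a minimal sketch of a helper that returns the first match with those spaces stripped could look like this. The names TITLE_RE and extract_title are hypothetical, not part of the re module.

```python
import re

# The pattern from the answer, compiled once for reuse.
TITLE_RE = re.compile(r'\b(?<!\[)[A-Z ]{2,}(?!\])\b')

def extract_title(sentence):
    """Return the first title found in sentence, or None if there is none."""
    match = TITLE_RE.search(sentence)
    # strip() removes the spaces that [A-Z ] captured around the title.
    return match.group().strip() if match else None

print(extract_title("Hello this is JURASSIC WORLD shut up Ok"))   # JURASSIC WORLD
print(extract_title("[REVIEW] The movie BATMAN is awesome lol"))  # BATMAN
```

This assumes titles are written entirely in uppercase words of the kind shown in the question; other inputs may need a stricter pattern.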