I would like to match all strings in which every word is capitalized.
At the moment I have tried something like this:
import re

list = ["This sentence should Not Match", "This Should Only Match"]
match = []
for l in list:
    x = re.search("^[A-Z]*.", l)
    if x:
        match.append(l)
For example, I would like the regex to match only strings like "This Is A Good Example Here", but not "Something like this Here", "HERE Is an example that Should NOT Match", "TiHiS SeNtEnEcE" or "This Should NOT Match.Foo".
I am looping over lots of news articles and trying to match all the titles. These titles usually have every word capitalized.
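To see why the attempt above matches too much: `[A-Z]*` may match zero characters and `.` then matches any single character, so `^[A-Z]*.` accepts effectively any non-empty string. A small sketch using the example strings from the question (the variable names are just for illustration):

```python
import re

# Examples from the question: the first should match, the rest should not.
should_match = ["This Is A Good Example Here"]
should_not_match = [
    "Something like this Here",
    "HERE Is an example that Should NOT Match",
    "TiHiS SeNtEnEcE",
    "This Should NOT Match.Foo",
]

# The attempted pattern: [A-Z]* can match nothing, and "." then matches
# any one character, so every non-empty line here is accepted.
attempted = re.compile(r"^[A-Z]*.")

accepted = [s for s in should_match + should_not_match if attempted.search(s)]
print(accepted)  # all five strings are accepted
```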
You can do this without a regex:
l = ["This sentence should Not Match", "This Should Only Match"]
print([s for s in l if s.istitle()])
Output:
['This Should Only Match']
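As a quick sanity check, here is a sketch that runs `str.istitle()` over all of the example strings from the question (the dictionary of expected results is only for illustration):

```python
# Example strings from the question, mapped to whether they should match.
examples = {
    "This Is A Good Example Here": True,
    "Something like this Here": False,
    "HERE Is an example that Should NOT Match": False,
    "TiHiS SeNtEnEcE": False,
    "This Should NOT Match.Foo": False,
}

# str.istitle() is True only if uppercase characters follow uncased ones
# (spaces, punctuation) and lowercase characters follow cased ones.
results = {text: text.istitle() for text in examples}
for text, is_title in results.items():
    print(f"{text!r}: {is_title}")

# Caveat for real headlines: an apostrophe is uncased, so the "t" after it
# is treated as a lowercase letter starting a new word.
print("Don't Panic".istitle())  # False
```

Digits and hyphens are uncased as well, so a headline such as "Covid-19 Cases Rise" still counts as titlecase.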
Try matching with re.search and the following pattern:
^[A-Z][a-z]*(?: [A-Z][a-z]*)*$
Script:
import re

list = ["This sentence should Not Match", "This Should Only Match"]
matches = []
for l in list:
    x = re.search(r"^[A-Z][a-z]*(?: [A-Z][a-z]*)*$", l)
    if x:
        matches.append(l)
print(matches)
This prints:
['This Should Only Match']
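A variant sketch of the same pattern using `re.fullmatch`, which requires the whole string to match, so the `^` and `$` anchors can be dropped (unlike `$`, it also rejects a trailing newline). The name `TITLE_RE` is illustrative, and the strings are the examples from the question:

```python
import re

# Same pattern as above, without anchors: fullmatch enforces them.
# Compiling once is a common idiom when checking many article titles.
TITLE_RE = re.compile(r"[A-Z][a-z]*(?: [A-Z][a-z]*)*")

titles = [
    "This Is A Good Example Here",
    "Something like this Here",
    "HERE Is an example that Should NOT Match",
    "TiHiS SeNtEnEcE",
    "This Should NOT Match.Foo",
]

matches = [t for t in titles if TITLE_RE.fullmatch(t)]
print(matches)  # ['This Is A Good Example Here']
```

Note that this pattern allows exactly one space between words and only ASCII letters, so titles containing digits or punctuation are rejected.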
I'd go with Chris' solution first, but here's a possible regex approach:
import re
sentences = ["This sentence should Not Match", "This Should Only Match"]
result = [x for x in sentences if re.match(r"^([A-Z][a-z]*\b\s*)+$", x)]
print(result)  # prints ['This Should Only Match']
The regex matches only strings made of one or more words, each consisting of a single capital letter followed by zero or more lowercase letters, then a word boundary and optional whitespace.
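The same pattern can be spelled out with `re.VERBOSE`, where whitespace and `#` comments inside the pattern are ignored, so each part can be annotated. A sketch (the name `WORDS_RE` and the third sample string are illustrative):

```python
import re

# The pattern from above, one component per line.
WORDS_RE = re.compile(
    r"""
    ^
    (               # one repetition per word:
      [A-Z]         #   a single capital letter
      [a-z]*        #   zero or more lowercase letters
      \b            #   a word boundary, so "TiHiS" cannot split into "Ti" + "HiS"
      \s*           #   optional whitespace before the next word
    )+
    $
    """,
    re.VERBOSE,
)

samples = ["This Is A Good Example Here", "TiHiS SeNtEnEcE", "Two  Spaces Here "]
verdicts = {text: bool(WORDS_RE.match(text)) for text in samples}
print(verdicts)
```

Because of `\s*`, this version also accepts runs of spaces and trailing whitespace, which the single-space pattern in the other answer rejects.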
Note: avoid shadowing the built-in list(), and it's a good habit to always write regex patterns as raw strings.
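A short sketch of both pitfalls: without the `r` prefix, `"\b"` is the Python escape for a backspace character rather than a regex word boundary, and once `list` is rebound to a value the built-in can no longer be called. The sample strings are illustrative:

```python
import builtins
import re

# Non-raw string: "\b" becomes a backspace (\x08), so nothing matches.
plain = re.search("\bcat\b", "a cat here")
# Raw string: "\b" reaches the regex engine as a word boundary.
raw = re.search(r"\bcat\b", "a cat here")
print(plain, raw.group())  # None cat

# Rebinding the name "list" hides the built-in type.
list = ["This Should Only Match"]
try:
    list("abc")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # 'list' object is not callable

list = builtins.list  # restore the built-in under its usual name
print(list("abc"))  # ['a', 'b', 'c']
```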