
Python Regex match only where every word is capitalized

I would like to match all strings in which every word is capitalized.

At the moment I have tried something like this:

import re

list = ["This sentence should Not Match", "This Should Only Match"]
match = []
for l in list:
    x = re.search("^[A-Z]*.", l)
    if x:
        match.append(l)

For example, I would like the regex to match only something like "This Is A Good Example Here". It should not match "Something like this Here", "HERE Is an example that Should NOT Match", "TiHiS SeNtEnEcE" or "This Should NOT Match.Foo".

I am looping over lots of news articles and trying to match all the titles. These titles usually have every word capitalized.

You can do this without regex, using str.istitle():

sentences = ["This sentence should Not Match", "This Should Only Match"]
print([s for s in sentences if s.istitle()])

Output:

['This Should Only Match']
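To see how str.istitle() treats the strings from the question, here is a small sketch. The last two samples are not from the question; they are assumptions added to show that istitle() starts a new "word" after any uncased character, so a period or an apostrophe can change the result.

```python
# Examples taken from the question, plus two extra strings (assumptions)
# that show where str.istitle() may differ from what you expect.
samples = [
    "This Is A Good Example Here",
    "Something like this Here",
    "HERE Is an example that Should NOT Match",
    "TiHiS SeNtEnEcE",
    "This Should NOT Match.Foo",
    "This Should Match.Foo",   # extra: '.' starts a new "word" for istitle()
    "It's A Good Example",     # extra: the apostrophe also starts a new "word"
]

# Map each sample to the result of str.istitle().
istitle_results = {s: s.istitle() for s in samples}

for s, ok in istitle_results.items():
    print(f"{ok!s:5}  {s}")
```

Only the first sample and "This Should Match.Foo" come out as True. If titles with punctuation or apostrophes matter for your articles, a regex may give you more control.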

Try matching using re.search with the following pattern:

^[A-Z][a-z]*(?: [A-Z][a-z]*)*$

Script:

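import re

sentences = ["This sentence should Not Match", "This Should Only Match"]
matches = []
for sentence in sentences:
    # Each word is one capital letter followed by zero or more lowercase
    # letters; words are separated by exactly one space.
    if re.search(r"^[A-Z][a-z]*(?: [A-Z][a-z]*)*$", sentence):
        matches.append(sentence)

print(matches)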

This prints:

['This Should Only Match']
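One possible variation, sketched here against the examples from the question: compile the pattern once and use re.fullmatch instead of the ^ and $ anchors. The helper name is_title_case is hypothetical. A side effect worth knowing is that $ also matches just before a trailing newline, while fullmatch does not accept one.

```python
import re

# Same pattern as above without the ^...$ anchors; fullmatch() requires the
# whole string to match, so the anchors are not needed.
TITLE_RE = re.compile(r"[A-Z][a-z]*(?: [A-Z][a-z]*)*")


def is_title_case(text):
    """Hypothetical helper: True if every word is Capitalized like this."""
    return TITLE_RE.fullmatch(text) is not None


examples = [
    "This Is A Good Example Here",
    "Something like this Here",
    "HERE Is an example that Should NOT Match",
    "TiHiS SeNtEnEcE",
    "This Should NOT Match.Foo",
]

filtered = [s for s in examples if is_title_case(s)]
print(filtered)

# '$' matches before a trailing newline, fullmatch() does not.
with_newline = "This Should Only Match\n"
anchored_hit = re.search(r"^[A-Z][a-z]*(?: [A-Z][a-z]*)*$", with_newline) is not None
fullmatch_hit = is_title_case(with_newline)
print(anchored_hit, fullmatch_hit)
```

If your titles are read line by line from a file, stripping the newline first makes both variants behave the same.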

I'd go with Chris' non-regex solution first, but here's a possible regex approach:

import re

sentences = ["This sentence should Not Match", "This Should Only Match"]
result = [x for x in sentences if re.match(r"^([A-Z][a-z]*\b\s*)+$", x)]
print(result)  # => ['This Should Only Match']

The regex matches only strings made up of one or more words, where each word is a single capital letter followed by zero or more lowercase letters, then a word boundary and optional whitespace.
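Because this pattern uses \s* rather than a literal space, it is more lenient about whitespace than the single-space pattern from the other answer. A rough comparison, where the test strings and variable names are made up for illustration:

```python
import re

# Pattern from the other answer: words separated by exactly one space.
single_space = re.compile(r"^[A-Z][a-z]*(?: [A-Z][a-z]*)*$")
# Pattern from this answer: any amount of whitespace after each word.
any_whitespace = re.compile(r"^([A-Z][a-z]*\b\s*)+$")

cases = [
    "This Should Only Match",    # single spaces
    "This  Should Only Match",   # double space
    "This Should Only Match ",   # trailing space
    "This\tShould Only Match",   # tab instead of space
]

# For each string: (matches single_space, matches any_whitespace)
comparison = {
    s: (bool(single_space.search(s)), bool(any_whitespace.match(s)))
    for s in cases
}

for s, (strict, lenient) in comparison.items():
    print(f"{s!r:30} strict={strict} lenient={lenient}")
```

Both patterns accept the first string; only the \s* version accepts the other three. Which behavior you want depends on how clean the scraped titles are.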

Note: avoid shadowing the builtin list(), and it's a good habit to always write regex patterns as raw strings.
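A short illustration of why raw strings matter for a pattern like the one above: in a normal string literal, "\b" is the backspace character, not the regex word boundary. The variable names here are just for the example.

```python
import re

text = "This Should Only Match"

non_raw = "\bMatch"   # "\b" is a single backspace character (\x08) here
raw = r"\bMatch"      # backslash + "b": the regex word-boundary assertion

# The two literals differ in length because of the escape handling.
lengths = (len("\b"), len(r"\b"))
print(lengths)

non_raw_hit = re.search(non_raw, text)  # looks for a literal backspace
raw_hit = re.search(raw, text)          # looks for "Match" at a word boundary
print(non_raw_hit)
print(raw_hit)
```

The non-raw pattern silently finds nothing, because the text contains no backspace character, while the raw pattern finds "Match".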
