
Python Three Letter Acronyms

I am trying to check if a certain string contains an acronym using regex.

My current regex:

re.search(r'\b[A-Z]{3}', string)

Currently it returns a match for USA, NYCs, and NSFW, but it should not match NSFW, because that is a four-letter acronym, not a three-letter one.

How can I adjust the regex so that it rejects NSFW but still accepts NYCs?

EDIT: it should also accept NYC,

Use a negative lookahead assertion, (?!pattern):

re.search(r'\b[A-Z]{3}(?![A-Z])', string)

This requires that the three capital letters not be followed by another capital letter, without imposing any other restriction, such as requiring that something follow the pattern at all. Think "not followed by P" versus "followed by not-P".
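The difference between the two readings can be checked directly. This is only a minimal sketch: the names lookahead and consuming and the sample strings are illustrative and not taken from the answer itself.

```python
import re

# "Not followed by P": the lookahead consumes nothing,
# so it also succeeds when the acronym ends the string.
lookahead = re.compile(r'\b[A-Z]{3}(?![A-Z])')

# "Followed by not P": the negated class must consume one real character,
# so it fails when nothing comes after the three capitals.
consuming = re.compile(r'\b[A-Z]{3}[^A-Z]')

for s in ['USA', 'NYCs', 'NYC,', 'NSFW']:
    print(s, bool(lookahead.search(s)), bool(consuming.search(s)))
```

Only the bare 'USA' separates the two patterns: the lookahead version accepts it, while the consuming version needs a trailing character to match.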

Try:

import re
list(filter(re.compile(r'\b[A-Z]{3}(?![A-Z])').search, ['.ANS', 'ANSs', 'AANS', 'ANS.']))

>>> import re
>>> rexp = r'(?:\b)([A-Z]{3})(?:$|[^A-Z])'
>>> re.search(rexp, 'USA').groups()
('USA',)
>>> re.search(rexp, 'NSFW') is None
True
>>> re.search(rexp, 'aUSA') is None
True
>>> re.search(rexp, 'NSF,').groups()
('NSF',)
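Because the pattern has exactly one capture group, it can probably also be used with re.findall to pull every three-letter acronym out of a longer string. A sketch, assuming a made-up sample sentence (the names text and acronyms are illustrative):

```python
import re

rexp = r'(?:\b)([A-Z]{3})(?:$|[^A-Z])'

# Sample text invented for illustration only.
text = 'The USA, NYCs and NSFW tags'

# With a single capture group, findall returns that group's text for each match.
acronyms = re.findall(rexp, text)
print(acronyms)
```

Note that the trailing non-capital character is consumed by each match, even though it is not part of the captured group.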

You can use ? to make the preceding element optional; {0,1} is equivalent.

You can put whichever characters you want to allow inside the square brackets [ ]. The class then matches any one of them zero or one times, so NYC., WINs, or FOO, will match.

Add $ at the end to specify that no further characters are allowed after the match.

re.search(r'\b[A-Z]{3}[s,.]?$', string)
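To see what the $ anchor implies in practice, here is a small check of that pattern. The sample strings, and the names pattern, samples, and results, are assumptions chosen for illustration.

```python
import re

pattern = re.compile(r'\b[A-Z]{3}[s,.]?$')

# The last sample has a valid acronym that is not at the end of the string.
samples = ['NYC.', 'WINs', 'FOO,', 'NSFW', 'USA today']

# Map each sample to whether the pattern finds a match in it.
results = {s: bool(pattern.search(s)) for s in samples}
print(results)
```

Because of the anchor, this variant only recognizes an acronym that ends the string, so it suits checking a single token rather than scanning a sentence.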
