
Regular expression pattern match excluding a list of cases

I want to match street names, which consist of several capitalized words, while excluding certain words, but I do not know how to do it.

The pattern is "([A-Z][a-z]+ {1,3})" (let's assume a street name consists of 1-3 words), and a shortened block list is ["Apt","West","East"], whose entries denote either a direction or a room number.

Any word that is in the list ("West", for example) should not be in the match result. Words that merely start with a word from the block list ("Westmoreland", for example) should, however, be in the result. How should I write this regular expression?

You may use

\b(?!(?:Apt|West|East)\b)[A-Z][a-z]+(?: (?!(?:Apt|West|East)\b)[A-Z][a-z]+){0,2}

See the regex demo

What I did:

  • Fixed your regex to actually match 1 to 3 words: [A-Z][a-z]+(?: [A-Z][a-z]+){0,2} (in the original, {1,3} quantified only the space, not the whole word)
  • Added negative lookaheads to restrict the values matched by the [A-Z][a-z]+ parts.

Expression details:

  • \b(?!(?:Apt|West|East)\b)[A-Z][a-z]+ - a capital ASCII letter ([A-Z]) followed by 1+ lowercase ASCII letters ([a-z]+, though I guess you can also use [a-zA-Z]+ or [a-zA-Z]* here) that do not form the whole word Apt, West or East. This is made possible by the negative lookahead anchored at the \b word boundary: the first \b is a leading word boundary, and the negative lookahead then makes sure that Apt, West or East does not appear right after that boundary and immediately before a trailing \b word boundary (which ensures a whole-word check).
  • (?: (?!(?:Apt|West|East)\b)[A-Z][a-z]+){0,2} - 0 to 2 occurrences of:
    • a single space character
    • (?!(?:Apt|West|East)\b)[A-Z][a-z]+ - see above. You do not need a leading word boundary here, as Apt, West or East can only appear after a space here, which is a non-word char.
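As a rough illustration, the expression described above could be assembled from the block list in Python as follows. The names BLOCKED, STREET_RE and find_street_names, as well as the sample strings, are made up for this sketch and are not part of the original question.

```python
import re

# Hypothetical block list taken from the question; extend as needed.
BLOCKED = ["Apt", "West", "East"]

# Alternation of the blocked words, escaped in case an entry has special chars.
blocked = "|".join(map(re.escape, BLOCKED))

# One capitalized word that is not exactly one of the blocked words.
word = rf"(?!(?:{blocked})\b)[A-Z][a-z]+"

# A leading word boundary, one word, then 0 to 2 more space-separated words.
STREET_RE = re.compile(rf"\b{word}(?: {word}){{0,2}}")


def find_street_names(text):
    """Return every 1-3 word run of capitalized, non-blocked words."""
    return [m.group(0) for m in STREET_RE.finditer(text)]


print(find_street_names("123 West Westmoreland Ave Apt 4"))  # ['Westmoreland Ave']
print(find_street_names("East Main Street"))                 # ['Main Street']
```

"West" is skipped because it is a whole blocked word, while "Westmoreland" is kept because the \b inside the lookahead fails between "West" and "moreland".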

A lot of people would post a shorter solution like

(?: ?\b(?!(?:Apt|West|East)\b)[A-Z][a-z]+){1,3}

See the demo

However, because the space at the start of the group is optional, the match itself can begin with a leading space. Moreover, the regex no longer matches linearly, and that affects performance. With small strings that is OK, but it is still bad practice.
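The leading-space issue can be seen with a small Python comparison. This is only a sketch: the names LINEAR and SHORT and the sample string are chosen here for illustration, and it does not attempt to measure the performance difference.

```python
import re

# The first expression from the answer: the space sits inside the repeated tail.
LINEAR = re.compile(
    r"\b(?!(?:Apt|West|East)\b)[A-Z][a-z]+"
    r"(?: (?!(?:Apt|West|East)\b)[A-Z][a-z]+){0,2}"
)

# The shorter variant: an optional space at the start of every repetition.
SHORT = re.compile(r"(?: ?\b(?!(?:Apt|West|East)\b)[A-Z][a-z]+){1,3}")

text = "123 Main Street"

linear_match = LINEAR.search(text).group(0)
short_match = SHORT.search(text).group(0)

print(repr(linear_match))  # 'Main Street'
print(repr(short_match))   # ' Main Street' (note the leading space)
```

With the shorter pattern, the result would need an extra strip step before use.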
