
Regex: select until first space or comma occurrence

I have the following examples of American addresses.

6301 Stonewood Dr Apt-728, Plano TX-75024 
13323 Maham Road, Apt # 1621, Dallas, TX 75240
17040 Carlson Drive, #1027 Parker, CO 80134
3465 25th St., San Francisco, CA 94110 

I want to extract the city from each address using a regex:

Plano, Dallas, Parker, San Francisco

I am using the following regex, which works for the first example:

(?<=[,.|•]).*\s+(?=[\s,.]?CA?[\s,.-]?[\d]{4,})

Can you help me make it work for the other examples as well?

You can use

,(?:\s*#\d+)?\s*([^\s,][^,]*)(?=\W+[A-Z]{2}\W*\d{4,}\s*$)

See the regex demo. The necessary value is in Group 1.

Details:

  • , - a comma
  • (?:\s*#\d+)? - an optional sequence of zero or more whitespaces, # and then one or more digits
  • \s* - zero or more whitespaces
  • ([^\s,][^,]*) - Group 1: a char other than whitespace and comma and then zero or more non-comma chars
  • (?=\W+[A-Z]{2}\W*\d{4,}\s*$) - a positive lookahead that requires (immediately on the right)
    • \W+ - one or more non-word chars
    • [A-Z]{2} - two uppercase ASCII letters
    • \W* - zero or more non-word chars
    • \d{4,} - four or more digits
    • \s* - zero or more whitespaces
    • $ - end of string.
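
As a rough check of the breakdown above, here is a minimal Python sketch that runs this pattern over the four sample addresses from the question. Python and the names `CITY_RE`, `addresses` and `cities` are assumptions made for illustration; the question does not say which regex engine is in use.

```python
import re

# Pattern copied from the answer above; the city is captured in group 1.
CITY_RE = re.compile(r",(?:\s*#\d+)?\s*([^\s,][^,]*)(?=\W+[A-Z]{2}\W*\d{4,}\s*$)")

# Sample addresses from the question (two of them end with a trailing space,
# which the \s*$ part of the lookahead tolerates).
addresses = [
    "6301 Stonewood Dr Apt-728, Plano TX-75024 ",
    "13323 Maham Road, Apt # 1621, Dallas, TX 75240",
    "17040 Carlson Drive, #1027 Parker, CO 80134",
    "3465 25th St., San Francisco, CA 94110 ",
]

cities = []
for address in addresses:
    match = CITY_RE.search(address)
    # Fall back to None when an address does not fit the expected shape.
    cities.append(match.group(1) if match else None)

print(cities)  # ['Plano', 'Dallas', 'Parker', 'San Francisco']
```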

You can match the comma, then any chars except A-Z, and capture starting from the first occurrence of A-Z.

,[^A-Z,]*?\b([A-Z][^,]*?),?\s*[A-Z]{2}[-\s]\d{4,}\s*$

Explanation

  • ,[^A-Z,]*?\b Match a comma, then as few chars as possible other than A-Z or a comma, up to a word boundary
  • ([A-Z][^,]*?) Capture group 1: match A-Z and then any chars except a comma, as few as possible
  • ,?\s*[A-Z]{2} Match an optional comma, optional whitespace chars and 2 uppercase chars A-Z
  • [-\s]\d{4,}\s* Match either - or a whitespace char and then 4 or more digits followed by optional whitespace chars
  • $ end of string
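
The explanation above can be sketched as a small helper. This is only an illustration in Python; `extract_city` and `PATTERN` are hypothetical names, not part of the answer.

```python
import re
from typing import Optional

# Pattern copied from the answer above; group 1 holds the city.
PATTERN = re.compile(r",[^A-Z,]*?\b([A-Z][^,]*?),?\s*[A-Z]{2}[-\s]\d{4,}\s*$")


def extract_city(address: str) -> Optional[str]:
    """Return the captured city, or None if the address does not match."""
    match = PATTERN.search(address)
    return match.group(1) if match else None


print(extract_city("17040 Carlson Drive, #1027 Parker, CO 80134"))  # Parker
print(extract_city("3465 25th St., San Francisco, CA 94110 "))      # San Francisco
```

The lazy `[^A-Z,]*?` is what skips the `#1027` unit number before the capture starts at the first uppercase letter.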

Regex demo

The best I could come up with works from the end of the string: it looks for a run of non-whitespace characters (the portion you are looking for), followed by an optional comma, a space, a run of capital letters, an optional space or dash, and finally a run of digits.

([^\s]+?)\,?\s[A-Z]+[\s\-]?\d+$

The first group is the target you are aiming for.

This is a live example with your use case embedded:

https://regexr.com/6nkq5

(As a side note, the demo on regexr may tell you the expression took more than 250ms and can't render. If so, slightly edit the test case to make it update and show the actual result.)

Another approach (assuming the structure of the ending is more or less fixed):

.+\s(\w+?),?.{4}\d{4,}

As long as the city always comes right before the (exactly) two uppercase state letters, you can use that simple condition to match it.

(?<= )[A-Za-z ]+(?=,? [A-Z]{2})

Your match [A-Za-z ]+ will be found between

  • (?<= ) : a space and
  • (?=,? [A-Z]{2}) : an optional comma + a space + two uppercase letters
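
A minimal sketch of this lookaround approach in Python, assuming the `re` module; `find_city`, `samples` and `results` are illustrative names. Since the pattern has no capture group, the city is the whole match.

```python
import re

# Lookbehind for a space, lookahead for an optional comma, a space
# and two uppercase letters; the match itself is the city.
CITY_RE = re.compile(r"(?<= )[A-Za-z ]+(?=,? [A-Z]{2})")


def find_city(address):
    """Return the first match of the lookaround pattern, or None."""
    match = CITY_RE.search(address)
    return match.group(0) if match else None


samples = [
    "6301 Stonewood Dr Apt-728, Plano TX-75024 ",
    "13323 Maham Road, Apt # 1621, Dallas, TX 75240",
    "17040 Carlson Drive, #1027 Parker, CO 80134",
    "3465 25th St., San Francisco, CA 94110 ",
]

results = [find_city(a) for a in samples]
print(results)  # ['Plano', 'Dallas', 'Parker', 'San Francisco']
```

Because the pattern is not anchored to the end of the string, an earlier two-letter uppercase token can end the match too soon. For a made-up address such as `100 Main St NW, Dallas, TX 75240`, this sketch returns `Main St`.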

Check the demo here.
