Regex - find string by excluding part of it

I have text: "Johnny Alan Walker Sint Jansstraat 7, 1012 HG Amsterdam +123456789012"

Is it possible to find the last name and phone number while excluding the address? The address regex is: "([A-Z]{1,}[a-z]{1,}\s){2}[0-9]{0,4}\,\s{1,}[0-9]{4}\s[A-Z]{2}\s{1,}[a-zA-Z]{1,}" (two capitalized words, house number, comma, postal code, and city).

I want the resulting string to be "Walker +123456789012".

You could use:

\w+\s+\w+\s+(\w+).*(\+\d+)

The two capture groups then line up with what you're trying to match: group 1 is the last name and group 2 is the phone number.

Essentially, this disregards the first and second words (first and middle name), captures the third word, and then skips everything in between until it reaches the last + that is followed by digits, capturing the + and those digits.

Live example: https://regex101.com/r/MjJCSv/1
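As a rough sketch of how the pattern above might be used from Java (the class name `SurnamePhoneDemo` and the helper `extract` are made up for illustration, and the sketch assumes the input always starts with exactly two words before the surname):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SurnamePhoneDemo {
    // Skip two words, capture the third, then capture '+' and the digits after it.
    private static final Pattern P =
            Pattern.compile("\\w+\\s+\\w+\\s+(\\w+).*(\\+\\d+)");

    // Hypothetical helper: returns "surname phone", or null when nothing matches.
    static String extract(String text) {
        Matcher m = P.matcher(text);
        if (!m.find()) {
            return null;
        }
        return m.group(1) + " " + m.group(2);
    }

    public static void main(String[] args) {
        String text = "Johnny Alan Walker Sint Jansstraat 7, 1012 HG Amsterdam +123456789012";
        System.out.println(extract(text)); // prints: Walker +123456789012
    }
}
```

Note that this relies on word position only; an entry without a middle name would put the first word of the street into group 1.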

In theory, if the last name and the address are always separated by more than one space, you can shorten this a little and write it as

(\w+)\s{2,}.*(\+\d+)

Live example of this functionality: https://regex101.com/r/vGGB4z/1

Example implementation of the latter in Java: http://ideone.com/RExAEO
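The shorter pattern only helps when the wider gap is really present in the data. A small sketch, assuming a hypothetical `WideGapDemo` class and a sample line in which two spaces separate the surname from the street, shows both outcomes:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WideGapDemo {
    // A word followed by at least two whitespace characters, then '+' and digits later on.
    private static final Pattern P = Pattern.compile("(\\w+)\\s{2,}.*(\\+\\d+)");

    // Hypothetical helper: returns "surname phone", or null when nothing matches.
    static String extract(String text) {
        Matcher m = P.matcher(text);
        return m.find() ? m.group(1) + " " + m.group(2) : null;
    }

    public static void main(String[] args) {
        // Assumed input: two spaces between "Walker" and "Sint".
        String wide = "Johnny Alan Walker  Sint Jansstraat 7, 1012 HG Amsterdam +123456789012";
        // The text exactly as quoted in the question: single spaces throughout.
        String single = "Johnny Alan Walker Sint Jansstraat 7, 1012 HG Amsterdam +123456789012";

        System.out.println(extract(wide));   // prints: Walker +123456789012
        System.out.println(extract(single)); // prints: null
    }
}
```

With single spaces everywhere there is no word followed by two or more whitespace characters, so the pattern finds nothing.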

You can use the following to capture just the surname and the phone number.

The first part, (\w+\s){3}, matches three words, each followed by a whitespace character. Because the group is repeated, it retains only its last repetition: the third word, including the trailing whitespace.

The second part, .+?, lazily matches everything in between without capturing it.

The third part, (\+?\d+)$, captures an optional + (phone number prefix) and the rest of the phone number, up to the end of the string.

(\w+\s){3}.+?(\+?\d+)$
  • \1 - The surname
  • \2 - The phone number

https://regex101.com/r/gqu0tt/4
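One detail worth seeing in code: because the \s sits inside the repeated group, group 1 comes back with its trailing whitespace. A minimal Java sketch (the class `ThirdWordDemo` and the helper `extract` are illustrative names, not part of the answer) trims it before use:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThirdWordDemo {
    // Three "word + whitespace" repetitions, a lazy gap, then the phone number at the end.
    private static final Pattern P = Pattern.compile("(\\w+\\s){3}.+?(\\+?\\d+)$");

    // Hypothetical helper: returns {surname, phone}, or null when nothing matches.
    static String[] extract(String text) {
        Matcher m = P.matcher(text);
        if (!m.find()) {
            return null;
        }
        // group(1) holds the last repetition, e.g. "Walker " with a trailing space.
        return new String[] { m.group(1).trim(), m.group(2) };
    }

    public static void main(String[] args) {
        String text = "Johnny Alan Walker Sint Jansstraat 7, 1012 HG Amsterdam +123456789012";
        String[] parts = extract(text);
        System.out.println(parts[0] + " " + parts[1]); // prints: Walker +123456789012
    }
}
```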

However, if the surname and the address are separated by more than one space, you can use

(\w+)\s{2,}.+?(\+?\d+)$
  • \1 - The surname
  • \2 - The phone number

https://regex101.com/r/gqu0tt/5


I've tested these expressions on the Java engine, and they return the correct match.

This should do what you need. It also doesn't assume three names, so it still works for entries without a middle name:

.*?(\w+)\s*(?:[A-Z]{1,}[a-z]{1,}\s){2}[0-9]{0,4}\,\s{1,}[0-9]{4}\s[A-Z]{2}\s{1,}[a-zA-Z]{1,}\s*(\+\d+)
  • .*?(\w+)\s* - Captures the last word before the whitespace that precedes the address. .*? lazily matches anything up to the word preceding the address, without capturing it. \s* matches the whitespace between that word and the address.
  • (?:[A-Z]{1,}[a-z]{1,}\s){2}[0-9]{0,4}\,\s{1,}[0-9]{4}\s[A-Z]{2}\s{1,}[a-zA-Z]{1,} - Your address regex, but using a non-capturing group ( ?: ).
  • \s*(\+\d+) - Captures the + and the digits that follow. \s* matches the whitespace between the address and the + .

I reused your address regex, but made its group non-capturing. Then we match the last word before the address (the last name) using (\w+), and the + and following digits after the address using (\+\d+).

Here it is in action: https://regex101.com/r/YGiaJT/1
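For illustration, here is one way this pattern could be wired up in Java. The class `AddressAwareDemo`, the constant names, and the helper `extract` are assumptions made for this sketch; the second sample line (without a middle name) is also invented to show the flexibility described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddressAwareDemo {
    // The address regex from the question, wrapped so that its group does not capture.
    private static final String ADDRESS =
            "(?:[A-Z]{1,}[a-z]{1,}\\s){2}[0-9]{0,4}\\,\\s{1,}[0-9]{4}\\s[A-Z]{2}\\s{1,}[a-zA-Z]{1,}";

    // Last word before the address (group 1), the address itself, then the phone (group 2).
    private static final Pattern P =
            Pattern.compile(".*?(\\w+)\\s*" + ADDRESS + "\\s*(\\+\\d+)");

    // Hypothetical helper: returns "surname phone", or null when nothing matches.
    static String extract(String text) {
        Matcher m = P.matcher(text);
        return m.find() ? m.group(1) + " " + m.group(2) : null;
    }

    public static void main(String[] args) {
        String withMiddle = "Johnny Alan Walker Sint Jansstraat 7, 1012 HG Amsterdam +123456789012";
        String noMiddle = "Johnny Walker Sint Jansstraat 7, 1012 HG Amsterdam +123456789012";

        System.out.println(extract(withMiddle)); // prints: Walker +123456789012
        System.out.println(extract(noMiddle));   // prints: Walker +123456789012
    }
}
```

Since the surname is located relative to the address rather than by word count, the result depends on the address actually fitting the shape the address regex expects.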
