I have text: "Johnny Alan Walker Sint Jansstraat 7, 1012 HG Amsterdam +123456789012"
Is it possible to find the last name and the phone number while excluding the address? The address regex is this: "([A-Z]{1,}[a-z]{1,}\\s){2}[0-9]{0,4}\\,\\s{1,}[0-9]{4}\\s[A-Z]{2}\\s{1,}[a-zA-Z]{1,}"
(two capitalized words, house number, comma, postal code and city)
I want the result string to be "Walker +123456789012"
You could use:
\w+\s+\w+\s+(\w+).*(\+\d+)
The two capture groups then hold the last name and the phone number.
Essentially this skips the first and second words (first and middle name), captures the third word, then disregards everything in between until it finds a + and captures the + together with the digits after it.
Live example: https://regex101.com/r/MjJCSv/1
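A minimal Java sketch of the pattern above, assuming the input always starts with exactly a first name, a middle name and a last name. The class and method names (LastNameAndPhone, extract) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastNameAndPhone {
    // Skip two words, capture the third word, then capture "+digits".
    private static final Pattern PATTERN =
            Pattern.compile("\\w+\\s+\\w+\\s+(\\w+).*(\\+\\d+)");

    // Returns "<last name> <phone>", or null when the text does not match.
    static String extract(String text) {
        Matcher m = PATTERN.matcher(text);
        if (!m.find()) {
            return null;
        }
        return m.group(1) + " " + m.group(2);
    }

    public static void main(String[] args) {
        String text = "Johnny Alan Walker Sint Jansstraat 7, 1012 HG Amsterdam +123456789012";
        System.out.println(extract(text)); // Walker +123456789012
    }
}
```

Because the pattern counts words, an entry without a middle name would put the first word of the address into group 1 instead of the last name.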
If the last name and the address are always separated by more than one space, you can shorten this a little and write it as
(\w+)\s{2,}.*(\+\d+)
Live example of this functionality: https://regex101.com/r/vGGB4z/1
Example implementation of the latter in Java: http://ideone.com/RExAEO
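For readers who cannot open the link, here is a rough sketch of how the shorter pattern might be used. The sample input with two spaces after the last name is an assumption (the text as quoted in the question shows single spaces), and the names WideGapExtractor and extract are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WideGapExtractor {
    // Capture the first word that is followed by two or more whitespace
    // characters, then capture "+digits" later in the text.
    private static final Pattern PATTERN =
            Pattern.compile("(\\w+)\\s{2,}.*(\\+\\d+)");

    // Returns "<last name> <phone>", or null when the text does not match.
    static String extract(String text) {
        Matcher m = PATTERN.matcher(text);
        return m.find() ? m.group(1) + " " + m.group(2) : null;
    }

    public static void main(String[] args) {
        // Assumed input: note the double space between "Walker" and "Sint".
        String text = "Johnny Alan Walker  Sint Jansstraat 7, 1012 HG Amsterdam +123456789012";
        System.out.println(extract(text)); // Walker +123456789012
    }
}
```

With single spaces everywhere, \s{2,} never matches and extract returns null, so this variant only helps if the wider gap is really present in the data.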
You can use the following to capture just the surname and the phone number.
The first part, (\\w+\\s){3}, matches three words, each followed by a whitespace character; the group keeps the 3rd one (including its trailing whitespace).
The second part, .+?, lazily matches everything in between, without capturing it.
The third part, (\\+?\\d+)$, captures an optional + (phone number prefix) and the rest of the phone number, up to the end of the string.
(\w+\s){3}.+?(\+?\d+)$
\\1 - The surname
\\2 - The phone number
https://regex101.com/r/gqu0tt/4
But if the surname and the address are separated by more than one space, you can use
(\w+)\s{2,}.+?(\+?\d+)$
\\1 - The surname
\\2 - The phone number
https://regex101.com/r/gqu0tt/5
I've tested these expressions on the Java engine, and they give back the correct match.
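As a sketch of how the first of these expressions could be used from Java (the class and method names are hypothetical): since the repeated group (\w+\s) includes the whitespace after the word, group 1 comes back as "Walker " with a trailing space, so it is trimmed here before being combined with the phone number.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThirdWordExtractor {
    // Group 1: third "word + whitespace"; group 2: optional "+" and digits at the end.
    private static final Pattern PATTERN =
            Pattern.compile("(\\w+\\s){3}.+?(\\+?\\d+)$");

    // Returns "<surname> <phone>", or null when the text does not match.
    static String extract(String text) {
        Matcher m = PATTERN.matcher(text);
        if (!m.find()) {
            return null;
        }
        // trim() drops the whitespace that the repeated group captured after the surname.
        return m.group(1).trim() + " " + m.group(2);
    }

    public static void main(String[] args) {
        String text = "Johnny Alan Walker Sint Jansstraat 7, 1012 HG Amsterdam +123456789012";
        System.out.println(extract(text)); // Walker +123456789012
    }
}
```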
This should do what you need. It also doesn't assume three names, so it is a little more flexible in case you run into entries for people who don't have a middle name:
.*?(\w+)\s*(?:[A-Z]{1,}[a-z]{1,}\s){2}[0-9]{0,4}\,\s{1,}[0-9]{4}\s[A-Z]{2}\s{1,}[a-zA-Z]{1,}\s*(\+\d+)
.*?(\\w+)\\s* - Captures the last word before the whitespace that precedes the address. .*? lazily matches anything up to the word preceding the address, without capturing it. \\s* matches the whitespace between the word and the address.
(?:[A-Z]{1,}[a-z]{1,}\\s){2}[0-9]{0,4}\\,\\s{1,}[0-9]{4}\\s[A-Z]{2}\\s{1,}[a-zA-Z]{1,} - Your address regex, but using a non-capturing group (?:).
\\s*(\\+\\d+) - Captures the + and the following digits. \\s* matches the whitespace between the address and the +.
In short: I reused your address regex but made its group non-capturing. Then we match the last word before the address (the last name) using (\\w+), and the + and following digits after the address using (\\+\\d+).
Here it is in action: https://regex101.com/r/YGiaJT/1
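A possible Java version of this approach, as a sketch only. The class and method names are hypothetical, and the input without a middle name is an assumed variation of the sample text, included to show that the match is anchored on the address and not on a word count.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddressAnchoredExtractor {
    // Last word before the address, then the address (non-capturing), then "+digits".
    private static final Pattern PATTERN = Pattern.compile(
            ".*?(\\w+)\\s*"
            + "(?:[A-Z]{1,}[a-z]{1,}\\s){2}[0-9]{0,4}\\,\\s{1,}[0-9]{4}\\s[A-Z]{2}\\s{1,}[a-zA-Z]{1,}"
            + "\\s*(\\+\\d+)");

    // Returns "<last name> <phone>", or null when the text does not match.
    static String extract(String text) {
        Matcher m = PATTERN.matcher(text);
        return m.find() ? m.group(1) + " " + m.group(2) : null;
    }

    public static void main(String[] args) {
        String withMiddleName = "Johnny Alan Walker Sint Jansstraat 7, 1012 HG Amsterdam +123456789012";
        // Assumed variation: same entry without a middle name.
        String withoutMiddleName = "Johnny Walker Sint Jansstraat 7, 1012 HG Amsterdam +123456789012";
        System.out.println(extract(withMiddleName));    // Walker +123456789012
        System.out.println(extract(withoutMiddleName)); // Walker +123456789012
    }
}
```

The trade-off is that the result depends on the address regex: an address that does not fit it (for example a street name of one or three words) makes the whole match fail.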