Regex giving extra output in Java

My code looks like this:

String try1 = " how abcd is a lake 3909 Witmer Road Niagara Falls NY 14305 and our adress is 120, 5th cross, 1st main, domlur, Bangalore 50071 nad 420, Fanboy Lane, NewYark, AS 12345";
String add1="( \\b+[0-9]{3,5}[, ]* (.*)[, ]* (.*)[, ]* [a-zA-Z]{2} [0-9]{5})";
Pattern p = Pattern.compile(add1);
Matcher m = p.matcher(try1);
if (m.find()) {
    System.out.println("Address ======> " + m.group());
} else {
    System.out.println("Address ======>Not found ");
}

I want only the US addresses in the output:

[(3909 Witmer Road Niagara Falls NY 14305) and (420, Fanboy Lane, NewYark, AS 12345)]

but it outputs this instead:

(3909 Witmer Road Niagara Falls NY 14305 and our adress is 120, 5th cross, 1st main, domlur, Bangalore 50071 nad 420, Fanboy Lane, NewYark, AS 12345)

You could try a regex a bit more like this:

"(\\b[0-9]{3,5},? [A-Za-z]+(?: [A-Za-z]+,?)* [a-zA-Z]{2} [0-9]{5})"

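The [A-Za-z]+,? part matches only letters, optionally followed by a comma. Because it cannot match digits, a single match can no longer run across the numbers that belong to another address.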

regex101 demo.
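Here is a minimal sketch of how that pattern might be used to collect every address rather than just the first one. The class name AddressFinder and the method findAddresses are made up for illustration. Note that `if (m.find())` in the question only ever reports one match, so the sketch loops with `while (m.find())` instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are illustrative only.
class AddressFinder {
    // Pattern from the answer above: street number, letter-only words,
    // two-letter state code, five-digit ZIP code.
    private static final Pattern ADDRESS = Pattern.compile(
            "(\\b[0-9]{3,5},? [A-Za-z]+(?: [A-Za-z]+,?)* [a-zA-Z]{2} [0-9]{5})");

    // Returns every non-overlapping match found in the text, in order.
    static List<String> findAddresses(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = ADDRESS.matcher(text);
        while (m.find()) {          // keep searching after each match
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String try1 = " how abcd is a lake 3909 Witmer Road Niagara Falls NY 14305"
                + " and our adress is 120, 5th cross, 1st main, domlur, Bangalore 50071"
                + " nad 420, Fanboy Lane, NewYark, AS 12345";
        for (String address : findAddresses(try1)) {
            System.out.println("Address ======> " + address);
        }
    }
}
```

With the sample string from the question, this should print the Witmer Road and Fanboy Lane addresses and skip the Bangalore one, because "5th" and "1st" contain digits that the letter-only word pattern rejects.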

The * operator is greedy, so it matches as many characters as it can. In your expression, the .* patterns that come first expand as far as possible, so the trailing [a-zA-Z]{2} [0-9]{5} part, which matches the state and ZIP code, ends up matching the very last state and ZIP code in the input.

Try changing each . to [^0-9] so that it matches any character except a digit. The match then cannot extend past the first ZIP code.
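To see the difference in isolation, here is a small sketch that uses simplified patterns and a shortened sample string. These are not the exact expressions from the question, and the class name GreedyDemo and the helper firstMatch are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing a greedy ".*" with "[^0-9]*".
class GreedyDemo {
    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Shortened sample input, assumed for illustration.
        String text = "3909 Witmer Road NY 14305 and 420 Fanboy Lane AS 12345";

        // ".*" may cross digits, so it stretches to the last state and ZIP code.
        String greedy = "[0-9]{3,5} .* [A-Za-z]{2} [0-9]{5}";

        // "[^0-9]*" stops at the first digit, so the match ends at the first ZIP code.
        String noDigits = "[0-9]{3,5} [^0-9]* [A-Za-z]{2} [0-9]{5}";

        System.out.println(firstMatch(greedy, text));
        System.out.println(firstMatch(noDigits, text));
    }
}
```

The first pattern should return the whole sample string, while the second should stop after "NY 14305".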

