I've been using the following Regex to extract a zip code from a bunch of text:
"\\d{5}\\-?[1-9]?[1-9]?[1-9]?[1-9]?"
My intention in making the last four [1-9] optional (using ?) was to be able to extract both 5-digit zip codes and 5-digit zip codes with the +4 suffix, such as 11001-1010.
However, it only matches the first two digits of the last four numbers even though I put 4 digits at the end.
For example, in the zip code 11001-1010 it would match 11001-10.
Anyone know why?
You can use \\d{5}\\-\\d{0,4}, which allows you to match 0 to 4 digits after the -.
EDIT
From the comment: "But then the - won't be optional."
For that you can use \\d{5}(\\-\\d{0,4})? to make the whole group (the - and the digits after it) optional.
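As a quick check of the difference between the two patterns, here is a minimal sketch in Python. The language is an assumption (the question does not say which one is in use), and the patterns are written as raw strings, so a single backslash here corresponds to the doubled backslash in the quoted patterns. The helper name first_match is made up for illustration.

```python
import re

# Raw strings: one backslash here equals "\\" in a quoted string literal.
DASH_REQUIRED = re.compile(r"\d{5}\-\d{0,4}")     # the dash must be present
DASH_OPTIONAL = re.compile(r"\d{5}(\-\d{0,4})?")  # the dash and digits are optional together

def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    found = pattern.search(text)
    return found.group(0) if found else None

for sample in ["11001-1010", "11001"]:
    print(sample, first_match(DASH_REQUIRED, sample), first_match(DASH_OPTIONAL, sample))
```

The first pattern finds nothing in a bare 11001 because it insists on the dash, while the grouped version falls back to the 5 digits alone.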
It's stopping at the first 0 in the suffix, because [1-9] does not match 0:
"\\d{5}\\-?[1-9]?[1-9]?[1-9]?[1-9]?"
So in your example, it only matches up to 11001-1.
Does "\\d{5}\\-?[0-9]?[0-9]?[0-9]?[0-9]?" work OK? The other answers are probably cleaner, but that is the bug.
Looks ok per this
Simple answer to the question: for zip code 11001-1010, your regex would only match 11001-1, because the optional 4 digits after the - cannot be 0.
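The behaviour can be reproduced with a short sketch, here in Python (an assumption, since the question does not name a language; raw strings are used, so each backslash appears once). The names ORIGINAL and FIXED are only for illustration.

```python
import re

# The pattern from the question: [1-9] excludes 0.
ORIGINAL = re.compile(r"\d{5}\-?[1-9]?[1-9]?[1-9]?[1-9]?")
# Same shape, but with [0-9] so that zeros in the suffix are accepted.
FIXED = re.compile(r"\d{5}\-?[0-9]?[0-9]?[0-9]?[0-9]?")

text = "11001-1010"
original_match = ORIGINAL.search(text).group(0)
fixed_match = FIXED.search(text).group(0)

print(original_match)  # 11001-1
print(fixed_match)     # 11001-1010
```

Each [1-9]? after the first one hits a 0 and matches nothing, which is still a success for an optional item, so the match simply ends after 11001-1.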
For the unstated question of how to fix that, it depends on whether you only want to match an optional +4, or you want to also match +3, +2, +1, and +0, like your expression would.
Matching Zip5 with optional +4, e.g. matching 11001-1010 and 11001:
"\\d{5}(?:-\\d{4})?"
Matching Zip5 with optional +N, e.g. matching 11001-1010, 11001-101, 11001-10, 11001-1, 11001-, and 11001:
"\\d{5}(?:-\\d{0,4})?"
Update
Now, if you want to make sure it doesn't match the 56789-1234 of 123456789-123456789 or abcd56789-1234qwerty, you can add a word-boundary check:
"\\b\\d{5}(?:-\\d{4})?\\b"
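To compare the three patterns from this answer side by side, here is a minimal sketch in Python. Python is an assumption, as the question does not state a language; raw strings are used, so a single backslash here corresponds to the doubled backslash in the quoted patterns. The constant names and the first_match helper are hypothetical, chosen only for this example.

```python
import re

# Raw strings: one backslash here equals "\\" in the quoted patterns.
ZIP_PLUS_4 = re.compile(r"\d{5}(?:-\d{4})?")       # optional +4 only
ZIP_PLUS_N = re.compile(r"\d{5}(?:-\d{0,4})?")     # optional dash plus 0 to 4 digits
ZIP_BOUNDED = re.compile(r"\b\d{5}(?:-\d{4})?\b")  # +4 with word boundaries

def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    found = pattern.search(text)
    return found.group(0) if found else None

# Strict +4 versus lenient +N on progressively shorter suffixes.
for sample in ["11001-1010", "11001-101", "11001-", "11001"]:
    print(sample, first_match(ZIP_PLUS_4, sample), first_match(ZIP_PLUS_N, sample))

# The word-boundary version ignores zip-like runs buried inside longer tokens.
for sample in ["123456789-123456789", "abcd56789-1234qwerty", "ship to 11001-1010 today"]:
    print(sample, ZIP_BOUNDED.findall(sample))
```

With the strict pattern, an incomplete suffix such as -101 is dropped and only the 5 digits are returned, while the lenient pattern keeps whatever part of the suffix is present.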