
Extract North American postal code using regex

I have the following regex that I use to validate North American postal codes:

(?:(\d{5})(?:-\d{4})?)|(?:([a-zA-Z]\d[a-zA-Z]) ?(\d[a-zA-Z]\d))

FYI, I understand that it could be more exact with regard to verifying certain characters in certain positions.

What I'd like to do is use this same validation constant to also extract the postal code in the format:

00000
  or
a0a0a0

The regex above comes close; if I concatenate all of the capturing groups (except for the root), I get the result I seek. For example, a US code will capture in group 1, or a Canadian code will capture in 2 + 3.
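To make the concatenation concrete, here is a minimal sketch of what I mean, using the regex above unchanged. The names `PostalCodeSketch` and `Extract` are made up for illustration, and it only looks at the first match in the input:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class PostalCodeSketch
{
    // The validation regex from above, unchanged.
    const string Pattern =
        @"(?:(\d{5})(?:-\d{4})?)|(?:([a-zA-Z]\d[a-zA-Z]) ?(\d[a-zA-Z]\d))";

    // Returns the first postal code found, or null if there is none.
    public static string Extract(string input)
    {
        Match m = Regex.Match(input, Pattern);
        if (!m.Success) return null;

        // Skip group 0 (the whole match) and join the rest.
        // Groups that did not participate have an empty Value,
        // so a US code yields group 1 and a Canadian code yields 2 + 3.
        return string.Concat(
            m.Groups.Cast<Group>().Skip(1).Select(g => g.Value));
    }
}
```

With this, `PostalCodeSketch.Extract("12345-6789")` gives 12345 and `PostalCodeSketch.Extract("K1A 0B1")` gives K1A0B1, because the +4 suffix and the optional space sit outside every capturing group.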

Is there a better way to do this? Or is concatenating all subgroups something a regex library would be expected to provide? (Incidentally, this is C# on .NET.)

I'd make your 2nd group, ([a-zA-Z]\d[a-zA-Z]), and your 3rd group, (\d[a-zA-Z]\d), non-capturing, and turn the enclosing Canadian group, (?:([a-zA-Z]\d[a-zA-Z]) ?(\d[a-zA-Z]\d)), which is currently non-capturing, into a capturing group. That leaves only two capturing groups: one for the US code and one for the Canadian code. I'd also add word boundaries (\b) around each alternative:

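using System.Linq;
using System.Text.RegularExpressions;

// Group 1 captures the five-digit US code; group 2 captures the whole Canadian code.
var regex = new Regex(@"\b(?:(\d{5})(?:-\d{4})?)\b|\b((?:[A-Z]\d[A-Z]) ?(?:\d[A-Z]\d))\b", RegexOptions.IgnoreCase);
var input = @"00000 or a0a 0a0 and not 11111a or b1b1b11";
// Collect the full text of every match found in the input.
var postalCodes = regex.Matches(input)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();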

This matches 00000 and a0a 0a0, but skips the invalid 11111a and b1b1b11.
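Note that m.Value is the whole match, so it keeps the space in a Canadian code and any -0000 suffix on a US code. If you want the compact 00000 / a0a0a0 form from the question, one option is to read the two capturing groups instead. This is only a sketch: `PostalCodeNormalizer` and `ExtractAll` are illustrative names, and stripping the space with Replace is my assumption about the desired output.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class PostalCodeNormalizer
{
    // Same pattern as above: group 1 = US code, group 2 = Canadian code.
    static readonly Regex PostalCode = new Regex(
        @"\b(?:(\d{5})(?:-\d{4})?)\b|\b((?:[A-Z]\d[A-Z]) ?(?:\d[A-Z]\d))\b",
        RegexOptions.IgnoreCase);

    // Returns every postal code in the input in compact form.
    public static string[] ExtractAll(string input)
    {
        return PostalCode.Matches(input)
            .Cast<Match>()
            .Select(m => m.Groups[1].Success
                ? m.Groups[1].Value                    // US: five digits, no +4 suffix
                : m.Groups[2].Value.Replace(" ", ""))  // Canadian: drop the optional space
            .ToArray();
    }
}
```

For the input "12345-6789 and K1A 0B1" this returns 12345 and K1A0B1.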

