
How would I modify this regex to extract the left and right hand parts of a UK postal code?

I have a regular expression that works for validating UK postal codes, but now I would like to extract the constituent parts of the code and I'm getting confused. For those who do not know, examples of UK postal codes are 'WC1 1AA', 'WC11 1AA' and 'M1 1AA'.

The regular expression below (apologies for the formatting) still validates when the space between the left and right parts is missing (that is what the \s{0,} bit handles), which is great.

(?:(?:A[BL]|B[ABDHLNRST]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[CHNX]?|F[KY]|G[LUY]?|H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|N[EGNPRW]?|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTWY]?|T[ADFNQRSW]|UB|W[ACDFNRSV]?|YO|ZE)\d(?:\d|[A-Z])?\s{0,}\d[A-Z]{2})

I'd like to be able to extract the left and right hand sides now. I know that brackets are used for this, but there are already brackets in there and the regex specification is not easy to read. I guess the existing brackets need replacing. Can anyone help me rework them?

I can see other people finding this regex useful, so please feel free to use it for validating UK postal codes.

Actually, parentheses are used for extraction, not brackets. The (?: constructs in your expression are how you prevent parentheses from performing extraction. You would want:

(?:((?:A[BL]|B[ABDHLNRST]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[CHNX]?|F[KY]|G[LUY]?|H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|N[EGNPRW]?|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTWY]?|T[ADFNQRSW]|UB|W[ACDFNRSV]?|YO|ZE)\d(?:\d|[A-Z])?)\s{0,}(\d[A-Z]{2}))

Incidentally, I would also make this change:

(?:((?:A[BL]|B[ABDHLNRST]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[CHNX]?|F[KY]|G[LUY]?|H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|N[EGNPRW]?|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTWY]?|T[ADFNQRSW]|UB|W[ACDFNRSV]?|YO|ZE)\d(?:\d|[A-Z])?)\s*(\d[A-Z]{2}))

because \s{0,} is a goofy way to write \s*.
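Since the question is about getting the two halves out, here is a minimal C# sketch of reading the two capturing groups from the expression above. The class name PostcodeParts and the helper Split are hypothetical names made up for this illustration; only Regex, Match and Groups come from System.Text.RegularExpressions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PostcodeParts
{
    // Pattern copied from the answer above. Group 1 is the left-hand part,
    // group 2 is the right-hand part; every (?:...) group is non-capturing.
    private static readonly Regex Pattern = new Regex(
        @"(?:((?:A[BL]|B[ABDHLNRST]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[CHNX]?|F[KY]|G[LUY]?|H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|N[EGNPRW]?|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTWY]?|T[ADFNQRSW]|UB|W[ACDFNRSV]?|YO|ZE)\d(?:\d|[A-Z])?)\s*(\d[A-Z]{2}))");

    // Hypothetical helper: returns the two parts, or null when nothing matches.
    public static (string Left, string Right)? Split(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
        {
            return null;
        }
        return (m.Groups[1].Value, m.Groups[2].Value);
    }

    public static void Main()
    {
        foreach (string s in new[] { "WC1 1AA", "WC11 1AA", "M11AA", "ZZ1 1AA" })
        {
            var parts = Split(s);
            Console.WriteLine(parts.HasValue
                ? $"{s}: left='{parts.Value.Left}', right='{parts.Value.Right}'"
                : $"{s}: no match");
        }
    }
}
```

Note that the pattern is not anchored, so it will also find a postcode embedded in a longer string. If the input should be exactly one postcode, wrapping the pattern in ^ and $ is one option.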

Additionally, I'd recommend against trying to check the postcode so thoroughly. The list of valid postcodes can change, so you'll have to maintain the expression every time Royal Mail updates the PAF (Postcode Address File).

You're also missing some of the “special postcodes” like BFPO, GIR, the non-geographic postcodes and overseas territories. See Wikipedia for an overview of what's out there that you might have to deal with.

In general for most purposes a “does it look plausible?” check is better than trying to nail it down completely. There's nothing worse than telling customers they can't use your service because their address doesn't exist.
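As one illustration of a "does it look plausible?" check, the sketch below tests only the overall shape: one or two letters, a digit, an optional digit or letter, optional whitespace, then a digit and two letters. That shape is an assumption chosen for this example, not an official format definition, and it deliberately ignores the special cases mentioned above (it rejects GIR 0AA and BFPO numbers, for instance). The names LoosePostcodeCheck and LooksPlausible are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LoosePostcodeCheck
{
    // Assumed loose shape, anchored at both ends and case-insensitive.
    // [0-9] is used instead of \d so that only ASCII digits are accepted.
    private static readonly Regex Plausible = new Regex(
        @"^([A-Z]{1,2}[0-9][0-9A-Z]?)\s*([0-9][A-Z]{2})$",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: true when the trimmed input has the assumed shape.
    public static bool LooksPlausible(string input)
    {
        return Plausible.IsMatch(input.Trim());
    }

    public static void Main()
    {
        foreach (string s in new[] { "WC1 1AA", "m11aa", "12345", "GIR 0AA" })
        {
            Console.WriteLine($"{s}: {LooksPlausible(s)}");
        }
    }
}
```

The two capturing groups are kept, so the same expression can still be used to pull out the left and right parts once the check passes.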

When dealing with a large regex like this, you should use the /x option (called RegexOptions.IgnorePatternWhitespace in C#). (?:...) is non-capturing, so all you need to do is put () around the parts you want. Another benefit of the /x option is that you can annotate the regex with end-of-line comments (they start with #). You may also need to be careful with \d and \s, because they can match more than you expect: \s matches all whitespace, not just spaces, and, at least in Perl 5.8 and later, \d matches all Unicode digit characters, not just [0-9].

Regex exp = new Regex(@"
    (?:
        ( #capture first part
            (?:
                A[BL]        | B[ABDHLNRST]? | C[ABFHMORTVW]      |
                D[ADEGHLNTY] | E[CHNX]?      | F[KY]              |
                G[LUY]?      | H[ADGPRSUX]   | I[GMPV]            |
                JE           | K[ATWY]       | L[ADELNSU]?        |
                M[EKL]?      | N[EGNPRW]?    | O[LX]              |
                P[AEHLOR]    | R[GHM]        | S[AEGKLMNOPRSTWY]? |
                T[ADFNQRSW]  | UB            | W[ACDFNRSV]?       |
                YO           | ZE
            )
            \d
            (?:
                \d | [A-Z]
            )?
        ) #end capture of first part
        \s{0,}
        ( #capture second part
            \d[A-Z]{2}
        ) #end capture of second part
    )",
    RegexOptions.IgnorePatternWhitespace
);
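The caution about \d applies to .NET as well: by default \d there matches any Unicode decimal digit. The sketch below shows the difference using U+0661 (ARABIC-INDIC DIGIT ONE). The class and method names are hypothetical, chosen only for this demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitClassDemo
{
    // U+0661 ARABIC-INDIC DIGIT ONE: a decimal digit, but not an ASCII one.
    public const string ArabicIndicOne = "\u0661";

    // Default .NET behaviour: \d covers all Unicode decimal digits.
    public static bool MatchesBackslashD(string s) =>
        Regex.IsMatch(s, @"^\d$");

    // An explicit class restricts the match to ASCII 0-9.
    public static bool MatchesAsciiClass(string s) =>
        Regex.IsMatch(s, @"^[0-9]$");

    // RegexOptions.ECMAScript narrows \d to [0-9].
    public static bool MatchesEcmaScriptD(string s) =>
        Regex.IsMatch(s, @"^\d$", RegexOptions.ECMAScript);

    public static void Main()
    {
        Console.WriteLine(MatchesBackslashD(ArabicIndicOne));   // True
        Console.WriteLine(MatchesAsciiClass(ArabicIndicOne));   // False
        Console.WriteLine(MatchesEcmaScriptD(ArabicIndicOne));  // False
    }
}
```

RegexOptions.ECMAScript cannot be combined with RegexOptions.IgnorePatternWhitespace, so for a commented, multi-line pattern like the one above, writing [0-9] in place of \d is probably the simpler route.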


 