
Validate UK postcode using regular expression in Oracle

Below is the list of valid postcodes:

A1 1AA
A11 1AA
AA1 1AA
AA11 1AA
A1A 1AA
BFPO 1
BFPO 11
BFPO 111

I tried (([A-Z]{1,2}[0-9]{1,2})\ ([0-9][A-Z]{2}))|(GIR\ 0AA)$ but it is not working. Could you please help me with a proper query to validate all of these postcode formats?
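To see where the attempt falls short, here is a quick check in Python's re module (a stand-in, not Oracle's engine), taking the character classes to be [A-Z] and "\ " to be an escaped space. Alternation has the lowest precedence, so the trailing $ anchors only the GIR branch and nothing anchors the start; like REGEXP_LIKE, re.search succeeds if the pattern matches anywhere in the string.

```python
import re

# The question's attempted pattern (assumption: [AZ] read as [A-Z]).
# Because | binds loosest, $ applies only to the (GIR\ 0AA) branch.
ATTEMPT = re.compile(r"(([A-Z]{1,2}[0-9]{1,2})\ ([0-9][A-Z]{2}))|(GIR\ 0AA)$")

SAMPLES = ["A1 1AA", "A11 1AA", "AA1 1AA", "AA11 1AA",
           "A1A 1AA", "BFPO 1", "BFPO 11", "BFPO 111"]


def attempt_matches(code: str) -> bool:
    # Unanchored search, mirroring how REGEXP_LIKE tests for a match anywhere.
    return ATTEMPT.search(code) is not None


if __name__ == "__main__":
    for code in SAMPLES:
        print(f"{code!r:12} {attempt_matches(code)}")
    # Junk around a valid-looking core still passes, since nothing anchors it.
    print(attempt_matches("xxAA1 1AAxx"))
```

The four letters-then-digits samples pass, but A1A 1AA and all three BFPO codes fail, and arbitrary surrounding text is accepted.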

First, rather than guessing based on the set of data at hand, let's look at what UK postcodes actually are.

EC1V 9HQ

The first one or two letters are the postcode area, which identifies the main Royal Mail sorting office that will process the mail. In this case, EC would go to the Mount Pleasant sorting office in London.

The second part is usually just one or two numbers, but for some parts of London it can be a number and a letter (1V in this example). This is the postcode district, which tells the sorting office which delivery office the mail should go to.

The third part is the sector and is usually just one number. It tells the delivery office which local area or neighbourhood the mail should go to.

The final part of the postcode is the unit code, which is always two letters. It identifies a group of up to 80 addresses and tells the delivery office which postal route (or walk) will deliver the item.
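As a sketch of that anatomy, the pattern below uses named groups for the four parts described above, following this answer's naming (so "district" here is only the 1V portion). It is written with Python's re module and assumes upper-case ASCII input; it is an illustration, not a Royal Mail specification.

```python
import re
from typing import Optional

# area: 1-2 letters; district: a digit, optionally followed by a digit or letter;
# sector: one digit; unit: two letters. Upper-case ASCII is an assumption.
POSTCODE_PARTS = re.compile(
    r"\A(?P<area>[A-Z]{1,2})(?P<district>[0-9][A-Z0-9]?) "
    r"(?P<sector>[0-9])(?P<unit>[A-Z]{2})\Z"
)


def split_postcode(code: str) -> Optional[dict]:
    """Return the area/district/sector/unit parts, or None if the shape doesn't fit."""
    m = POSTCODE_PARTS.match(code)
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(split_postcode("EC1V 9HQ"))
    # {'area': 'EC', 'district': '1V', 'sector': '9', 'unit': 'HQ'}
```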

Digesting that...

  1. 1 or 2 letters.
  2. A digit, optionally followed by one more letter or digit.
  3. A space.
  4. "Usually" a single digit, but I can't find any instances otherwise.
  5. 2 letters.
\A[[:alpha:]]{1,2}\d[[:alnum:]]? \d[[:alpha:]]{2}\z
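To try the pattern against the question's list without a database, here is a Python transcription. Two assumptions: [[:alpha:]] and [[:alnum:]] are narrowed to ASCII classes, and Python's \Z is used because it means "absolute end of string", which is what \z means above.

```python
import re

# Python stand-in for \A[[:alpha:]]{1,2}\d[[:alnum:]]? \d[[:alpha:]]{2}\z
# (ASCII-only classes; Python's \Z == absolute end of string).
BASIC = re.compile(r"\A[A-Za-z]{1,2}[0-9][A-Za-z0-9]? [0-9][A-Za-z]{2}\Z")

SAMPLES = ["A1 1AA", "A11 1AA", "AA1 1AA", "AA11 1AA", "A1A 1AA",
           "BFPO 1", "BFPO 11", "BFPO 111"]


def is_basic_postcode(code: str) -> bool:
    return BASIC.search(code) is not None


if __name__ == "__main__":
    for code in SAMPLES:
        print(f"{code!r:12} {is_basic_postcode(code)}")
```

The five letter-and-digit samples match; the three BFPO codes from the question do not, since this pattern does not cover them.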

We can't use \w because it also matches an underscore.
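A small Python illustration of the problem: with re.ASCII, \w is exactly [A-Za-z0-9_], so swapping it in for the POSIX classes (a hypothetical "loose" variant, for illustration only) lets an underscore through.

```python
import re

# Hypothetical variant using \w instead of the POSIX classes.
LOOSE = re.compile(r"\A\w{1,2}\d\w? \d\w{2}\Z", re.ASCII)
# ASCII transcription of the pattern above.
STRICT = re.compile(r"\A[A-Za-z]{1,2}[0-9][A-Za-z0-9]? [0-9][A-Za-z]{2}\Z")

bogus = "A1_ 1AA"
print(LOOSE.search(bogus) is not None)   # True: \w accepts the underscore
print(STRICT.search(bogus) is not None)  # False
```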

I used the stricter \A and \z rather than ^ and $, because \A and \z match only the exact beginning and end of the string, whereas ^ and $ match the beginning and end of a line. $ in particular tolerates a trailing newline.
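The trailing-newline difference is easy to see in Python, whose $ (without re.MULTILINE) also matches just before a final "\n" while \Z (Python's spelling of \z) does not. This demonstrates the behaviour in Python only; Oracle's engine is not being run here.

```python
import re

CORE = r"[A-Za-z]{1,2}[0-9][A-Za-z0-9]? [0-9][A-Za-z]{2}"
WITH_DOLLAR = re.compile(r"^" + CORE + r"$")
WITH_Z = re.compile(r"\A" + CORE + r"\Z")

value = "A1 1AA\n"  # e.g. a value read from a file with its line ending kept
print(WITH_DOLLAR.search(value) is not None)  # True: $ matches before the final \n
print(WITH_Z.search(value) is not None)       # False: \Z requires the true end
```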


Of course, there are special cases: XXXX 1ZZ for various overseas territories, where XXXX is one of an enumerated set of codes.

\A(ASCN|STHL|TDCU|BBND|BIQQ|FIQQ|PCRN|SIQQ|TKCA) 1ZZ\z
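Since the territory codes are an enumeration, one option is to keep them as data and build the alternation from the list, so adding a code doesn't mean hand-editing the regex. A Python sketch of that idea, using exactly the codes listed above (OVERSEAS_CODES and is_overseas_postcode are hypothetical names):

```python
import re

# The enumerated codes from the pattern above, kept as a list.
OVERSEAS_CODES = ["ASCN", "STHL", "TDCU", "BBND", "BIQQ",
                  "FIQQ", "PCRN", "SIQQ", "TKCA"]
OVERSEAS = re.compile(
    r"\A(" + "|".join(map(re.escape, OVERSEAS_CODES)) + r") 1ZZ\Z"
)


def is_overseas_postcode(code: str) -> bool:
    return OVERSEAS.search(code) is not None


print(is_overseas_postcode("STHL 1ZZ"))  # True
print(is_overseas_postcode("ABCD 1ZZ"))  # False, ABCD is not in the list
```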

Then there are a couple of really special cases.

  • GIR 0AA for Girobank.
  • AI-2640 for Anguilla.
\A(AI-2640|GIR 0AA)\z
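Because both of these are fixed strings, the anchored alternation is equivalent to an exact-membership test. A Python sketch comparing the two (SPECIAL_SET and is_special are hypothetical names):

```python
import re

SPECIAL = re.compile(r"\A(AI-2640|GIR 0AA)\Z")
SPECIAL_SET = {"AI-2640", "GIR 0AA"}  # same two codes as a plain set


def is_special(code: str) -> bool:
    return SPECIAL.search(code) is not None


for code in ["GIR 0AA", "AI-2640", "AI 2640"]:
    # Both checks agree: only the exact strings are accepted.
    print(code, is_special(code), code in SPECIAL_SET)
```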

Put them all together into one big (...|...|...) mess. It's good to build the pattern in three pieces and join them using the x match parameter, which tells REGEXP_LIKE to ignore whitespace in the pattern.

SELECT postcode
FROM   addresses   -- hypothetical table name; substitute your own
WHERE  REGEXP_LIKE(
           postcode,
           '\A
            (
             [[:alpha:]]{1,2}\d[[:alnum:]]?\ \d[[:alpha:]]{2}     |
             (ASCN|STHL|TDCU|BBND|BIQQ|FIQQ|PCRN|SIQQ|TKCA)\ 1ZZ  |
             (AI-2640|GIR\ 0AA)
            )
            \z',
           'x'
       );
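For checking the combined pattern outside the database, here is a Python counterpart. re.VERBOSE plays the role of the x match parameter, so literal spaces are written as "\ ". As before, the POSIX classes are narrowed to ASCII and Python's \Z stands in for \z; is_uk_postcode is a hypothetical name.

```python
import re

UK_POSTCODE = re.compile(
    r"""
    \A
    (
      [A-Za-z]{1,2}[0-9][A-Za-z0-9]?\ [0-9][A-Za-z]{2}    # standard format
    | (ASCN|STHL|TDCU|BBND|BIQQ|FIQQ|PCRN|SIQQ|TKCA)\ 1ZZ  # overseas territories
    | (AI-2640|GIR\ 0AA)                                  # Anguilla, Girobank
    )
    \Z
    """,
    re.VERBOSE,
)


def is_uk_postcode(code: str) -> bool:
    return UK_POSTCODE.search(code) is not None


if __name__ == "__main__":
    for code in ["EC1V 9HQ", "A1A 1AA", "TKCA 1ZZ", "GIR 0AA",
                 "AI-2640", "BFPO 1", "EC1V9HQ"]:
        print(f"{code!r:12} {is_uk_postcode(code)}")
```

Note that the BFPO codes from the question still fall through; they would need an alternative of their own (for example BFPO\ [0-9]{1,3}, assuming the question's one-to-three-digit samples are representative of your data).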

Alternatively, you can make the basic regex less strict and accept 2-4 alphanumerics for the first part. Then only the special case for Anguilla remains to worry about.

\A([[:alnum:]]{2,4} \d[[:alpha:]]{2}|AI-2640)\z

On the downside, this will let in postcodes that don't exist. On the upside, you don't have to keep tweaking it for additional special cases. That's probably fine for this level of filtering.
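The trade-off shows up quickly when the looser pattern is transcribed into Python (ASCII classes and Python's \Z for \z assumed, as before; is_loose_postcode is a hypothetical name): the territory and Girobank codes pass without being enumerated, but so does a string starting with digits, even though postcode areas are always letters.

```python
import re

# Python stand-in for \A([[:alnum:]]{2,4} \d[[:alpha:]]{2}|AI-2640)\z
LOOSE = re.compile(r"\A([A-Za-z0-9]{2,4} [0-9][A-Za-z]{2}|AI-2640)\Z")


def is_loose_postcode(code: str) -> bool:
    return LOOSE.search(code) is not None


if __name__ == "__main__":
    print(is_loose_postcode("TKCA 1ZZ"))  # True: no enumeration needed
    print(is_loose_postcode("GIR 0AA"))   # True
    print(is_loose_postcode("1234 5AA"))  # True, although areas are letters
    print(is_loose_postcode("BFPO 1"))    # False
```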

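Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.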

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM