
Extracting postal code from addresses

I am looking for a solution in C# to extract postal code information from an address.

I need the postal codes of the following countries:

Canada, US, Germany, UK, Turkey, France, Pakistan, India, Italy.

An address can look like one of these:

188 pleasant street, new minas, Nova Scotia b2p 6r6, Canada.

or 109 A, block 3, DHA, Karachi 75600, Pakistan.

What I want: to extract any alphanumeric token that is adjacent to the city or country name. I am having difficulty creating a regular expression for it.

It's quite an open-ended task. You have to rely on some specific format. Otherwise, what happens if there are two numeric strings in the address (for example, when the street is a number)? So two options are possible:

  • The address is always in a specific format and you know the actual format
  • The zip code is always of a given length

In both cases regular expressions will lead you to the solution.

  • For the first case, assuming the zip code sits in a fixed position (let's say '6r6' in your original example, the token just before the last comma), you can use the following regular expression pattern: "(\\S+)\\, ?\\w+$". The first group will be the zip code. Note that \\w+$ requires the address to end with the country name, so a trailing '.' has to be stripped first.
  • For the second case, assuming the zip code is a 5-digit number that comes after the first ',', the following pattern can be used to extract it: "(,.*)+(\\d{5})". The second group will be the zip code in the match.

Here is the code you can use:

    using System.Text.RegularExpressions;

    public static string GetSingleMatch(string address, string pattern, int group = 0)
    {
        return new Regex(pattern, RegexOptions.IgnoreCase).Match(address).Groups[group].Value;
    }

The optional "group" parameter indicates which regex group contains the zip code.
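As a rough usage sketch, the helper can be called with both patterns on the two sample addresses from the question. The class name `PostalCodeDemo` and the `TrimEnd` step are assumptions added for illustration; the trimming is there because the first pattern ends in `\w+$` and does not match an address that ends with a period.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PostalCodeDemo
{
    // Same helper as in the answer: returns the requested capture group,
    // or an empty string when the pattern does not match.
    public static string GetSingleMatch(string address, string pattern, int group = 0)
    {
        return new Regex(pattern, RegexOptions.IgnoreCase).Match(address).Groups[group].Value;
    }

    public static void Main()
    {
        // Assumption: trailing punctuation is removed before matching.
        string canada = "188 pleasant street, new minas, Nova Scotia b2p 6r6, Canada.".TrimEnd('.', ' ');
        string pakistan = "109 A, block 3, DHA, Karachi 75600, Pakistan.";

        // Case 1: the token right before the last comma is captured in group 1.
        Console.WriteLine(GetSingleMatch(canada, @"(\S+)\, ?\w+$", 1));   // 6r6

        // Case 2: five digits somewhere after the first comma are captured in group 2.
        Console.WriteLine(GetSingleMatch(pakistan, @"(,.*)+(\d{5})", 2)); // 75600
    }
}
```

Note that the first pattern only picks up the last whitespace-delimited token, so for a two-part code such as "b2p 6r6" it returns "6r6" and not the full code.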

I believe it's reasonable to assume the general rule that the country comes last in an address, with the city or state before it, so the post code sits between the city or state and the country. As in your example, ',' is used as the separator, so it can be done as follows:

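    // Requires: using System; using System.Text.RegularExpressions;
    private string GetPostCode(string address)
    {
        string result = string.Empty;
        Regex re = new Regex(@"\d+");

        string[] list = address.Split(',');
        // Walk the segments from the end (country first);
        // Array.Reverse reverses the array in place.
        Array.Reverse(list);
        foreach (var item in list)
        {
            // Take the first run of digits in this segment, if any
            result = re.Match(item).Value;
            if (!string.IsNullOrEmpty(result))
                break;
        }

        return result;
    }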
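One limitation of the method above is that `\d+` matches digits only, so it cannot return an alphanumeric code such as "b2p 6r6". A possible variation, sketched here under stated assumptions, looks up a pattern by country name. The class `PostalCodeExtractor`, the dictionary keys, and the patterns themselves are hypothetical and deliberately simplified; real postal formats have more rules than these expressions capture.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PostalCodeExtractor
{
    // Hypothetical, simplified patterns keyed by the country name as it
    // appears at the end of the address.
    private static readonly Dictionary<string, string> Patterns =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "Canada",   @"\b[A-Z]\d[A-Z] ?\d[A-Z]\d\b" },
            { "US",       @"\b\d{5}(-\d{4})?\b" },
            { "UK",       @"\b[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}\b" },
            { "India",    @"\b\d{6}\b" },
            { "Germany",  @"\b\d{5}\b" },
            { "Turkey",   @"\b\d{5}\b" },
            { "France",   @"\b\d{5}\b" },
            { "Pakistan", @"\b\d{5}\b" },
            { "Italy",    @"\b\d{5}\b" },
        };

    public static string Extract(string address)
    {
        // Assumption: the country is the last comma-separated segment.
        string trimmed = address.Trim().TrimEnd('.');
        int lastComma = trimmed.LastIndexOf(',');
        if (lastComma < 0) return string.Empty;

        string country = trimmed.Substring(lastComma + 1).Trim();
        if (!Patterns.TryGetValue(country, out string pattern)) return string.Empty;

        // Search the part before the country and keep the last match, so a
        // house number at the start is not mistaken for the postal code.
        MatchCollection matches = Regex.Matches(
            trimmed.Substring(0, lastComma), pattern, RegexOptions.IgnoreCase);
        return matches.Count > 0 ? matches[matches.Count - 1].Value : string.Empty;
    }
}
```

With the two sample addresses from the question, `Extract` returns "b2p 6r6" and "75600". An address whose country is spelled differently from the dictionary keys (for example "United States") falls through to the empty string, so the key list would need to be extended for real data.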

I hope this helps.

