Parse multiple hostnames from string

I am trying to parse multiple hostnames from a string using a Regex in C#.

Example string: abc.google.com another example here abc.microsoft.com and another example abc.bbc.co.uk

The code I have been trying is below:

string input = "abc.google.com another example here abc.microsoft.com and another example abc.bbc.co.uk";
string FQDN_Pat = @"^([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])(\.([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]))*$";

Regex r = new Regex(FQDN_Pat);
Match m = r.Match(input);         
while (m.Success)
{
    txtBoxOut.Text += "Match: " + m.Value + " ";
    m = m.NextMatch();
}

The code works if the string fits the pattern exactly, e.g. abc.google.com.

How can I change the Regex to match the hostnames embedded in the example string, so that the output would be:

Match: abc.google.com
Match: abc.microsoft.com
Match: abc.bbc.co.uk

Apologies in advance if this is something very simple, as my knowledge of regular expressions is not great! :) Thanks!

UPDATE:

Updating the Regex to the following (removing the ^ and $):

string FQDN_Pat = @"([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?)(\.([a-zA-Z0-9]|[a-zA-Z0-9][a-zA‌​-Z0-9\-]{0,61}[a-zA-Z0-9]))"; 

Results in the following output:

Match 1: abc.g
Match 2: oogle.c
Match 3: abc.m
Match 4: icrosoft.c
Match 5: abc.b
Match 6: bc.c
Match 7: ou

As the regexp is quite complicated, I tried to simplify it a bit. What I did was:

  1. Remove ^ and $ so that the regexp can match anywhere in the input.
  2. Simplify the characters you match: instead of ([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]) I'm using ([a-zA-Z0-9])+ which matches any alphanumeric sequence of length one or more (the + sign means the preceding element appears once or more). Let's call it X. If the rules for names in an FQDN are more complex, modify this value.
  3. The expression for finding an FQDN is X(\.X)+ . This can be read as a sequence of characters followed by one or more further sequences, each preceded by a dot ( . ). Substituting X gives the full expression:

     string FQDN_Pat = @"([a-zA-Z0-9]+)(\.([a-zA-Z0-9])+)+";

which does match your example, but I suggest you read the C# regexp documentation for further reference in case there are subtleties in domain names.
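For reference, here is a minimal sketch of running the simplified X(\.X)+ pattern over the example string with Regex.Matches. The names HostnameFinder and FindHostnames are made up for illustration. The dot is escaped with a single backslash because, inside a verbatim @"..." string, \. is what matches a literal dot. The capture groups are dropped since only the whole match is used.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question.
public static class HostnameFinder
{
    // X = [a-zA-Z0-9]+ ; the full pattern is X(\.X)+ with no ^ or $ anchors,
    // so it can match anywhere inside the input.
    private static readonly Regex SimplePattern =
        new Regex(@"[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)+");

    public static List<string> FindHostnames(string input)
    {
        var results = new List<string>();
        // Regex.Matches returns every non-overlapping match, left to right.
        foreach (Match m in SimplePattern.Matches(input))
        {
            results.Add(m.Value);
        }
        return results;
    }
}

public static class Program
{
    public static void Main()
    {
        string input = "abc.google.com another example here abc.microsoft.com and another example abc.bbc.co.uk";
        foreach (string host in HostnameFinder.FindHostnames(input))
        {
            Console.WriteLine("Match: " + host);
        }
    }
}
```

On the example string this should print the three hostnames, one per line. Note that this simplified X does not allow hyphens, so a name such as my-host.example.com would only be matched from host onwards.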

You get this behavior because you are only matching strings that contain nothing but your pattern. You are using ^ (start of the string) and $ (end of the string). If you want to match your pattern anywhere in the input string, remove those characters from the pattern.
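One caveat when dropping ^ and $ from the original pattern: without the anchors, nothing forces the engine past the first alternative of each label, the single-character branch [a-zA-Z0-9], so matches can come out truncated. The sketch below is one possible rewrite, not the only one. It keeps the original 63-character label rule but makes the tail of each label an optional group instead of an alternation, and it requires at least one dot-separated label. The names StrictHostnameFinder, Label and FindHostnames are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question.
public static class StrictHostnameFinder
{
    // One label: an alphanumeric character, optionally followed by up to 61
    // alphanumerics or hyphens and a final alphanumeric (1 to 63 chars total).
    private const string Label = @"[a-zA-Z0-9](?:[a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?";

    // A label followed by one or more ".label" groups, unanchored.
    private static readonly Regex HostPattern =
        new Regex(Label + @"(?:\." + Label + ")+");

    public static List<string> FindHostnames(string input)
    {
        var results = new List<string>();
        foreach (Match m in HostPattern.Matches(input))
        {
            results.Add(m.Value);
        }
        return results;
    }
}

public static class StrictDemo
{
    public static void Main()
    {
        string input = "abc.google.com another example here abc.microsoft.com and another example abc.bbc.co.uk";
        foreach (string host in StrictHostnameFinder.FindHostnames(input))
        {
            Console.WriteLine("Match: " + host);
        }
    }
}
```

Because the optional group is greedy, each label extends as far as it can before the next dot, so the example string should yield the three full hostnames. Plain words with no dot are skipped because the pattern needs at least one ".label" part.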
