Parse multiple hostnames from string
I am trying to parse multiple hostnames from a string using a Regex in C#.
Example string:
abc.google.com another example here abc.microsoft.com and another example abc.bbc.co.uk
The code I have been trying is below:
string input = "abc.google.com another example here abc.microsoft.com and another example abc.bbc.co.uk";
string FQDN_Pat = @"^([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])(\.([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]))*$";
Regex r = new Regex(FQDN_Pat);
Match m = r.Match(input);
while (m.Success)
{
txtBoxOut.Text += "Match: " + m.Value + " ";
m = m.NextMatch();
}
The code works if the string fits the pattern exactly, e.g. abc.google.com.
How can I change the Regex to match the patterns that fit within the example string, so that the output would be:
Match: abc.google.com
Match: abc.microsoft.com
Match: abc.bbc.co.uk
Apologies in advance if this is something very simple, as my knowledge of regular expressions is not great! :) Thanks!
UPDATE:
Updating the Regex to the following (removing the ^ and $):
string FQDN_Pat = @"([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?)(\.([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]))";
Results in the following output:
Match 1: abc.g
Match 2: oogle.c
Match 3: abc.m
Match 4: icrosoft.c
Match 5: abc.b
Match 6: bc.c
Match 7: o.u
As the regexp is quite complicated, I tried to simplify it a bit. What I've done is remove ^ and $ to make the regexp match anywhere and, instead of ([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]), use ([a-zA-Z0-9])+, which looks for any alphanumeric sequence of length one or more (the + sign means the preceding character class is matched once or more). Let's call it X. If the rules for names in an FQDN are more complex, please modify this value.
The expression for finding an FQDN is X(\.X)+. This can be viewed as a sequence of chars followed by one or more further sequences, all separated by dots (.). Substituting X, you have the full expression:
string FQDN_Pat = @"([a-zA-Z0-9]+)(\.([a-zA-Z0-9])+)+";
which actually matches your example, but I suggest you read the C# regexp manuals for further reference in case there are some tricks in domain names.
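As a minimal sketch of how the simplified X(\.X)+ pattern could be used, the snippet below collects every match from the example string with Regex.Matches. The class and method names (HostnameFinder, FindHostnames) are hypothetical and chosen only for illustration. The dot is escaped with a single backslash: inside a verbatim @"..." string, \\. would instead mean a literal backslash followed by any character. This simplified X does not allow hyphens inside a label.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HostnameFinder
{
    // Simplified pattern X(\.X)+ with X = [a-zA-Z0-9]+ (assumption: no hyphens in labels).
    private static readonly Regex SimplePattern =
        new Regex(@"[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)+");

    // Returns every substring of the input that looks like a dotted hostname.
    public static List<string> FindHostnames(string input)
    {
        var results = new List<string>();
        foreach (Match m in SimplePattern.Matches(input))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        string input = "abc.google.com another example here abc.microsoft.com and another example abc.bbc.co.uk";
        foreach (string host in FindHostnames(input))
        {
            Console.WriteLine("Match: " + host);
        }
        // Prints:
        // Match: abc.google.com
        // Match: abc.microsoft.com
        // Match: abc.bbc.co.uk
    }
}
```

Because the pattern requires at least one dot, plain words such as "another" or "example" are skipped.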
You get this behavior because you are only matching strings that contain nothing else but your pattern. You are using ^ (start of the string) and $ (end of the string). If you want to match your pattern anywhere in the input string, remove those characters from the pattern.
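A rough sketch of the difference, keeping the stricter label rules from the question (alphanumeric at both ends, hyphens allowed inside, at most 63 characters). The names AnchorDemo, Label, Anchored and Unanchored are hypothetical. Two details are my own assumptions rather than part of the answer above: the label is written with an optional group instead of the single-char-first alternation, and the dot group uses + instead of *. Without the anchors, the alternation in the question's pattern is satisfied by a single character and nothing forces the engine to backtrack to the longer branch, which is why the UPDATE attempt returned fragments such as abc.g.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // One hostname label: starts and ends alphanumeric, hyphens allowed in between.
    private const string Label = @"[a-zA-Z0-9](?:[a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?";

    // Anchored: the whole input must be a single hostname.
    public static readonly Regex Anchored =
        new Regex("^" + Label + @"(?:\." + Label + ")*$");

    // Unanchored: finds hostnames anywhere; + requires at least one dot,
    // so ordinary words in the sentence are not reported.
    public static readonly Regex Unanchored =
        new Regex(Label + @"(?:\." + Label + ")+");

    public static void Main()
    {
        string input = "abc.google.com another example here abc.microsoft.com and another example abc.bbc.co.uk";

        Console.WriteLine(Anchored.IsMatch(input));            // False
        Console.WriteLine(Anchored.IsMatch("abc.google.com")); // True

        foreach (Match m in Unanchored.Matches(input))
        {
            Console.WriteLine("Match: " + m.Value);
        }
    }
}
```

The loop prints the three hostnames from the example string, one per line.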