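How to Extract Domain name from string with Regex in C#?
I want to extract domain names, with either a generic top-level domain or a country-code top-level domain, from a string using a regex. I tested many regexes, such as this code: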
var linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
Match m = linkParser.Match(Url);
Console.WriteLine(m.Value);
But none of them work correctly. The text the user enters can be any of the following:
jonasjohn.com
http://www.jonasjohn.de/snippets/csharp/
jonasjohn.de
www.jonasjohn.de/snippets/csharp/
http://www.answers.com/article/1194427/8-habits-of-extraordinarily-likeable-people
http://www.apple.com
https://www.cnn.com.au
http://www.downloads.news.com.au
https://ftp.android.co.nz
http://global.news.ca
https://www.apple.com/
https://ftp.android.co.nz/
http://global.news.ca/
https://www.apple.com/
https://johnsmith.eu
ftp://johnsmith.eu
johnsmith.gov.ae
johnsmith.eu
www.jonasjohn.de
www.jonasjohn.ac.ir/snippets/csharp
http://www.jonasjohn.de/
ftp://www.jonasjohn.de/
https://subdomain.abc.def.jonasjohn.de/test.htm
The regexes I tested:
^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\.)?([^:\/\n]+)"
\b(?:https?://|www\.)\S+\b
://(?<host>([a-z\\d][-a-z\\d]*[a-z\\d]\\.)*[a-z][-a-z\\d]+[a-z])
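And many more. I only need the domain name, without the protocol or any subdomain. For example: Domainname.gTLD, DomainName.ccTLD, or DomainName.xyz.ccTLD
I got the list of suffixes from the Public Suffix List.
Of course, I have read many posts on stackoverflow.com, but none of them answered my question.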
You don't need a regex to parse a URL. If you have a valid URL, you can parse it with one of the Uri constructors or with Uri.TryCreate:
if (Uri.TryCreate("http://google.com/asdfs", UriKind.RelativeOrAbsolute, out var uri))
{
    Console.WriteLine(uri.Host);
}
www.jonasjohn.de/snippets/csharp/ and jonasjohn.de/snippets/csharp/ are not valid URLs, because they have no scheme. TryCreate can still parse them as relative URLs, but reading Host then throws System.InvalidOperationException: This operation is not supported for a relative URI.
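One way to avoid that exception is to ask TryCreate for absolute URIs only, so that Host is read only when it is available. The following is a minimal sketch; the HostParser class and GetHostOrEmpty method are hypothetical names introduced here for illustration, not part of the original answer.

```csharp
using System;

public static class HostParser
{
    // Hypothetical helper: returns the host of an absolute URL, or an
    // empty string when the input is relative or cannot be parsed.
    public static string GetHostOrEmpty(string input)
    {
        // With UriKind.Absolute, TryCreate returns false for scheme-less
        // input, so uri.Host is never read on a relative URI.
        return Uri.TryCreate(input, UriKind.Absolute, out Uri uri)
            ? uri.Host
            : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(GetHostOrEmpty("http://google.com/asdfs"));           // google.com
        Console.WriteLine(GetHostOrEmpty("www.jonasjohn.de/snippets/csharp/")); // empty line
    }
}
```

This only filters out the scheme-less inputs; it does not recover a host from them.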
In that case, you can use the UriBuilder class to parse and modify the URL, for example:
var bld=new UriBuilder("jonasjohn.com");
Console.WriteLine(bld.Host);
This prints:
jonasjohn.com
Setting the Scheme property produces a valid, complete URL:
bld.Scheme="https";
Console.WriteLine(bld.Uri);
This produces:
https://jonasjohn.com:80/
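The :80 in that output is most likely the port that UriBuilder picked up while the URL was still treated as http. A small sketch of how it could be dropped follows, assuming the documented convention that a Port of -1 means "use the scheme's default port"; the UrlNormalizer class and ToHttps method are hypothetical names used only for this example.

```csharp
using System;

public static class UrlNormalizer
{
    // Hypothetical helper: parses scheme-less or absolute input with
    // UriBuilder, then forces https and the default port for that scheme.
    public static Uri ToHttps(string input)
    {
        // The UriBuilder constructor throws UriFormatException on input
        // it cannot parse, so callers may want to catch that.
        var builder = new UriBuilder(input)
        {
            Scheme = Uri.UriSchemeHttps,
            Port = -1 // -1 tells UriBuilder to use the scheme's default port
        };
        return builder.Uri;
    }

    public static void Main()
    {
        Uri uri = ToHttps("jonasjohn.com");
        Console.WriteLine(uri.Host); // jonasjohn.com
        Console.WriteLine(uri);      // https://jonasjohn.com/
    }
}
```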
Based on Lidqy's answer, I wrote this function. I think it covers most of the possible cases; if the input falls outside them, you could have it throw an exception instead of returning an empty string.
// Requires: using System.Linq; using System.Text.RegularExpressions;
public static string ExtractDomainName(string Url)
{
    // Skip an optional scheme and a leading "www.", then capture the host.
    var regex = new Regex(@"^((https?|ftp)://)?(www\.)?(?<domain>[^/]+)(/|$)");
    Match match = regex.Match(Url);
    if (!match.Success)
    {
        return String.Empty;
    }

    string domain = match.Groups["domain"].Value;
    // Heuristic: drop the leftmost label until at most two dots remain.
    while (domain.Count(c => c == '.') > 2)
    {
        domain = domain.Split('.', 2)[1];
    }
    return domain;
}
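Counting dots cannot tell a two-part suffix such as co.nz from a subdomain, so ExtractDomainName returns ftp.android.co.nz unchanged and reduces subdomain.abc.def.jonasjohn.de only to def.jonasjohn.de. Since the question mentions the Public Suffix List, a lookup against that list is one possible alternative. The sketch below is illustrative only: the RegistrableDomain class, the FromHost method, and the tiny hard-coded SampleSuffixes set are all assumptions made for this example. A real program would load the full list and also handle its wildcard and exception rules, which are ignored here.

```csharp
using System;
using System.Collections.Generic;

public static class RegistrableDomain
{
    // Hypothetical hand-picked sample, NOT the real Public Suffix List.
    static readonly HashSet<string> SampleSuffixes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "com", "de", "eu", "ca", "com.au", "co.nz", "gov.ae", "ac.ir"
        };

    // Returns "label.suffix" for the longest suffix found in the sample
    // set, or an empty string when no known suffix matches.
    public static string FromHost(string host)
    {
        string[] labels = host.Split('.');
        // i is the index where the candidate suffix starts. Starting at 1
        // tries the longest candidate first and always leaves one label
        // in front of the suffix.
        for (int i = 1; i < labels.Length; i++)
        {
            string suffix = string.Join(".", labels, i, labels.Length - i);
            if (SampleSuffixes.Contains(suffix))
            {
                return labels[i - 1] + "." + suffix;
            }
        }
        return string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(FromHost("ftp.android.co.nz"));              // android.co.nz
        Console.WriteLine(FromHost("www.downloads.news.com.au"));      // news.com.au
        Console.WriteLine(FromHost("subdomain.abc.def.jonasjohn.de")); // jonasjohn.de
    }
}
```

The host passed to FromHost could come from the regex's domain group or from UriBuilder.Host; the lookup itself does not care how the host was obtained.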
var rx = new Regex(@"^((https?|ftp)://)?(www\.)?(?<domain>[^/]+)(/|$)");
var data = new[] { "jonasjohn.com",
"http://www.jonasjohn.de/snippets/csharp/",
"jonasjohn.de",
"www.jonasjohn.de/snippets/csharp/",
"http://www.answers.com/article/1194427/8-habits-of-extraordinarily-likeable-people",
"http://www.apple.com",
"https://www.cnn.com.au",
"http://www.downloads.news.com.au",
"https://ftp.android.co.nz",
"http://global.news.ca",
"https://www.apple.com/",
"https://ftp.android.co.nz/",
"http://global.news.ca/",
"https://www.apple.com/",
"https://johnsmith.eu",
"ftp://johnsmith.eu",
"johnsmith.gov.ae",
"johnsmith.eu",
"www.jonasjohn.de",
"www.jonasjohn.ac.ir/snippets/csharp",
"http://www.jonasjohn.de/",
"ftp://www.jonasjohn.de/",
"https://subdomain.abc.def.jonasjohn.de/test.htm"
};
foreach (var dat in data)
{
    var match = rx.Match(dat);
    if (match.Success)
        Console.WriteLine("{0} => {1}", dat, match.Groups["domain"].Value);
    else
        Console.WriteLine("{0} => NO MATCH", dat);
}