Regex Issue in C#

I am trying to create a C# routine that removes all of the following prefixes and suffixes and returns just the root word of a domain:

var stripChars = new List<string> { "http://", "https://", "www.", "ftp.", ".com",  ".net", ".org", ".info", ".co", ".me", ".mobi", ".us", ".biz" };

I do this with the following code:

originalDomain = stripChars.Aggregate(originalDomain, (current, repl) => Regex.Replace(current, repl, @"", RegexOptions.IgnoreCase));

This seems to work in almost all cases. Today, however, I discovered that setting "originalDomain" to "NameCheap.com" does not return:

NameCheap

as it should, but rather:

NCheap

Can anyone look at this and tell me what is going wrong? Any help would be appreciated.

This is expected: an unescaped dot in a regex matches any character (except a newline, by default).

Therefore, the pattern .me matches "ame" in "NameCheap", and removing that match leaves "NCheap".
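To see the effect in isolation, here is a minimal sketch. The class name `DotDemo` and its two helper methods are made up for illustration; only `Regex.Match` and `Regex.Replace` come from .NET.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class DotDemo
{
    // Returns the text of the first match, or "" when the pattern does not match.
    public static string FirstMatch(string input, string pattern) =>
        Regex.Match(input, pattern, RegexOptions.IgnoreCase).Value;

    // Removes every match of the pattern from the input, as the question's code does.
    public static string Strip(string input, string pattern) =>
        Regex.Replace(input, pattern, "", RegexOptions.IgnoreCase);
}
```

With these helpers, `DotDemo.FirstMatch("NameCheap", ".me")` returns "ame" and `DotDemo.Strip("NameCheap", ".me")` returns "NCheap", while `DotDemo.Strip("NameCheap", @"\.me")` leaves the input unchanged.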

Escape the dots with a backslash so that they match only a literal dot.

Also, you'd be better off using a dedicated URI API for this kind of operation.
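Rather than escaping each entry by hand, one option is to let `Regex.Escape` do it when the pattern is built. The following is a sketch of the question's `Aggregate` call with that one change; the wrapper class `DomainStripper` and the method name `StripAffixes` are assumptions added for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the code from the question.
public static class DomainStripper
{
    // Same list as in the question; entries are treated as literal text.
    private static readonly List<string> StripChars = new List<string>
    {
        "http://", "https://", "www.", "ftp.", ".com", ".net", ".org",
        ".info", ".co", ".me", ".mobi", ".us", ".biz"
    };

    public static string StripAffixes(string originalDomain) =>
        StripChars.Aggregate(originalDomain, (current, repl) =>
            // Regex.Escape turns "." into "\." so it no longer matches any character.
            Regex.Replace(current, Regex.Escape(repl), "", RegexOptions.IgnoreCase));
}
```

With this change, `DomainStripper.StripAffixes("NameCheap.com")` returns "NameCheap". Note that the patterns are still not anchored, so a literal ".com" or ".co" is removed wherever it occurs in the string, not only at the end; a host such as "my.company.com" would also lose the ".com" at the start of ".company".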

I know this doesn't answer your question directly, but given the specific task you are trying to accomplish, I would recommend trying something like this:

Uri uri = new Uri(originalDomain);
originalDomain = uri.Host;

EDIT:

If your input may not contain a scheme, you can use UriBuilder, as noted in this post:

var hostName = new UriBuilder(input).Host;
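As a rough sketch of how that might be wrapped up, the helper below extracts the host with `UriBuilder`; the class name `HostParser` and the method name `GetHost` are hypothetical. It relies on the documented behavior that `UriBuilder(string)` assumes "http" when the input has no scheme.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class HostParser
{
    // Returns the host part of the input, with or without a scheme in front.
    // Throws UriFormatException if the input cannot be parsed as a URI.
    public static string GetHost(string input) =>
        new UriBuilder(input).Host;
}
```

For "https://www.example.org/path?x=1" this yields "www.example.org". The result is only the host: it still contains any "www." prefix and the top-level domain, so getting down to the root word needs a further step. The host may also come back in normalized (lowercase) form, so compare it case-insensitively.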

Hope this helps.

Try this instead:

var stripChars = new List<string> {"http://", "https://", "www[.]", "ftp[.]", "[.]com", "[.]net", "[.]org", "[.]info", "[.]co", "[.]me", "[.]mobi", "[.]us", "[.]biz"};

The '.' character is special in a regular expression: it stands for any character. Wrapping it in a character class, as in [.], is one way to escape it.

However, as others have mentioned, your current approach to handling URLs is brittle and you should explore other solutions. Ideally, you want to use something that really understands how to parse URL syntax.
