How can I make this regex match correctly?

Given this regex:

^((https?|ftp):(\/{2}))?(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(((([a-zA-Z0-9]+)(\.)*?))(\.)([a-z]{2}
|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum){1})

Reformatted for readability:

@"^((https?|ftp):(\/{2}))?" + // http://, https://, ftp:// - Protocol Optional
@"(" + // Begin URL payload format section
@"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" + // IPv4 Address support
@")|("+ // Delimit supported payload types
@"((([a-zA-Z0-9]+)(\.)*?))(\.)([a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum){1}" + // FQDNs
@")"; // End URL payload format section

How can I make it fail (i.e. not match) on this "fail" test case?

http://www.google

As I am specifying {1} on the TLD section, I would think it would fail without the extension. Am I wrong?

Edit: These are my PASS conditions:

These are my FAIL conditions:

I'll throw out an alternative suggestion. You may want to combine the parsing done by the built-in System.Uri class with a couple of targeted regexes (or simple string checks where appropriate).

Example:

string uriString = "...";

Uri uri;
if (!Uri.TryCreate(uriString, UriKind.Absolute, out uri))
{
    // Uri is totally invalid!
}
else
{
    // validate the scheme
    if (!uri.Scheme.Equals("http", StringComparison.OrdinalIgnoreCase))
    {
        // not http!
    }

    // validate the authority ('www.blah.com:1234' portion)
    if (uri.Authority /* ... */)
    {
    }

    // ...
}
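A fuller version of this idea might look like the sketch below. It is only an illustration under stated assumptions: the class name UrlChecker and the method IsAcceptableUrl are hypothetical, the accepted schemes (http, https, ftp) and the TLD list are copied from the regex in the question, and, unlike that regex, Uri.TryCreate with UriKind.Absolute requires the scheme to be present.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: System.Uri does the parsing, simple checks do the rest.
public static class UrlChecker
{
    // TLD list copied from the regex in the question (assumption: this is the policy wanted).
    private static readonly HashSet<string> KnownTlds =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "com", "org", "net", "gov", "mil", "biz", "info",
            "mobi", "name", "aero", "jobs", "museum"
        };

    public static bool IsAcceptableUrl(string candidate)
    {
        // Reject anything System.Uri cannot parse as an absolute URI.
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out Uri uri))
            return false;

        // Validate the scheme (Uri.Scheme is already lower-cased).
        bool schemeOk = uri.Scheme == Uri.UriSchemeHttp
                     || uri.Scheme == Uri.UriSchemeHttps
                     || uri.Scheme == Uri.UriSchemeFtp;
        if (!schemeOk)
            return false;

        // IPv4 literals are accepted as-is; anything that is not a DNS name is rejected.
        if (uri.HostNameType == UriHostNameType.IPv4)
            return true;
        if (uri.HostNameType != UriHostNameType.Dns)
            return false;

        // Validate the last label of the host against the TLD policy.
        int lastDot = uri.Host.LastIndexOf('.');
        if (lastDot <= 0)
            return false;
        string tld = uri.Host.Substring(lastDot + 1);
        return IsTwoAsciiLetters(tld) || KnownTlds.Contains(tld);
    }

    private static bool IsTwoAsciiLetters(string s)
    {
        return s.Length == 2 && IsAsciiLetter(s[0]) && IsAsciiLetter(s[1]);
    }

    private static bool IsAsciiLetter(char c)
    {
        c = char.ToLowerInvariant(c);
        return c >= 'a' && c <= 'z';
    }
}
```

With these rules, UrlChecker.IsAcceptableUrl("http://www.google") returns false because "google" is neither a two-letter TLD nor in the list, while "http://www.google.com" passes.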

Sometimes one catch-all regex is not the best solution, however tempting. While debugging this regex is feasible (see Greg Hewgill's answer), consider running a couple of tests for different categories of problems, e.g. one test for numerical addresses and one test for named addresses.

You need to force your regex to match up until the end of the string. Add a $ at the very end of it. Otherwise, your regex is probably just matching http://, or something else shorter than your whole string.
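To see what the unanchored pattern actually matches, it helps to print Match.Value rather than only checking IsMatch. The sketch below rebuilds the question's pattern from its three parts and compares it with a variant that wraps both payload alternatives in one group before adding ^ and $. The class name UrlRegexDemo and the field names are hypothetical, and the extra grouping is an assumption on my part: because | has the lowest precedence, the ^ in the original applies only to the IPv4 branch, and a $ simply appended at the end would apply only to the FQDN branch.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class comparing the question's pattern with an anchored variant.
public static class UrlRegexDemo
{
    // The three building blocks, copied from the question.
    private const string Protocol = @"((https?|ftp):(\/{2}))?";
    private const string IPv4 =
        @"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
    private const string Fqdn =
        @"((([a-zA-Z0-9]+)(\.)*?))(\.)([a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum){1}";

    // Pattern as posted: "^protocol(ipv4)" OR "(fqdn)", with no end anchor.
    public static readonly Regex Original =
        new Regex("^" + Protocol + "(" + IPv4 + ")|(" + Fqdn + ")");

    // Variant: both alternatives share one group, so ^ and $ apply to either branch.
    public static readonly Regex Anchored =
        new Regex("^" + Protocol + "((" + IPv4 + ")|(" + Fqdn + "))$");
}
```

UrlRegexDemo.Original.Match("http://www.google").Value is "www.go": the FQDN branch is not tied to the start of the string, and [a-z]{2} is satisfied by the first two letters of "google". UrlRegexDemo.Anchored rejects that input and still accepts "http://google.com" and "192.168.0.1". Note that the FQDN part as written allows only a single label before the TLD, so the anchored variant also rejects "http://www.google.com"; supporting subdomains would need a repeated label group.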

The "validate a URL" problem has been solved* numerous times. I suggest you use the System.Uri class; it validates more cases than you can shake a stick at.

The code Uri uri = new Uri("http://whatever"); throws a UriFormatException if it fails validation. That is probably what you'd want.

*) Or kind of solved. It's actually pretty tricky to define what a valid URL is.

It's all about definitions. A "valid URL" should give you an IP address when you do a DNS lookup. You should be able to connect to that IP, and when a request is sent out, you should get a reply in the form of HTML that you can use.

So what we are looking for is a "valid URL format", and that is where System.Uri comes in very handy. But if the URL is hidden in a large piece of text, you first need to find something that validates as a valid URL format.

The thing that distinguishes a URL from ordinary readable text is a dot that is not followed by whitespace. "123.com" could validate as a real URL.

Use the regex

[a-z_\.\-0-9]+\.[a-z]+[^ ]*

to find any possible valid URL in a text, then do a System.Uri check to see if it is a valid URL format, and then do a lookup. Only when the lookup gives you a result do you know the URL is valid.
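The first two steps of that pipeline could be sketched roughly as follows. The class name UrlFinder and the method FindUrlCandidates are hypothetical. The regex is the one quoted above (lowercase-only as written, and its character class excludes ':' and '/', so a leading scheme is not part of the match). Prepending "http://" to candidates is my own assumption, made so that System.Uri can parse bare host names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: find URL-like tokens in free text, then let System.Uri vet them.
public static class UrlFinder
{
    // Candidate pattern quoted in the answer above.
    private static readonly Regex Candidate =
        new Regex(@"[a-z_\.\-0-9]+\.[a-z]+[^ ]*");

    public static List<Uri> FindUrlCandidates(string text)
    {
        var found = new List<Uri>();
        foreach (Match m in Candidate.Matches(text))
        {
            // Assumption: add a scheme when the matched token has none.
            string value = m.Value.Contains("://") ? m.Value : "http://" + m.Value;

            // Keep only tokens that System.Uri accepts as absolute URIs.
            if (Uri.TryCreate(value, UriKind.Absolute, out Uri uri))
                found.Add(uri);
        }
        return found;
    }
}
```

For the text "see 123.com and example.org/path for details" this yields two URIs, with hosts "123.com" and "example.org". The final lookup step is left out here because it needs network access; System.Net.Dns.GetHostEntry is one way to do it. Because [^ ]* runs to the next space, trailing punctuation such as a sentence-ending period ends up in the candidate and may need trimming.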
