
How can I make this regex match correctly?

Given this regex:

^((https?|ftp):(\/{2}))?(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(((([a-zA-Z0-9]+)(\.)*?))(\.)([a-z]{2}
|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum){1})

Reformatted for readability:

@"^((https?|ftp):(\/{2}))?" + // http://, https://, ftp:// - Protocol Optional
@"(" + // Begin URL payload format section
@"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" + // IPv4 Address support
@")|("+ // Delimit supported payload types
@"((([a-zA-Z0-9]+)(\.)*?))(\.)([a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum){1}" + // FQDNs
@")"; // End URL payload format section

How can I make it fail (i.e., not match) on this "fail" test case?

http://www.google

As I am specifying {1} on the TLD section, I would think it would fail without the extension. Am I wrong?

Edit: These are my PASS conditions:

These are my FAIL conditions:

I'll throw out an alternative suggestion. You may want to use a combination of the parsing of the built-in System.Uri class and a couple targeted regexes (or simple string checks when appropriate).

Example:

string uriString = "...";

Uri uri;
if (!Uri.TryCreate(uriString, UriKind.Absolute, out uri))
{
    // Uri is totally invalid!
}
else
{
    // validate the scheme
    if (!uri.Scheme.Equals("http", StringComparison.OrdinalIgnoreCase))
    {
        // not http!
    }

    // validate the authority ('www.blah.com:1234' portion)
    // if (uri.Authority ...)
    // {
    // }

    // ...
}
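One way the skeleton above might be filled in is sketched below. It is only an illustration under stated assumptions: the names UrlChecks, IsAcceptableUrl and TldPattern are hypothetical, the accepted schemes and the TLD list are copied from the question's regex, and the input is assumed to always carry a scheme (UriKind.Absolute rejects a bare www.example.com).

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlChecks
{
    // Hypothetical helper: TLD list taken from the question's regex
    // (any two-letter suffix, or one of the listed generic TLDs).
    private static readonly Regex TldPattern = new Regex(
        @"\.([a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)$",
        RegexOptions.IgnoreCase);

    public static bool IsAcceptableUrl(string uriString)
    {
        // Step 1: let System.Uri do the general parsing.
        if (!Uri.TryCreate(uriString, UriKind.Absolute, out Uri uri))
        {
            return false;
        }

        // Step 2: validate the scheme (Uri.Scheme is already lower-case).
        bool schemeOk = uri.Scheme == Uri.UriSchemeHttp
                     || uri.Scheme == Uri.UriSchemeHttps
                     || uri.Scheme == Uri.UriSchemeFtp;
        if (!schemeOk)
        {
            return false;
        }

        // Step 3: validate the host with a small, targeted check.
        switch (uri.HostNameType)
        {
            case UriHostNameType.IPv4:
                return true;                          // dotted-quad address
            case UriHostNameType.Dns:
                return TldPattern.IsMatch(uri.Host);  // name must end in a listed TLD
            default:
                return false;                         // IPv6, unknown, etc. are rejected here
        }
    }
}
```

With these assumptions, IsAcceptableUrl("http://www.google") is rejected by the TLD check even though System.Uri itself parses it without complaint, while "http://www.google.com" and "http://192.168.0.1" are accepted.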

Sometimes one catch-all regex is not the best solution, however tempting. While debugging this regex is feasible (see Greg Hewgill's answer), consider doing a couple of tests for different categories of problems, e.g. one test for numerical addresses and one test for named addresses.

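You need to force your regex to match up to the end of the string: add a $ at the very end of it. Otherwise the regex is free to match something shorter than your whole string. For http://www.google, the FQDN alternative matches just www.go, because [a-z]{2} accepts "go" as a two-letter TLD and nothing requires the match to continue to the end.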
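A quick way to see the effect of the trailing $ is to run the question's pattern with and without it. This is a small sketch, and the names AnchorDemo, FirstMatch and MatchesAnchored are made up for illustration; the pattern itself is copied unchanged from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // The pattern from the question, unchanged (joined from its three lines).
    public const string Original =
        @"^((https?|ftp):(\/{2}))?(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}" +
        @"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(((([a-zA-Z0-9]+)(\.)*?))(\.)([a-z]{2}" +
        @"|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum){1})";

    // Same pattern with an end-of-string anchor appended.
    public const string Anchored = Original + "$";

    // Returns the text the unanchored pattern actually matched (empty if none).
    public static string FirstMatch(string input)
    {
        return Regex.Match(input, Original).Value;
    }

    // Returns whether the pattern with the trailing $ matches anywhere in the input.
    public static bool MatchesAnchored(string input)
    {
        return Regex.IsMatch(input, Anchored);
    }
}
```

For "http://www.google", FirstMatch returns "www.go" and MatchesAnchored returns false. One caveat worth checking against your PASS cases: because of the top-level |, the ^ applies only to the IPv4 alternative and the $ only to the FQDN alternative, so "http://www.google.com" passes the anchored pattern by matching google.com at the end rather than the whole string. Grouping both alternatives between ^ and $ would be stricter, but the FQDN part would then need reworking as well.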

The "validate a url" problem has been solved* numerous times. I suggest you use the System.Uri class, it validates more cases than you can shake a stick at.

The code Uri uri = new Uri("http://whatever"); throws a UriFormatException if it fails validation. That is probably what you'd want.

*) Or kind of solved. It's actually pretty tricky to define what is a valid url.
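As a minimal sketch of the constructor-based check (UriFormatCheck and IsWellFormed are hypothetical names used only for this illustration), the exception can be wrapped in a helper that returns a bool:

```csharp
using System;

public static class UriFormatCheck
{
    // Returns true when System.Uri accepts the string as an absolute URI.
    public static bool IsWellFormed(string candidate)
    {
        try
        {
            // The constructor throws UriFormatException on malformed input.
            _ = new Uri(candidate);
            return true;
        }
        catch (UriFormatException)
        {
            return false;
        }
    }
}
```

IsWellFormed("http://whatever") returns true and IsWellFormed("not a url") returns false. Note that IsWellFormed("http://www.google") also returns true, since www.google is a syntactically valid host name; if a specific TLD list must be enforced, that needs an additional check on uri.Host.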

It's all about definitions. A "valid URL" should give you an IP address when you do a DNS lookup; you should be able to connect to that IP, and when a request is sent you should get a reply in the form of HTML that you can use.

So what we are really looking for is a "valid URL format", and that is where System.Uri comes in very handy. But if the URL is hidden in a large piece of text, you first need to find something that could pass as a valid URL format.

What distinguishes a URL from ordinary readable text is a dot that is not followed by whitespace. "123.com" could validate as a real URL.

Use the regex

[a-z_\.\-0-9]+\.[a-z]+[^ ]*

to find any candidate URL in a text, then do a System.Uri check to see whether it has a valid URL format, and then do a lookup. Only when the lookup gives you a result do you know the URL is valid.
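The three steps might be sketched roughly as follows. The class and method names (UrlFinder, FindWellFormed, Resolves) are hypothetical, and prepending "http://" is an assumption made here because the regex above never captures a scheme. The DNS step uses System.Net.Dns.GetHostEntry and needs network access, so treat it as an optional final check.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;
using System.Text.RegularExpressions;

public static class UrlFinder
{
    // Step 1: the candidate-finding regex from the answer above.
    private static readonly Regex Candidate =
        new Regex(@"[a-z_\.\-0-9]+\.[a-z]+[^ ]*");

    // Steps 1 and 2: find candidates, keep those System.Uri accepts.
    public static List<Uri> FindWellFormed(string text)
    {
        var found = new List<Uri>();
        foreach (Match m in Candidate.Matches(text))
        {
            // Assumption: the candidate has no scheme, so http:// is prepended.
            if (Uri.TryCreate("http://" + m.Value, UriKind.Absolute, out Uri uri))
            {
                found.Add(uri);
            }
        }
        return found;
    }

    // Step 3: DNS lookup; returns false when the host does not resolve.
    public static bool Resolves(Uri uri)
    {
        try
        {
            return Dns.GetHostEntry(uri.Host).AddressList.Length > 0;
        }
        catch (SocketException)
        {
            return false;
        }
    }
}
```

For the text "Visit 123.com or see www.example.org/page for details", FindWellFormed yields two URIs, with hosts 123.com and www.example.org. Since [^ ]* swallows everything up to the next space, trailing punctuation such as a sentence-ending period ends up in the candidate and may need trimming first.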
