
Regular expression to parse URL link strings

I am looking for a way to parse a URL link into the following segments without using System.Uri:

/Default.aspx/123/test?var1=val1

I need to break this URL down into the following values:

  1. File
  2. PathInfo
  3. Querystring

Here's one:

string pattern = @"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+=\\\.&,-]*)";


string pattern = @"\b(?<protocol>https?|ftp|gopher|telnet|file|notes|ms-help)://(?<domain>[-A-Z0-9.]+)(?<file>/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(?<parameters>\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?";

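This pattern produces the named groups protocol, domain, file and parameters; read whichever ones you want to extract from the match. The character classes list only A-Z, so match with RegexOptions.IgnoreCase.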
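As a rough illustration of reading those named groups, here is a minimal sketch. The class name `UrlGroupsDemo`, the method `Parse`, and the host `example.com` are made up for this example; only the pattern comes from the answer above.

```csharp
using System.Text.RegularExpressions;

public static class UrlGroupsDemo
{
    // The pattern from the answer, written as a verbatim string so that
    // \b and \? reach the regex engine unchanged.
    private const string Pattern =
        @"\b(?<protocol>https?|ftp|gopher|telnet|file|notes|ms-help)://" +
        @"(?<domain>[-A-Z0-9.]+)" +
        @"(?<file>/[-A-Z0-9+&@#/%=~_|!:,.;]*)?" +
        @"(?<parameters>\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?";

    // Hypothetical helper: runs the pattern case-insensitively, because the
    // character classes only list upper-case letters.
    public static Match Parse(string url)
    {
        return Regex.Match(url, Pattern, RegexOptions.IgnoreCase);
    }
}
```

For `http://example.com/Default.aspx/123/test?var1=val1`, `Parse(...).Groups["file"].Value` is `/Default.aspx/123/test` and `Groups["parameters"].Value` is `?var1=val1`. The pattern requires a protocol and a domain, so the relative URL from the question does not match it on its own.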

Here is my code:

    // Groups: 1 = file path without extension, 2 = file extension,
    //         3 = path info, 4 = query string (without the leading '?').
    var match = Regex.Match(internalUrl,
                            @"^/([\w/\-,\s]+)\.([a-zA-Z]{2,5})([\w/\-,\s]*)\??(.*)",
                            RegexOptions.IgnoreCase | RegexOptions.Singleline |
                            RegexOptions.CultureInvariant | RegexOptions.Compiled);
    if (match.Success)
    {
        var filePath = match.Groups[1].Value;
        var fileExtension = match.Groups[2].Value;
        var pathInfo = match.Groups[3].Value;
        var queryString = match.Groups[4].Value;

        log.Debug("FilePath: " + filePath);
        log.Debug("FileExtension: " + fileExtension);
        log.Debug("PathInfo: " + pathInfo);
        log.Debug("QueryString: " + queryString);
    }
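If the three values from the question (File, PathInfo, Querystring) are wanted directly, a variant with named groups might look like the sketch below. The pattern, the class `RelativeUrlParser` and the method `TryParse` are assumptions made for illustration, not part of the answer above, and the pattern has only been reasoned through for URLs shaped like the one in the question.

```csharp
using System.Text.RegularExpressions;

public static class RelativeUrlParser
{
    // Hypothetical pattern:
    //   file     = first path segment, ending in a 2-5 letter extension
    //   pathInfo = everything after the file up to the '?', if present
    //   query    = everything after the '?', if present
    private static readonly Regex Pattern = new Regex(
        @"^/(?<file>[^/?]+\.[a-zA-Z]{2,5})(?<pathInfo>/[^?]*)?(?:\?(?<query>.*))?$",
        RegexOptions.CultureInvariant | RegexOptions.Singleline);

    // Returns false when the URL does not have the expected shape;
    // groups that did not participate come back as empty strings.
    public static bool TryParse(string url, out string file,
                                out string pathInfo, out string query)
    {
        Match m = Pattern.Match(url);
        file = m.Groups["file"].Value;
        pathInfo = m.Groups["pathInfo"].Value;
        query = m.Groups["query"].Value;
        return m.Success;
    }
}
```

With `/Default.aspx/123/test?var1=val1` this gives `Default.aspx`, `/123/test` and `var1=val1`. Unlike the code above, the file name and its extension stay together in one group.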

