
Regular expression to parse FTP link string

I have the following code to parse the parts of an FTP link:

Regex exp = new Regex(@"(?i)ftp:\/\/(?<user>\S+?):(?<passwd>\S+?)@(?<host>\S+?.\S+?.\S+?.\S+?)");
Match m = exp.Match(@"Link: ftp://username:password@host.sub.domain.tld<ftp://username:password@host.sub.domain.tld/>");

Console.WriteLine("Host = " + m.Groups["host"].Value);
Console.WriteLine("User = " + m.Groups["user"].Value);
Console.WriteLine("Pass = " + m.Groups["passwd"].Value);

Which produces the following output:

Host = host.su
User = username
Pass = password

Why is the host being truncated?

URI parsing is already built into .NET. URI syntax has too many edge cases and variations to handle reliably with a regex.

So use the built-in support:

var u = new Uri("ftp://username:password@host.sub.domain.tld");

var host = u.Host;
var ui = u.UserInfo.Split(':');
var user = ui[0];
var pwd = ui[1];
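
The snippet above assumes the link always carries both a user name and a password, and that the password contains no colon. A slightly more defensive sketch is shown here; the `FtpLink` class and its `Parse` method are hypothetical names introduced for illustration, and the tuple return type assumes C# 7 or later:

```csharp
using System;

public static class FtpLink
{
    // Hypothetical helper (not part of the original answer): splits an FTP
    // link into host, user and password using System.Uri.
    public static (string Host, string User, string Password) Parse(string link)
    {
        var uri = new Uri(link);

        // UserInfo is "user:password", "user", or "" when no credentials are given.
        // Limiting the split to 2 parts keeps any ':' inside the password.
        string[] parts = uri.UserInfo.Split(new[] { ':' }, 2);

        // Undo percent-encoding, since credentials may be escaped in the link.
        string user = Uri.UnescapeDataString(parts[0]);
        string password = parts.Length > 1 ? Uri.UnescapeDataString(parts[1]) : "";

        return (uri.Host, user, password);
    }
}
```

Calling `FtpLink.Parse("ftp://username:password@host.sub.domain.tld")` should give the host `host.sub.domain.tld`, the user `username` and the password `password`. A link without credentials yields empty strings instead of throwing an index error.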

Because \S also matches the dot character, and an unescaped . matches any character. Escape the dots and exclude them from the host labels:

@"(?i)ftp:\/\/(?<user>\S+?):(?<passwd>\S+?)@(?<host>[^.\s]+\.[^.\s]+\.[^.\s]+\.\w+)"


Why?

(?<host>\S+?.\S+?.\S+?.\S+?)
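  • \S+? - Matches only the first character, because the quantifier is lazy (non-greedy).
  • . - Matches the second character, since an unescaped dot matches any character.
  • The remaining \S+? and . tokens each consume one character in the same way, so the four \S+? and three . together match only the first 7 characters of the host: host.su.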
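
To see the difference side by side, the following sketch runs both patterns against the same input. The `HostRegexDemo` class and its `Host` helper are hypothetical names used only for this illustration; the two patterns are copied from the question and from the corrected version above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HostRegexDemo
{
    // Pattern from the question: the dots in the host group are unescaped.
    public const string Original =
        @"(?i)ftp:\/\/(?<user>\S+?):(?<passwd>\S+?)@(?<host>\S+?.\S+?.\S+?.\S+?)";

    // Corrected pattern: literal dots, and host labels that cannot contain a dot.
    public const string Corrected =
        @"(?i)ftp:\/\/(?<user>\S+?):(?<passwd>\S+?)@(?<host>[^.\s]+\.[^.\s]+\.[^.\s]+\.\w+)";

    // Returns the captured host, or an empty string when the pattern does not match.
    public static string Host(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups["host"].Value : "";
    }
}
```

With the input `Link: ftp://username:password@host.sub.domain.tld`, `HostRegexDemo.Host(HostRegexDemo.Original, input)` returns `host.su`, while the same call with `HostRegexDemo.Corrected` returns `host.sub.domain.tld`. Note that the corrected pattern still assumes a host with exactly four dot-separated labels, which is one reason the Uri-based approach is more robust.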
