RegEx to find Windows file paths inside of text

I have been bashing my head against this one for a few hours and I just can't seem to crack it.

I have been tasked with writing an application that loops through numerous config files to identify any valid Windows file or folder paths within the text.

e.g.:

\\10.0.0.1\folder\
\\10.0.0.1\folder\filename.txt

\\servername\folder\
\\servername\folder\filename.txt

d:\folder\
d:\folder\filename.txt

I am using C#, and here is the closest working version I've got so far:

string ex = @"(?!.*[\\\/]\s+)(?!(?:.*\s|.*\.|\W+)$)(?:[a-zA-Z]:)?(?:(?:[^<>:\|\?\*\n])+(?:\/\/|\/|\\\\|\\)?)+$";
var rx = new Regex(ex, RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline | RegexOptions.Compiled);
var matches = rx.Matches(output);

foreach (Match m in matches)
{
    // ... handle each match here (m.Value holds the matched text)
}

You can see the work in progress here.

It does exactly what I need with the "d:\" paths, but for paths that start with "\\" it only kind of works, and only if the path is at the start of the string.

Ideally, if I could get just the folder paths returned, excluding the file, that would be an added bonus.

Any help appreciated.

You may use this regex to capture the folder and filename in 2 separate capture groups:

(?:\\\\[^\\]+|[a-zA-Z]:)((?:\\[^\\]+)+\\)?([^<>:]*)

RegEx Demo

RegEx Details:

  • (?:\\\\[^\\]+|[a-zA-Z]:) : Non-capturing group that matches either a server name or IP address (\\ followed by 1+ non-\ characters) or a drive letter followed by a :
  • ((?:\\[^\\]+)+\\)? : 1st capture group, for the folder path. It matches a \ followed by 1+ non-\ characters, repeated one or more times, and then a closing \ . The group is optional because of the trailing ?
  • ([^<>:]*) : 2nd capture group, for the filename. It matches 0 or more characters other than < , > and :
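
As a rough illustration of how the two capture groups could be used from C#, here is a minimal sketch. The class name PathFinder and the helper Split are hypothetical names made up for this example, and it assumes each input string holds a single path. The negated character classes in the pattern also match line breaks, so multi-line text would probably need to be split into lines first.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathFinder
{
    // Pattern taken from the answer above, written as a verbatim string.
    private static readonly Regex PathRx = new Regex(
        @"(?:\\\\[^\\]+|[a-zA-Z]:)((?:\\[^\\]+)+\\)?([^<>:]*)");

    // Hypothetical helper: returns the root plus folder part and the
    // file name part of the first match, or null when nothing matches.
    public static (string Folder, string File)? Split(string candidate)
    {
        Match m = PathRx.Match(candidate);
        if (!m.Success)
        {
            return null;
        }

        // Group 2 (the file name) is always the tail of the match, so
        // everything before it is the server or drive plus the folder path.
        string file = m.Groups[2].Value;
        string folder = m.Value.Substring(0, m.Value.Length - file.Length);
        return (folder, file);
    }

    public static void Main()
    {
        // Sample paths from the question, one path per string (assumption).
        string[] samples =
        {
            @"\\10.0.0.1\folder\",
            @"\\servername\folder\filename.txt",
            @"d:\folder\",
            @"d:\folder\filename.txt",
        };

        foreach (string s in samples)
        {
            var parts = Split(s);
            if (parts != null)
            {
                Console.WriteLine($"folder={parts.Value.Folder} file={parts.Value.File}");
            }
        }
    }
}
```

Taking the folder as "the match minus group 2" keeps the server name or drive letter, which the first capture group alone does not include because the prefix sits in a non-capturing group.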
