
A way to use RegEx to find a set of filename paths in a string

Good morning guys

Is there a good way to use a regular expression in C# to find all filenames and their paths within a string variable?

For example, if you have this string:

string s = @"Hello John

these are the files you have to send us today: <file>C:\Development\Projects 2010\Accounting\file20101130.csv</file>, <file>C:\Development\Projects 2010\Accounting\orders20101130.docx</file>

also we would like you to send <file>C:\Development\Projects 2010\Accounting\customersupdated.xls</file>

thank you";

The result would be:

C:\Development\Projects 2010\Accounting\file20101130.csv
C:\Development\Projects 2010\Accounting\orders20101130.docx
C:\Development\Projects 2010\Accounting\customersupdated.xls

EDITED: Following @Jim's suggestion, I edited the string to add tags, which makes it easier to extract the needed file names from the string!

Here's something I came up with:

using System;
using System.Text.RegularExpressions;

public class Test
{

    public static void Main()
    {
        string s = @"Hello John these are the files you have to send us today: 
            C:\projects\orders20101130.docx also we would like you to send 
            C:\some\file.txt, C:\someother.file and d:\some file\with spaces.ext  

            Thank you";

        Extract(s);

    }

    private static readonly Regex rx = new Regex
        (@"[a-z]:\\(?:[^\\:]+\\)*((?:[^:\\]+)\.\w+)", RegexOptions.IgnoreCase);

    static void Extract(string text)
    {
        MatchCollection matches = rx.Matches(text);

        foreach (Match match in matches)
        {
            // Group 1 captures the file name without its directory.
            Console.WriteLine("'{0}', file: '{1}'", match.Value, match.Groups[1].Value);
        }
    }

}

Produces (see on ideone):

'C:\projects\orders20101130.docx', file: 'orders20101130.docx'
'C:\some\file.txt', file: 'file.txt'
'C:\someother.file', file: 'someother.file'
'd:\some file\with spaces.ext', file: 'with spaces.ext'

The regex is not extremely robust (it makes a few assumptions), but it worked for your examples as well.
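To make one of those assumptions concrete, here is a minimal sketch that reuses the same pattern. The class and method names (RegexAssumptions, FirstMatch) are hypothetical and only serve the illustration. The pattern assumes every path ends in an extension and that no stray dot appears in the prose before the next colon or backslash.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexAssumptions
{
    // Same pattern as in the answer above.
    private static readonly Regex rx = new Regex(
        @"[a-z]:\\(?:[^\\:]+\\)*((?:[^:\\]+)\.\w+)", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the first match, or an empty string if there is none.
    public static string FirstMatch(string text)
    {
        Match m = rx.Match(text);
        return m.Success ? m.Value : string.Empty;
    }
}
```

With this helper, a path without an extension such as `C:\logs\readme` is not found at all, and in `see C:\logs\readme and also notes.txt` the match runs on through the prose up to `notes.txt`.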


Here is a version of the program if you use <file> tags. Change the regex and Extract to:

private static readonly Regex rx = new Regex
    (@"<file>(.+?)</file>", RegexOptions.IgnoreCase);

static void Extract(string text)
{
    MatchCollection matches = rx.Matches(text);

    foreach (Match match in matches)
    {
        Console.WriteLine("'{0}'", match.Groups[1]);
    }
}

Also available on ideone.

If you put some constraints on your filename requirements, you can use code similar to this:

string s = @"Hello John

these are the files you have to send us today: C:\Development\Projects 2010\Accounting\file20101130.csv, C:\Development\Projects 2010\Accounting\orders20101130.docx

also we would like you to send C:\Development\Projects 2010\Accounting\customersupdated.xls

thank you";

Regex regexObj = new Regex(@"\b[a-z]:\\(?:[^<>:""/\\|?*\n\r\0-\37]+\\)*[^<>:""/\\|?*\n\r\0-\37]+\.[a-z0-9\.]{1,5}", RegexOptions.IgnorePatternWhitespace|RegexOptions.IgnoreCase);
MatchCollection fileNameMatchCollection = regexObj.Matches(s);
foreach (Match fileNameMatch in fileNameMatchCollection)
{
    MessageBox.Show(fileNameMatch.Value);
}

In this case, I limited extensions to a length of 1-5 characters. You can obviously use another value or restrict the characters allowed in filename extensions further. The list of valid characters is taken from the MSDN article Naming Files, Paths, and Namespaces.
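As one way of restricting extensions further, the sketch below swaps the 1-5 character rule for a fixed allowlist. The allowlist (csv, docx, xls) and the names PathFinder and FindPaths are assumptions made for this example, and the control-character range from the pattern above is left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PathFinder
{
    // Hypothetical allowlist: only .csv, .docx and .xls count as file extensions.
    private static readonly Regex AllowedRx = new Regex(
        @"\b[a-z]:\\(?:[^<>:""/\\|?*\r\n]+\\)*[^<>:""/\\|?*\r\n]+\.(?:csv|docx|xls)\b",
        RegexOptions.IgnoreCase);

    // Returns every matched path in the order it appears in the text.
    public static List<string> FindPaths(string text)
    {
        var result = new List<string>();
        foreach (Match m in AllowedRx.Matches(text))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

A path whose extension is not on the list, such as `C:\temp\notes.tmp`, is skipped. The pattern can still over-match when unrelated prose containing an allowed extension follows on the same line.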

If you use the <file> tag and the final text can be represented as a well-formed XML document (as inner XML, i.e. text without root tags), you can probably do:

var doc = new XmlDocument();
doc.LoadXml(String.Concat("<root>", input, "</root>"));

var files = doc.SelectNodes("//file");

or

var doc = new XmlDocument();

doc.AppendChild(doc.CreateElement("root"));
doc.DocumentElement.InnerXml = input;

var nodes = doc.SelectNodes("//file");

Both methods work and are object-oriented, especially the second one.

They may also perform better.
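For completeness, here is a minimal sketch of the first XML variant that also reads the paths out of the selected nodes. The names TaggedFileReader and ReadFiles are hypothetical. It assumes the input contains no stray `<` or `&` characters outside the <file> tags; otherwise LoadXml throws an XmlException.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class TaggedFileReader
{
    // Wraps the text in a <root> element so it parses as XML,
    // then returns the text of every <file> element.
    public static List<string> ReadFiles(string input)
    {
        var doc = new XmlDocument();
        doc.LoadXml(string.Concat("<root>", input, "</root>"));

        var files = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("//file"))
        {
            files.Add(node.InnerText);
        }
        return files;
    }
}
```

Calling `TaggedFileReader.ReadFiles(s)` on the tagged string from the question should give the three paths listed there.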

See also: Don't parse (X)HTML using RegEx
