Splitting a multi-line block of text into a MatchCollection with a regular expression in C#

Question

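I am trying to break up a database field containing free-form text, using a regular expression in a C# application. Comments were added simply by appending each person's comment to the end of the field. Here is the sample format: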

Bob Smith [21-Mar-2013 10:46:02 AM]: this that and the other thing

followed by some linefeeds and somesuch

Alexey Jones [08-Jul-2013 1:44:59 PM]: and here is some other comment that I, Alexey deemed worthy to put into the system

I also like using the enter key

Kim Katillicus [09-Jun-2014 2:34:43 PM]: Don't forget about my comments

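The goal is that when Alexey wants output of his comments, he sees only his own and nobody else's (this will be output to a static report). I am trying variations of the following regex pattern to bring back a match collection: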

^(.*\[\d{2}-\w{3}-\d{4}.*(AM|PM)\]:\s[\s\S]*)*

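I can only get either one big blob containing everything, or individual matches that contain just the first line of each person's entry. I am looking for help fixing this pattern. I am not sure whether I am close with what I have or barking up the wrong tree.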

Note: I am testing my expressions with Expresso. At the moment I have the multiline switch checked.

Answer 1

The problem is this part:

[\s\S]*

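This says "match any character that is or is not whitespace, zero or more times." Because the quantifier is greedy, it will absolutely include everything after the first occurrence of the start of the expression.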
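To make the effect concrete, here is a minimal sketch that runs a simplified form of the question's pattern against text in the sample format. The class name `GreedyDemo`, the shortened sample text, and the choice to drop the outer `(...)*` group are assumptions made for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Shortened sample in the question's format, using "\n" line endings.
    public const string Sample =
        "Bob Smith [21-Mar-2013 10:46:02 AM]: this that and the other thing\n" +
        "\n" +
        "followed by some linefeeds and somesuch\n" +
        "\n" +
        "Alexey Jones [08-Jul-2013 1:44:59 PM]: and here is some other comment\n" +
        "\n" +
        "Kim Katillicus [09-Jun-2014 2:34:43 PM]: Don't forget about my comments\n";

    // The question's pattern without the outer (...)* group (a simplification).
    private static readonly Regex Greedy = new Regex(
        @"^.*\[\d{2}-\w{3}-\d{4}.*(AM|PM)\]:\s[\s\S]*",
        RegexOptions.Multiline);

    // Returns every match the greedy pattern finds in the sample.
    public static MatchCollection Run()
    {
        return Greedy.Matches(Sample);
    }
}
```

Calling `GreedyDemo.Run()` should yield a single match whose value is the entire sample: the first preamble anchors the match and `[\s\S]*` then consumes the other two entries as well.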

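In my view, the answer requires more logic than a single regular expression can express. For example, as @evanmcdonnal pointed out, you can split on newlines, then match each line against your preamble regex, merging lines into a single comment until the next match. Here is a C# method: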

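using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class CommentsExtractor
{
    // Matches the "Name [dd-MMM-yyyy h:mm:ss AM/PM]: " preamble that starts each comment.
    private static readonly Regex preambleExpression =
        new Regex(@"^.*\[\d{2}-\w{3}-\d{4}.*(AM|PM)\]:\s");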

    public static List<string> CommentsFromText(string text)
    {
        var comments = new List<string>();

        var lines = text.Split(new char[]{'\n', '\r'},
            StringSplitOptions.RemoveEmptyEntries);

        var currentComment = new StringBuilder();
        bool anyMatches = false;

        foreach (var line in lines)
        {
            var match = preambleExpression.Match(line);

            // If we see a new preamble, it's time to push
            //  the current comment into the list.
            // However, the first time through, we don't have
            //  any data, so we'll skip it.
            if(match.Success)
            {
                if (anyMatches)
                {
                    comments.Add(currentComment.ToString());
                    currentComment.Clear();
                }
                anyMatches = true;
            }

            currentComment.AppendLine(line);
        }

        // Now we need to push the last comment, unless the input
        //  contained no text at all.
        if (currentComment.Length > 0)
        {
            comments.Add(currentComment.ToString());
        }

        return comments;
    }
}
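The question's actual goal is to show only one person's comments. Given a list of comment strings like the one `CommentsFromText` returns, a filter along these lines might do it. `CommentFilter` and `ByAuthor` are hypothetical names that are not part of the original answer, and the sketch assumes every comment begins with the author's name followed by " [" as in the sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CommentFilter
{
    // Hypothetical helper: keeps only the comments whose preamble
    // starts with the given author name followed by " [".
    public static List<string> ByAuthor(IEnumerable<string> comments, string author)
    {
        string prefix = author + " [";
        return comments
            .Where(c => c.StartsWith(prefix, StringComparison.Ordinal))
            .ToList();
    }
}
```

A call such as `CommentFilter.ByAuthor(CommentsExtractor.CommentsFromText(text), "Alexey Jones")` would then return only Alexey's entries. The prefix comparison is case-sensitive and expects the full name exactly as it was typed into the field.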

A working sample WPF application is available on GitHub.
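For comparison, a single pattern can get closer to the original intent if the greedy `[\s\S]*` is replaced by a lazy `[\s\S]*?` that stops at a lookahead for the next preamble or the end of the text. This is a different technique from the line-by-line method above, not part of the original answer, and only a sketch: `SinglePatternSketch`, `BodiesBy`, and the named groups `author` and `body` are assumptions, and it relies on a preamble only ever appearing at the start of a line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SinglePatternSketch
{
    // Date, time and AM/PM marker inside the brackets, then "]: ".
    private const string Stamp =
        @"\[\d{2}-\w{3}-\d{4}[^\]\r\n]*(?:AM|PM)\]:\s";

    // One entry: author, bracketed stamp, then a lazy body that stops
    // right before the next preamble line or at the end of the text.
    private static readonly Regex Entry = new Regex(
        @"^(?<author>[^\[\r\n]+?)\s" + Stamp +
        @"(?<body>[\s\S]*?)" +
        @"(?=^[^\[\r\n]+" + Stamp + @"|\z)",
        RegexOptions.Multiline);

    // Returns the trimmed comment bodies written by the given author.
    public static List<string> BodiesBy(string text, string author)
    {
        var bodies = new List<string>();
        foreach (Match m in Entry.Matches(text))
        {
            if (m.Groups["author"].Value == author)
            {
                bodies.Add(m.Groups["body"].Value.Trim());
            }
        }
        return bodies;
    }
}
```

Unlike the line-splitting method, this keeps blank lines inside a comment body. Whether that outweighs the harder-to-read pattern is a judgment call.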

Answer 1: accepted, 2014-07-11 00:40:06