Regex to match string and all content after it to another occurrence of the same string

I'm trying to match every new error log entry with a Regex in C#. I want a new match to start at every occurrence of the date

[yyyy-MM-dd HH:mm:ss,fff]

Here is the sample data and my current (not working) solution:

Regex

(\[[0-9]{4}\-[0-9]{2}\-[0-9]{2} [0-9]{2}\:[0-9]{2}\:[0-9]{2}\,[0-9]{3}\])(.*)

String to match

[2018-06-28 00:58:14,596] - INFO  - [54] - ProcessItemController - Processing url: http://somehttp.com/something.xml/
[2018-06-28 00:58:14,612] - ERROR - [54] - ProcessItemController - Processing Failed
System.UnauthorizedAccessException: Access to the path 'D:\SomePath\something.xlsx' is denied.
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.File.InternalDelete(String path, Boolean checkHost)
   at Something.Processors.PathAttachmentExtractorProcessor.XmlParser(String path, String outputPath, ProcessingItem processingItem)
   at Something.Processors.EurekaInfoPathAttachmentExtractorProcessor.ProcessItem(ProcessingItem processingItem)
   at Something.ProcessItemController.Process(Item item)
[2018-06-28 00:58:14,627] - INFO  - [69] - ProcessItemController - Processing url: http://someurl.com/cables.xml/
[2018-06-28 00:58:14,627] - ERROR - [69] - ProcessItemController - Processing Failed
System.UnauthorizedAccessException: Access to the path 'D:\SomePath\anotherSomething.xlsx' is denied.
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.File.InternalDelete(String path, Boolean checkHost)
   at Something.Processors.PathAttachmentExtractorProcessor.XmlParser(String path, String outputPath, ProcessingItem processingItem)
   at Something.Processors.PathAttachmentExtractorProcessor.ProcessItem(ProcessingItem processingItem)
   at Something.ProcessItemController.Process(Item item)

https://regex101.com/r/6BJpKF/1/

The problem is that for an error entry the pattern doesn't capture the exception description, which is on the following lines.

Is there a way to get all the data between each occurrence of the date (including the date itself) as separate matches?

Try the following solution:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.txt";
        static void Main(string[] args)
        {

            string input = File.ReadAllText(FILENAME);

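            // Each entry starts on a line of the form "[date] - TYPE - [number] - message".
            // The message runs up to the next '[' and therefore includes any stack-trace lines.
            string pattern = @"^\[(?'date'[^\]]+)\]\s+-\s+(?'type'\S+)\s+-\s+\[(?'errNum'\d+)\]\s+-\s+(?'message'[^\[]*)";

            MatchCollection matches = Regex.Matches(input, pattern, RegexOptions.Multiline);

            foreach (Match match in matches)
            {
                Console.WriteLine("Date : '{0}', Type : '{1}', Error Number = '{2}', Message = '{3}'",
                   match.Groups["date"].Value, match.Groups["type"].Value, match.Groups["errNum"].Value, match.Groups["message"].Value.Trim());
            }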
            }
            Console.ReadLine();
        }

    }
}

Using only regex this should work:

string datetimeRegex = @"\[[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}\]";

var rx = new Regex(@"(?:^|(?<=\n))" + datetimeRegex + @"(?:(?!(?<=\n)" + datetimeRegex + @").)*", RegexOptions.Singleline);

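// str holds the full text of the log file.
Match m;
int ix = 0;

while ((m = rx.Match(str, ix)).Success)
{
    // One complete log entry, including any stack-trace lines
    string log = m.Value;
    // Resume right after this match, wherever it started
    ix = m.Index + m.Length;
}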

But I'm not very happy with it; I think it can be done in a simpler way. Note that each log entry keeps its trailing \r?\n. (?:^|(?<=\n)) means "at the beginning of the string or right after a newline". (?!(?<=\n)" + datetimeRegex + @") means that a datetime preceded by a \n stops the . from matching any further.
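
For comparison, here is one possibly simpler variant of the same idea, offered as a sketch rather than a definitive answer. It assumes every entry starts at the beginning of a line with a timestamp in the format from the question, and it combines RegexOptions.Multiline (so ^ matches after each newline) with RegexOptions.Singleline (so . matches newlines). A lazy .*? then runs up to the next timestamp at a line start, or to the end of the input. The names LogSplitter and SplitEntries are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LogSplitter
{
    // Timestamp prefix, e.g. [2018-06-28 00:58:14,612]
    private const string Stamp =
        @"\[[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}\]";

    // Multiline:  ^ also matches right after every \n.
    // Singleline: . also matches \n, so one match can span stack-trace lines.
    // The lookahead stops the lazy .*? at the next timestamp that begins a line,
    // or at the very end of the input (\z).
    private static readonly Regex EntryRegex = new Regex(
        "^" + Stamp + ".*?(?=^" + Stamp + @"|\z)",
        RegexOptions.Multiline | RegexOptions.Singleline);

    // Returns one string per log entry, with the trailing line break removed.
    public static List<string> SplitEntries(string text)
    {
        var entries = new List<string>();
        foreach (Match m in EntryRegex.Matches(text))
        {
            entries.Add(m.Value.TrimEnd('\r', '\n'));
        }
        return entries;
    }
}
```

Calling LogSplitter.SplitEntries(File.ReadAllText(path)) on the sample data should return four strings, with each ERROR entry carrying its exception text and stack trace. Text before the first timestamp, if any, is skipped rather than returned.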
