
Regex to match the date from string with where clause

I need to get the date string, i.e. 2019-01-22 15:36:141,023, from the sample text below, but only where the line contains the word Correct and does not contain Test12. So I should ideally get two matches (Line 3 and 5) in the string below.

Line 1: 2019-01-22 15:36:141,043: [Test][123] INFORMATION - Testing: Correct Test12 ping

Line 2: 2019-01-22 15:36:141,029: [Test][124323] INFORMATION - Testing: Wrong Test12 ping

Line 3: 2019-01-22 15:36:141,023: [Test][12554363] INFORMATION - Testing: Correct Test ping

Line 4: 2019-01-22 15:36:141,123: [Test][6761213] INFORMATION - Testing: Wrong Test12 ping

Line 5: 2019-01-22 15:36:141,093: [Test][46543123] INFORMATION - Testing: Invalid Test ping

Line 6: 2019-01-22 15:36:141,890: [Test][887] INFORMATION - Testing: Correct Test ping

I can get the date string with (?\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}(?:,\d{3}\b)?) but I am not sure how to include the other conditions. Any leads?

Without adding extra complexity to the regex, you could iterate over the lines in the file and check for Test12 and Correct using regular string methods:

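// Requires: using System.Collections.Generic; using System.IO; using System.Text.RegularExpressions;
var results = new List<string>();
using (var sr = new StreamReader(filepath, true))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // Plain string checks: keep only lines with "Correct" and without "Test12"
        if (line.Contains("Correct") && !line.Contains("Test12"))
        {
            // Extract the date-like substring from the qualifying line
            var res = Regex.Match(line, @"\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2,}(?:,\d{3}\b)?");
            if (res.Success)
            {
                results.Add(res.Value);
            }
        }
    }
}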

With regex, if the words whose presence or absence you want to check occur after the date, use

\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2,}(?:,\d{3}\b)?(?!.*Test12)(?=.*Correct)
                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^

See the regex demo.

Here, (?!.*Test12) and (?=.*Correct) are lookaheads evaluated right after the date. Each one scans any 0+ chars other than a newline, as many as possible, to the right of the current location: the negative lookahead makes sure there is no Test12 in the rest of the line, and the positive lookahead makes sure there is a Correct substring in the rest of the line.
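As a rough sketch of how the lookahead version behaves, the snippet below runs the pattern over the six sample lines from the question (with the "Line N:" labels dropped). The class name LookaheadDemo, the FindDates helper, and the Sample field are made up for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Date pattern followed by the two lookaheads discussed above.
    private const string Pattern =
        @"\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2,}(?:,\d{3}\b)?(?!.*Test12)(?=.*Correct)";

    // Sample lines from the question, joined with \n (labels removed).
    public static readonly string Sample = string.Join("\n", new[]
    {
        "2019-01-22 15:36:141,043: [Test][123] INFORMATION - Testing: Correct Test12 ping",
        "2019-01-22 15:36:141,029: [Test][124323] INFORMATION - Testing: Wrong Test12 ping",
        "2019-01-22 15:36:141,023: [Test][12554363] INFORMATION - Testing: Correct Test ping",
        "2019-01-22 15:36:141,123: [Test][6761213] INFORMATION - Testing: Wrong Test12 ping",
        "2019-01-22 15:36:141,093: [Test][46543123] INFORMATION - Testing: Invalid Test ping",
        "2019-01-22 15:36:141,890: [Test][887] INFORMATION - Testing: Correct Test ping"
    });

    // Hypothetical helper: returns every date whose line tail passes both lookaheads.
    public static List<string> FindDates(string text)
    {
        var dates = new List<string>();
        // No RegexOptions.Singleline, so '.' stops at \n and each check stays on one line.
        foreach (Match m in Regex.Matches(text, Pattern))
        {
            dates.Add(m.Value);
        }
        return dates;
    }

    public static void Main()
    {
        foreach (var date in FindDates(Sample))
        {
            Console.WriteLine(date);
        }
        // Prints:
        // 2019-01-22 15:36:141,023
        // 2019-01-22 15:36:141,890
    }
}
```

Only the third and sixth lines yield a match, because they are the only ones with Correct and without Test12 to the right of the date.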

If these words may occur anywhere in the string, you may use

(?m)\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2,}(?:,\d{3}\b)?(?=.*\r?$(?<!Test12.*)(?<=Correct.*))
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See this regex demo.

Here, the (?m) inline option turns on RegexOptions.Multiline so that $ is treated as an end-of-line anchor, and the (?=.*\r?$(?<!Test12.*)(?<=Correct.*)) positive lookahead performs the following check: it requires 0+ chars up to the end of the line and then, at the end of the line, runs two lookbehinds. The negative lookbehind (?<!Test12.*) makes sure there is no Test12 anywhere on the line, and the positive lookbehind (?<=Correct.*) makes sure there is a Correct substring anywhere on the line.

The optional CR, \r?, before $ is there for CRLF line endings because of a rather annoying fact: in multiline mode, $ matches before \n but not before \r.
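To illustrate the "anywhere on the line" variant, here is a small sketch that puts the keywords before the date and uses CRLF line endings. The three input lines are invented for this example (they are not from the question), and the names AnywhereDemo, FindDates, and Sample are hypothetical. The approach relies on .NET supporting unbounded patterns such as .* inside lookbehinds.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnywhereDemo
{
    // Date pattern, then a lookahead that moves to the end of the line and
    // checks the whole line with two lookbehinds.
    private const string Pattern =
        @"(?m)\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2,}(?:,\d{3}\b)?(?=.*\r?$(?<!Test12.*)(?<=Correct.*))";

    // Invented sample: the keywords come BEFORE the date, lines end with \r\n.
    public static readonly string Sample =
        "Correct Test ping: 2019-01-22 15:36:141,023\r\n" +
        "Correct Test12 ping: 2019-01-22 15:36:141,043\r\n" +
        "Wrong Test ping: 2019-01-22 15:36:141,093";

    // Hypothetical helper: collects the dates from lines that pass both checks.
    public static List<string> FindDates(string text)
    {
        var dates = new List<string>();
        foreach (Match m in Regex.Matches(text, Pattern))
        {
            dates.Add(m.Value);
        }
        return dates;
    }

    public static void Main()
    {
        foreach (var date in FindDates(Sample))
        {
            Console.WriteLine(date);
        }
        // Prints:
        // 2019-01-22 15:36:141,023
    }
}
```

The second line is rejected by the negative lookbehind (it contains Test12) and the third by the positive lookbehind (it has no Correct), even though both keywords sit to the left of the date.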

I think you mean a match for lines 3 and 6, because line 5 does not contain Correct.

To make sure the line does not contain "Test12", you could use a negative lookahead. To match "Correct" after the date, you could match it in your pattern and use a word boundary \b to prevent it from being part of a larger word.

Your pattern might look like:

^(?!.*\bTest12\b).*?(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2,}(?:,\d{3}\b)?).*\bCorrect\b.*$

That will match:

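  • ^ Start of the string (or of each line when RegexOptions.Multiline is set)
  • (?!.*\bTest12\b) Assert that what follows does not contain Test12 as a whole word
  • .*? Match any char except a newline 0+ times, non-greedy
  • (\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2,}(?:,\d{3}\b)?) Capture the date-like pattern in group 1
  • .* Match any char except a newline 0+ times
  • \bCorrect\b Match Correct as a whole word
  • .* Match any char except a newline 0+ times
  • $ End of the string (or of each line when RegexOptions.Multiline is set)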
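The breakdown above can be tried out with a short sketch like the following, which assumes RegexOptions.Multiline so that ^ and $ anchor at each line, and reads the date from capture group 1. The class name CaptureDemo, the FindDates helper, and the Sample field are illustrative names, and the input is the question's sample with the "Line N:" labels dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Whole-line pattern: reject lines with Test12, capture the date, require Correct after it.
    private const string Pattern =
        @"^(?!.*\bTest12\b).*?(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2,}(?:,\d{3}\b)?).*\bCorrect\b.*$";

    // Sample lines from the question, joined with \n (labels removed).
    public static readonly string Sample = string.Join("\n", new[]
    {
        "2019-01-22 15:36:141,043: [Test][123] INFORMATION - Testing: Correct Test12 ping",
        "2019-01-22 15:36:141,029: [Test][124323] INFORMATION - Testing: Wrong Test12 ping",
        "2019-01-22 15:36:141,023: [Test][12554363] INFORMATION - Testing: Correct Test ping",
        "2019-01-22 15:36:141,123: [Test][6761213] INFORMATION - Testing: Wrong Test12 ping",
        "2019-01-22 15:36:141,093: [Test][46543123] INFORMATION - Testing: Invalid Test ping",
        "2019-01-22 15:36:141,890: [Test][887] INFORMATION - Testing: Correct Test ping"
    });

    // Hypothetical helper: returns group 1 (the date) of every matching line.
    public static List<string> FindDates(string text)
    {
        var dates = new List<string>();
        // Multiline makes ^ and $ match at the start and end of each line.
        foreach (Match m in Regex.Matches(text, Pattern, RegexOptions.Multiline))
        {
            dates.Add(m.Groups[1].Value);
        }
        return dates;
    }

    public static void Main()
    {
        foreach (var date in FindDates(Sample))
        {
            Console.WriteLine(date);
        }
        // Prints:
        // 2019-01-22 15:36:141,023
        // 2019-01-22 15:36:141,890
    }
}
```

Unlike the lookahead-only patterns, this one consumes the whole line, so the date has to be taken from the capture group rather than from the full match value.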

Regex demo | C# demo

Note

Judging by the example data, should this part, (?:,\d{3}\b)?, also match a digit before the comma, like (?:\d,\d{3}\b)?

Here is one way without Regex. The date does not look correct: I think the comma is in the wrong location, so I fixed it.

DateTime today = DateTime.Parse("2019-01-22 15:36:14");
string input =
    "2019-01-22 15:36:14,1023: [Test][123] INFORMATION - Testing: Correct Test12 ping\n" +
    "2019-01-22 15:36:14,1023: [Test][124323] INFORMATION - Testing: Wrong Test12 ping\n" +
    "2019-01-22 15:36:14,1023: [Test][12554363] INFORMATION - Testing: Correct Test ping\n" +
    "2019-01-22 15:36:14,1023: [Test][6761213] INFORMATION - Testing: Wrong Test12 ping\n" +
    "2019-01-22 15:36:14,1023: [Test][46543123] INFORMATION - Testing: Invalid Test ping\n" +
    "2019-01-22 15:36:14,1023: [Test][887] INFORMATION - Testing: Correct Test ping";

StringReader reader = new StringReader(input);
string line = "";

while ((line = reader.ReadLine()) != null)
{
    string[] splitDate = line.Split(new string[] { ": [Test]" }, StringSplitOptions.None);
    DateTime date = DateTime.ParseExact(splitDate[0].Replace(",", "."), "yyyy-MM-dd HH:mm:ss.FFFF", System.Globalization.CultureInfo.InvariantCulture);
    string[] splitTest = splitDate[1].Split(new char[] { ':' });

    if ((date.Date == today.Date) && splitTest[1].Contains("Correct") && !splitTest[1].Contains("Test12"))
    {
        Console.WriteLine(line);
    }
}
Console.ReadLine();
