
What's wrong with my regex?

I'm using System::Text::RegularExpressions::Regex to try and find a startup message in a log file. My expression is as follows:

using namespace System::Text::RegularExpressions;
Regex^ logStartRegex = gcnew Regex( "^=+ .* \\((\\d+)/(\\d+)/(\\d+) @ (\\d+):(\\d+):(\\d+)\\) (.*) =+", static_cast<RegexOptions>(RegexOptions::Compiled | RegexOptions::IgnoreCase) );

...and my test data is:

========= Logging started (07/10/2011 @ 15:38:54) v1.000 AA000 =========

...but when I do the following I don't get a match:

logStartRegex->Match( "========= Logging started (07/10/2011 @ 15:38:54) v1.000 AA000 =========\n" );

I've tested it in regexpal, which indicates that it works (note that in the C++ version every '\' has to be escaped as '\\'): ^=+ .* \((\d+)/(\d+)/(\d+) @ (\d+):(\d+):(\d+)\) (.*) =+ . Is there any way to see exactly where this is breaking down?

I just tried the following program, copied from what you provided:

using namespace System;
using namespace System::Text::RegularExpressions;

int main(array<System::String ^> ^args)
{
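   // Same pattern and options as in the question.
   Regex^ logStartRegex = gcnew Regex( "^=+ .* \\((\\d+)/(\\d+)/(\\d+) @ (\\d+):(\\d+):(\\d+)\\) (.*) =+", static_cast<RegexOptions>(RegexOptions::Compiled | RegexOptions::IgnoreCase) );
   // Same test line, including the trailing newline.
   Match^ match = logStartRegex->Match( "========= Logging started (07/10/2011 @ 15:38:54) v1.000 AA000 =========\n" );
   // Prints True when the pattern was found in the input.
   Console::WriteLine(match->Success);
   // Keep the console window open until a key is pressed.
   Console::ReadKey();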
   return 0;
}

It writes out True to the screen, meaning that it found a match. So I suppose the problem must be somewhere else in your program.
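As for seeing where a pattern breaks down: one rough approach is to split the expression into pieces and test progressively longer prefixes until one stops matching. The sketch below is only an illustration. It uses standard C++ std::regex (ECMAScript grammar) as a stand-in for the .NET Regex class so that it compiles outside C++/CLI, and the helper names matchingPieces and logStartPieces are hypothetical. The split points are chosen so that every prefix is itself a valid expression.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns how many leading pieces of a pattern can be
// joined together and still be found in `input`. If the result is smaller
// than pieces.size(), the piece at that index is the first one that fails.
std::size_t matchingPieces(const std::vector<std::string>& pieces,
                           const std::string& input)
{
    std::string pattern;
    std::size_t count = 0;
    for (const std::string& piece : pieces) {
        pattern += piece;
        // Assumption: std::regex stands in for .NET Regex here.
        const std::regex re(pattern,
                            std::regex::ECMAScript | std::regex::icase);
        if (!std::regex_search(input, re)) {
            break;  // adding this piece made the search fail
        }
        ++count;
    }
    return count;
}

// The expression from the question, cut at points where each prefix
// is still a well-formed regular expression.
std::vector<std::string> logStartPieces()
{
    return {
        "^=+ ",
        ".* ",
        "\\((\\d+)/(\\d+)/(\\d+)",
        " @ ",
        "(\\d+):(\\d+):(\\d+)\\)",
        " (.*)",
        " =+",
    };
}
```

Calling matchingPieces(logStartPieces(), line) on the test line from the question should report that all seven pieces match; a smaller number points at the piece to look at more closely.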

I believe this should follow the conventions of the .NET Framework regular expression flavor, though I don't know C++ very well anymore. If it instead leans towards the Java implementation and API and behaves like the Matcher.matches() method, it will attempt to match the regex against the entire input, and the attempt fails if the regex matches only part of it. The .NET Regex.Match() method, by contrast, searches for the expression anywhere in the provided input and returns a Match object whose Success property is true if it was found.

That was a long way of saying: Make sure your input string does not contain any trailing spaces or other characters.
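To make the difference between "find somewhere in the input" and "match the entire input" concrete, here is a minimal sketch. It assumes standard C++ std::regex rather than .NET Regex: std::regex_search plays the role of a find-style call and std::regex_match the role of a whole-input match such as Java's Matcher.matches(). The function names foundInInput and coversWholeInput are made up for this illustration.

```cpp
#include <regex>
#include <string>

// The expression from the question, unchanged.
const char* const kLogStartPattern =
    "^=+ .* \\((\\d+)/(\\d+)/(\\d+) @ (\\d+):(\\d+):(\\d+)\\) (.*) =+";

// Find-style check: succeeds if the pattern occurs anywhere in the input.
bool foundInInput(const std::string& input)
{
    static const std::regex re(kLogStartPattern,
                               std::regex::ECMAScript | std::regex::icase);
    return std::regex_search(input, re);
}

// Whole-input check: succeeds only if the pattern covers every character,
// so a trailing space or newline is enough to make it fail.
bool coversWholeInput(const std::string& input)
{
    static const std::regex re(kLogStartPattern,
                               std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(input, re);
}
```

With the test line from the question, both checks should succeed when the line has no trailing characters, while appending "\n" should leave foundInInput succeeding and make coversWholeInput fail.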

One more note: if your input actually spans multiple lines, especially where the other lines contain dates and times in parentheses, your expression applies greedy quantifiers to the dot ( . ), which will at the very least make it run very slowly on large input, if not trip it up and make it fail outright.

In any case, you can make your expression a little more efficient by changing the instances of .* to [^(]* and [^=]* , respectively, as follows:

"^=+ [^(]* \\((\\d+)/(\\d+)/(\\d+) @ (\\d+):(\\d+):(\\d+)\\) ([^=]*) =+"

Each greedy .* you replace would otherwise run as far through the input as it can, then backtrack one character at a time until the rest of the pattern matches, only to finally come back to a place ten or twenty characters after where it started and say "oh, ok, this matches... what's next?"
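As a quick check that the tightened expression still captures the same fields, here is a small sketch. It again assumes standard C++ std::regex in place of .NET Regex, and captureLogStart is a hypothetical helper name, not part of any library.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the seven captured groups (three date parts,
// three time parts, and the version text), or an empty vector when the
// line is not a log start line.
std::vector<std::string> captureLogStart(const std::string& line)
{
    // The tightened expression, with [^(]* and [^=]* in place of .*
    static const std::regex re(
        "^=+ [^(]* \\((\\d+)/(\\d+)/(\\d+) @ (\\d+):(\\d+):(\\d+)\\) ([^=]*) =+",
        std::regex::ECMAScript | std::regex::icase);

    std::vector<std::string> groups;
    std::smatch m;
    if (std::regex_search(line, m, re)) {
        // m[0] is the whole match; the captured groups start at index 1.
        for (std::size_t i = 1; i < m.size(); ++i) {
            groups.push_back(m[i].str());
        }
    }
    return groups;
}
```

For the test line from the question this should yield the groups 07, 10, 2011, 15, 38, 54 and "v1.000 AA000".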
