Match regex pattern when not inside a set of quotes (text spans multiple lines)

This is a continuation of my previous question, ".NET regex engine returns no matches but I am expecting 8".

My pattern handles almost everything and my capture groups work well, but I have found an edge case that I do not know how to handle.

Here is a test case that I am having trouble with.

INSERT INTO [Example] ( [CaseNumber] , [TestText] )
VALUES
(1 , 'Single Line Case'),
(2 , 'Multi
Line Case');
(3 , 'Two Lines with odd end '');
Case');
(4 , ''),
(5 , 'Case 3 is the Empty Text Case');

Here is the pattern I am using. I use the RegexOptions flags Singleline, Multiline, ExplicitCapture, and IgnorePatternWhitespace:

^\(
((('(?<s>.*?)'(?!')) |
 (?<n>-?[\d\.]+)
 )(\s,\s)?
)+
#(?<!'')   # Commented out: Case 4 works; un-commented: Case 3 works
\)[;,]\r?$

I can either handle Case 3 or Case 4 but I am having trouble handling both.

If I had a way to check whether there is an even number of ' characters in the capture group s, I could tell whether we are at a real end of line or inside a text block that contains a line ending that just happens to match the pattern. However, I cannot figure out how to modify other examples to handle multi-line text strings.

Can what I want be done with a single regex, or am I forced to do post-processing (using the commented-out version) and do this in two passes?


Here is the code to run it in LINQPad

string text = 
@"INSERT INTO [Example] ( [CaseNumber] , [TestText] )
VALUES
(1 , 'Single Line Case'),
(2 , 'Multi
Line Case');
(3 , 'Two Lines with odd end '');
Case');
(4 , ''),
(5 , 'Case 3 is the Empty Text Case');
";

const string recordRegex =
@"^\(
((('(?<s>.*?)'(?!')) |
 (?<n>-?[\d\.]+)
 )(\s,\s)?
)+
#(?<!'')   # Commented out: Case 4 works; un-commented: Case 3 works
\)[;,]\r?$";

var records = Regex.Matches(text, recordRegex, RegexOptions.Singleline | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);
records.Dump();

An expression like this would match such quoted strings, including ones that contain doubled quotes ('') or span multiple lines:

(?:'[^']*')+
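
To see why this handles the troublesome cases, here is a minimal sketch that runs the expression over a fragment of the test data. The variable names `quoted` and `sample` are made up for illustration; only `Regex` and `Match` come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

// A quoted literal is one or more adjacent '...' runs. A doubled quote ('')
// inside the text is consumed as the seam between two runs, and [^'] also
// matches line breaks, so multi-line text needs no extra option.
var quoted = new Regex(@"(?:'[^']*')+");

// Fragment of the question's data: the "odd end" text followed by an empty string.
string sample = "(3 , 'Two Lines with odd end '');\nCase'), (4 , '')";

foreach (Match m in quoted.Matches(sample))
{
    // Brackets make the empty-string literal visible in the output.
    Console.WriteLine("[" + m.Value + "]");
}
```

The first match should cover the whole two-line literal, false line ending included, and the second should be the empty literal ''.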

If you want to match foo when it's not inside such quotes, you could use something like:

foo(?=[^']*(?:'[^']*'[^']*)*\z)
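
As a rough illustration of the lookahead idea, the sketch below checks each foo by requiring that an even number of ' characters remains before the end of the input. It is written with `*` on the quote-pair group, so a foo followed by no quotes at all still counts as unquoted. The names `unquotedFoo` and `sample` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// foo is treated as "outside quotes" when the rest of the input contains
// zero or more complete '...' pairs up to the very end (\z), i.e. an even
// number of ' characters follows it.
var unquotedFoo = new Regex(@"foo(?=[^']*(?:'[^']*'[^']*)*\z)");

// Five occurrences of foo: two sit inside quoted text (one of them in a
// multi-line literal, one in a literal with a doubled quote).
string sample = "foo 'a foo\nb' foo 'it''s foo' foo";

foreach (Match m in unquotedFoo.Matches(sample))
{
    Console.WriteLine("unquoted foo at index " + m.Index);
}
```

Note that the lookahead rescans the remainder of the input for every candidate, so on large inputs this can be noticeably slower than matching the quoted literals directly.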

As for getting "one match per line with the unquoted text and numbers as capture groups":

Something like this:

(?xm)^
\(

(?:
    (?:
        (?<quote> (?:'[^']*')+ )
    |   (?<num>   -?\d+(?:\.\d+)? )
    |   (?<x>     X'[0-9a-f]*' )
    )
    (?:\s*,\s*)?
)+

\)
[;,] 
\r?$
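
For reference, here is a small sketch of how this pattern might be run against the test data from the question. The data is joined with "\n" to keep line endings predictable, which is an assumption; `recordPattern` and the `unquote` helper are illustrative names, not part of any API. Because the pattern carries its own `(?xm)` options and uses only named or non-capturing groups, no `RegexOptions` are passed.

```csharp
using System;
using System.Text.RegularExpressions;

// Test data from the question, joined with \n (assumed line ending).
string text = string.Join("\n", new[]
{
    "INSERT INTO [Example] ( [CaseNumber] , [TestText] )",
    "VALUES",
    "(1 , 'Single Line Case'),",
    "(2 , 'Multi",
    "Line Case');",
    "(3 , 'Two Lines with odd end '');",
    "Case');",
    "(4 , ''),",
    "(5 , 'Case 3 is the Empty Text Case');",
    ""
});

const string recordPattern = @"(?xm)^
\(
(?:
    (?:
        (?<quote> (?:'[^']*')+ )
    |   (?<num>   -?\d+(?:\.\d+)? )
    |   (?<x>     X'[0-9a-f]*' )
    )
    (?:\s*,\s*)?
)+
\)
[;,]
\r?$";

MatchCollection records = Regex.Matches(text, recordPattern);

// Hypothetical helper: the quote group still includes the outer quotes and
// any doubled quotes, so strip the former and collapse the latter.
Func<string, string> unquote = q => q.Substring(1, q.Length - 2).Replace("''", "'");

foreach (Match m in records)
{
    // Groups[...] returns the last capture of a repeated group; use
    // m.Groups["quote"].Captures when a record has several text columns.
    Console.WriteLine(m.Groups["num"].Value + ": [" + unquote(m.Groups["quote"].Value) + "]");
}
```

With this data the sketch should yield five records, with record 3 keeping its embedded quote and line break and record 4 coming back as an empty string.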
