
Regex: ignore a match if a whole word appears anywhere in the string

Hi, I'm trying to get a regex to work. I have this text:

/Ffont2 45.83 Tf  252 980 Td (XX7445 DDA PURCHASE 05/28 04:48
MCDONALD'S F561 CHICAGO IL 105/29          10.25) Tj ET
0.000000 0.000000 0.000000 rg 0.000000 0.000000 0.000000 RG BT /Ffont2 45.83 Tf  252 937 Td (   12333378 214904443) Tj ET
0.000000 0.000000 0.000000 rg 0.000000 0.000000 0.000000 RG BT /Ffont2 45.83 Tf  252 894 Td (CITI CARD ONLINE PAYMENT 12345678                    05/29          87.99) Tj ET
0.000000 0.000000 0.000000 rg 0.000000 0.000000 0.000000 RG BT /Ffont2 45.83 Tf  252 851 Td (XX7445 DDA PURCHASE 0528 14:11 #03632 JEWEL CHICAGO IL     0529          97.60) Tj ET

and I'm trying to get everything from Td to Tj, like

Td (CITI CARD ONLINE PAYMENT 12345678                    05/29                87.99) Tj

but I want to skip entries that have no date (a date must contain a forward slash) or no money amount (an amount must contain a period), and I don't want an entry if it has the word "purchase" in it. So

Td (XX7445 DDA PURCHASE 0528 14:11 #03632 JEWEL CHICAGO IL     0529         97.60) Tj

would not be returned. Right now I have

(Td \()([^\)]*)([^\)]*)([/][^\)]*[.][^\)]*\) Tj)

for my regex, and that gets everything, but it also matches entries that contain "purchase".

What you have is fine. Regex can be used for this, but why waste CPU cycles putting a Formula 1 car on a go-kart track (a bad analogy, admittedly)? Keep your pattern and filter the matches afterwards:

var matchesWithoutPurchase = Regex.Matches(yourInput, @"(Td \()([^\)]*)([^\)]*)([/][^\)]*[.][^\)]*\) Tj)")
    .Cast<Match>().Where(x => !x.Value.ToLower().Contains("purchase"));

foreach (var match in matchesWithoutPurchase) {
    Console.WriteLine(match);
}

Regex negative lookarounds are overkill for this.
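For anyone who wants to try the filter approach end to end, here is a minimal self-contained sketch. It assumes C# 9 or later (top-level statements) and uses a shortened copy of the sample text from the question; the variable names are only illustrative. It also swaps ToLower().Contains(...) for an ordinal case-insensitive IndexOf, which is a stylistic choice rather than a required change.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Shortened copy of the sample text from the question.
string input =
    "/Ffont2 45.83 Tf  252 980 Td (XX7445 DDA PURCHASE 05/28 04:48\n" +
    "MCDONALD'S F561 CHICAGO IL 105/29          10.25) Tj ET\n" +
    "BT /Ffont2 45.83 Tf  252 937 Td (   12333378 214904443) Tj ET\n" +
    "BT /Ffont2 45.83 Tf  252 894 Td (CITI CARD ONLINE PAYMENT 12345678   05/29   87.99) Tj ET\n" +
    "BT /Ffont2 45.83 Tf  252 851 Td (XX7445 DDA PURCHASE 0528 14:11 #03632 JEWEL CHICAGO IL  0529  97.60) Tj ET\n";

// The question's pattern, unchanged.
string pattern = @"(Td \()([^\)]*)([^\)]*)([/][^\)]*[.][^\)]*\) Tj)";

// Everything the pattern matches on its own (the McDonald's entry is included).
List<string> allMatches = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

// Drop any match that mentions "purchase", regardless of casing.
List<string> kept = allMatches
    .Where(text => text.IndexOf("purchase", StringComparison.OrdinalIgnoreCase) < 0)
    .ToList();

foreach (string entry in kept)
{
    Console.WriteLine(entry);
}
```

With this input the pattern alone matches two entries (the McDonald's purchase and the CITI CARD payment), and only the CITI CARD entry is left after filtering.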

If you want to use a regex to ensure that your match doesn't contain the word 'PURCHASE', you could use a negative look-ahead such as the following:

@"(?![^\)]*PURCHASE)(Td \()([^\)]*)([^\)]*)([/][^\)]*[.][^\)]*\) Tj)"

The look-ahead prevents a match if the word 'PURCHASE' appears before the next ).

If you also want to reject lowercase 'purchase', you could add (?i) to the start of the regex, or pass RegexOptions.IgnoreCase as the options argument of the Regex method call.
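To see the difference the case option makes, here is a small sketch, assuming C# 9 or later (top-level statements). The two-entry input is made up for illustration, with the first entry deliberately written in lowercase.

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up input: the first entry spells "purchase" in lowercase.
string input =
    "Td (xx7445 dda purchase 05/28 10.25) Tj ET\n" +
    "Td (CITI CARD ONLINE PAYMENT 05/29 87.99) Tj ET\n";

string pattern = @"(?![^\)]*PURCHASE)(Td \()([^\)]*)([^\)]*)([/][^\)]*[.][^\)]*\) Tj)";

// Case-sensitive: the look-ahead only rejects uppercase PURCHASE,
// so the lowercase entry still matches.
int caseSensitiveCount = Regex.Matches(input, pattern).Count;

// Case-insensitive through the options argument.
int ignoreCaseCount = Regex.Matches(input, pattern, RegexOptions.IgnoreCase).Count;

// Case-insensitive through the inline (?i) option at the start of the pattern.
int inlineCount = Regex.Matches(input, "(?i)" + pattern).Count;

Console.WriteLine($"{caseSensitiveCount} {ignoreCaseCount} {inlineCount}");
```

On this input the case-sensitive call finds both entries, while either case-insensitive variant finds only the CITI CARD entry.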

Looking closer at your regex, I notice that the second ([^\)]*) is redundant: everything it could match is already consumed by the ([^\)]*) immediately preceding it, so it always captures an empty string.

It also seems strange that you are capturing (Td \() - the capture will always be Td ( , so why bother? And the last capture will start with / and end with ) Tj - is that what you intended?
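The points about the capture groups are easy to check by printing each group for one matching entry. This is only a sketch, assuming C# 9 or later (top-level statements) and a single shortened line from the question as input.

```csharp
using System;
using System.Text.RegularExpressions;

// One matching entry from the question, with the spacing shortened.
string input = "Td (CITI CARD ONLINE PAYMENT 12345678   05/29   87.99) Tj ET";

// The question's original pattern with its four capture groups.
Match m = Regex.Match(input, @"(Td \()([^\)]*)([^\)]*)([/][^\)]*[.][^\)]*\) Tj)");

string g1 = m.Groups[1].Value; // always the literal "Td ("
string g2 = m.Groups[2].Value; // text up to the slash that group 4 starts on
string g3 = m.Groups[3].Value; // the redundant group: empty
string g4 = m.Groups[4].Value; // from that slash through ") Tj"

// Brackets make the empty capture visible.
Console.WriteLine($"[{g1}]");
Console.WriteLine($"[{g2}]");
Console.WriteLine($"[{g3}]");
Console.WriteLine($"[{g4}]");
```

Here group 3 comes back empty, and the description and amount end up split across groups 2 and 4 at the slash in the date, which is probably not the shape anyone wants to work with.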

I assume you know that you could replace [/] with \/ (or a plain /, which needs no escaping in .NET) and [.] with \.

Anyway, to just capture what is inside the brackets, you could use:

@"(?![^\)]*PURCHASE)Td \(([^\)]*\/[^\)]*\.[^\)]*)\) Tj"
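As a usage sketch for this last pattern, the text inside the brackets can be read from Groups[1]. The code assumes C# 9 or later (top-level statements), runs against a shortened copy of the sample text from the question, and adds RegexOptions.IgnoreCase and Trim() as optional extras that are not part of the pattern itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Shortened copy of the sample text from the question.
string input =
    "/Ffont2 45.83 Tf  252 980 Td (XX7445 DDA PURCHASE 05/28 04:48\n" +
    "MCDONALD'S F561 CHICAGO IL 105/29          10.25) Tj ET\n" +
    "BT /Ffont2 45.83 Tf  252 937 Td (   12333378 214904443) Tj ET\n" +
    "BT /Ffont2 45.83 Tf  252 894 Td (CITI CARD ONLINE PAYMENT 12345678   05/29   87.99) Tj ET\n" +
    "BT /Ffont2 45.83 Tf  252 851 Td (XX7445 DDA PURCHASE 0528 14:11 #03632 JEWEL CHICAGO IL  0529  97.60) Tj ET\n";

// Single capture group: only the text between "Td (" and ") Tj".
string pattern = @"(?![^\)]*PURCHASE)Td \(([^\)]*\/[^\)]*\.[^\)]*)\) Tj";

// IgnoreCase also rejects "purchase" and "Purchase"; Trim() drops padding spaces.
List<string> entries = Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value.Trim())
    .ToList();

foreach (string entry in entries)
{
    Console.WriteLine(entry);
}
```

Only the CITI CARD payment survives: the first entry contains PURCHASE, the second has neither a slash nor a period, and the last one contains PURCHASE and has no slash in its dates.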
