Regex to return unique lines when a pattern is matched

I am parsing a log file and trying to match error statements. The part of the line I am matching, "error CS", appears in numerous lines, some of which are duplicates and some of which are not. Is there a way to avoid returning the duplicates? I am using the Java flavor of regex.

For example, my simple regex returns:

Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'
Class1.cs(34,20): error CS0103: The name 'thiswworked' does not exist in the current context
Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'
Class1.cs(34,20): error CS0103: The name 'thiswworked' does not exist in the current context

I would like it to return:

Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'
Class1.cs(34,20): error CS0103: The name 'thiswworked' does not exist in the current context

Strictly speaking, this is not possible with a regular expression alone. You need something more powerful.

Regular expressions are meant for matching regular languages. The pattern you are attempting to match is not regular.

You need the expression to remember an unbounded amount of state (the previously matched errors), and regular expressions are not designed for that kind of computation. A Turing machine can store arbitrary state, which is much closer to what you need; a general-purpose language such as Java fits the bill nicely.

This is fairly easy to solve by adding some extra logic to your log parser once it has found the error lines.

One solution is to match with your regex and then add each matching line to a data structure such as a set, which removes duplicates for you. At the end of parsing, just print the contents of the set.
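A minimal Java sketch of the set-based approach, assuming the log has already been read into a `List<String>`. The class name `UniqueErrors`, the method `uniqueErrors`, and the pattern `error CS\d+` are hypothetical stand-ins for your own parser and regex. It uses `LinkedHashSet`, which keeps lines in the order they were first added; a plain `HashSet` would remove duplicates just as well but iterates in no particular order.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class UniqueErrors {
    // Hypothetical pattern: "error CS" followed by the numeric error code.
    private static final Pattern ERROR = Pattern.compile("error CS\\d+");

    // Collects every line containing a match; the set drops repeated lines.
    public static Set<String> uniqueErrors(List<String> lines) {
        Set<String> seen = new LinkedHashSet<>(); // keeps first-insertion order
        for (String line : lines) {
            if (ERROR.matcher(line).find()) {
                seen.add(line); // adding a line that is already present is a no-op
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'",
            "Class1.cs(34,20): error CS0103: The name 'thiswworked' does not exist in the current context",
            "Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'",
            "Class1.cs(34,20): error CS0103: The name 'thiswworked' does not exist in the current context");
        // At the end of parsing, print the contents of the set.
        for (String line : uniqueErrors(log)) {
            System.out.println(line);
        }
    }
}
```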

If you care about order, you could instead add the lines to a map, with the line as the key and its line number as the value, checking for an existing entry before inserting so that only the first line number is kept. Sorting the entries by value then gives you the first occurrence of each distinct line, in file order.
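The map-based variant can be sketched as follows; `FirstOccurrence`, `uniqueInOrder`, and the `error CS\d+` pattern are again hypothetical names. `putIfAbsent` performs the check-before-insert, so each line keeps the number of its first occurrence, and sorting the entries with `Map.Entry.comparingByValue()` restores file order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class FirstOccurrence {
    // Hypothetical pattern standing in for your own regex.
    private static final Pattern ERROR = Pattern.compile("error CS\\d+");

    // Returns each distinct matching line once, ordered by where it first appeared.
    public static List<String> uniqueInOrder(List<String> lines) {
        Map<String, Integer> firstSeen = new HashMap<>(); // line -> first line number
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (ERROR.matcher(line).find()) {
                // Only inserts if the line is not already a key, so later duplicates are ignored.
                firstSeen.putIfAbsent(line, i + 1);
            }
        }
        // HashMap has no defined order, so sort the entries by line number.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(firstSeen.entrySet());
        entries.sort(Map.Entry.comparingByValue());

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'",
            "Class1.cs(34,20): error CS0103: The name 'thiswworked' does not exist in the current context",
            "Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'");
        uniqueInOrder(log).forEach(System.out::println);
    }
}
```

In Java specifically, a `LinkedHashMap` or `LinkedHashSet` already iterates in insertion order, so the explicit sort is only needed when you use an unordered map such as `HashMap`.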
