
How to exclude multiple lines between separators in regex?

I am working with some logs in which each information field is introduced by a separator line, e.g.:

********** Field #1 **********
Content inside Field #1
More content

********** Field #2 **********
Content inside Field #2
More content

...

********** The last field will always remain unchanged **********
Unchanged content from last field

Periodically, I have to delete all the content from the respective fields and manually provide the new data that is going to occupy that space. The problem is that the logs are far too long to select and delete all of that content by hand, so I wrote a regex for Notepad++ find/replace that detects the end of a separator (*) and the subsequent lines ending with \r\n, until it bumps into another *.

Here follows my expression:

(?<=\*)([^\*]+\r\n)(?=\*)

How it works:

  • First group: captures the last * from a group of stars/asterisks separator;
  • Second group: captures everything that is not an asterisk or text inside the separators and ends with line break (at least I believe this is the correct interpretation);
  • Third group: captures the beginning of a left separator * .
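To check this reading of the expression, here is a minimal Python sketch that runs it against a shortened sample log. The sample text and the variable names are assumptions made for illustration, and Python's re module stands in for the Notepad++ engine, so details may differ. One thing it makes visible: the lookbehind and lookahead are zero-width assertions, so they test for a neighbouring * without capturing it, and the pattern has a single capturing group.

```python
import re

# Shortened, hypothetical sample log with Windows line endings (\r\n),
# because the expression explicitly requires \r\n.
log = (
    "********** Field #1 **********\r\n"
    "Content inside Field #1\r\n"
    "More content\r\n"
    "\r\n"
    "********** Field #2 **********\r\n"
    "Content inside Field #2\r\n"
    "\r\n"
    "********** Last field **********\r\n"
    "Unchanged content from last field\r\n"
)

# The expression from the question, unchanged.
pattern = re.compile(r"(?<=\*)([^\*]+\r\n)(?=\*)")

matches = list(pattern.finditer(log))
for m in matches:
    # Each match starts with the \r\n that ends the header line.
    print(repr(m.group(0)))

# Replace every match with two newlines, as described in the question.
cleared = pattern.sub("\n\n", log)
print(cleared)
```

In this reduced sample the last field is not matched, because nothing starting with * follows it. A real log where something with an asterisk comes after the last field would behave differently.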

As you may have read in the log example, the last field must stay unchanged, no matter what. So I am struggling to match the exact place after the last field. I tried putting some unique reference from the last field's content inside the negated \* matching list in group 2, but with no success.

Currently, the solution I wrote works well with all fields, but I want it to respect the condition that the last field must stay the same, so that I can use Replace All without changing the last field. Is there any way to improve the existing solution? If not, is there a different solution for this case?

Thank you so much in advance for any help.

Update: no field's content can contain asterisks (*). Also, the number of asterisks can vary from field to field. They are used only to separate the different pieces of information inside the log file.

My intention is to use this rule and replace the matched content with \n\n in find/replace. It will produce something like this:

********** Field #1 **********

********** Field #2 **********

...

********** The last field will always remain unchanged **********
Unchanged content from last field

You could match a line starting with an asterisk and then forget what has been matched so far.

Then match all the following lines that do not start with an asterisk; these are the lines to delete:

^\*.*\R\K.*(?:\R(?!\*).*)*\R(?=\*)

The pattern matches:

  • ^ Start of a line (Notepad++ anchors ^ at line starts; other engines need multiline mode for this)
  • \*.*\R Match * followed by the rest of the line and a newline
  • \K Forget what is matched so far
  • .* Match the whole line
  • (?:\R(?!\*).*)* Optionally repeat matching all lines that do not start with an asterisk
  • \R Match a newline
  • (?=\*) Positive lookahead, assert * to the right
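The breakdown above can be tried outside Notepad++ with a minimal Python sketch. Python's re module supports neither \K nor \R, so this is an adaptation rather than the pattern itself: a capturing group keeps the header line in place of \K, and a plain \n stands in for \R. The sample log and the new_content value are assumptions made for illustration.

```python
import re

# Hypothetical sample log using \n line endings only.
log = (
    "********** Field #1 **********\n"
    "Content inside Field #1\n"
    "More content\n"
    "\n"
    "********** Field #2 **********\n"
    "Content inside Field #2\n"
    "More content\n"
    "\n"
    "********** The last field will always remain unchanged **********\n"
    "Unchanged content from last field\n"
)

# Adapted pattern: group 1 holds the header line (instead of \K),
# \n is used instead of \R, and re.MULTILINE makes ^ match at line starts.
pattern = re.compile(r"^(\*.*\n).*(?:\n(?!\*).*)*\n(?=\*)", re.MULTILINE)

# Placeholder for the new field content; here just an empty line.
new_content = "\n"

# Put the header back and follow it with the new content.
result = pattern.sub(lambda m: m.group(1) + new_content, log)
print(result)
```

The last field stays untouched because the trailing lookahead (?=\*) requires another header line after the matched content, and nothing follows the last field.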

Regex demo

Then replace with your content followed by a newline.

I would try it with this regular expression:

(^\*+.*\*+$\n)(?:.*\n)+?(?=^\*+.*\*+$\n)

This captures a field header line (for example ********** Field #1 **********) in the first group, including its trailing \n (please add a \r if necessary, so that every \n becomes \r\n). It then matches all following content lines, including their newlines (again written with \n only), up to the next field header. The next field header is only checked by a lookahead, so it is not part of the match.

So you can replace each match with group 1, which should leave only the field headers; repeat the replacement if necessary. (Hint: in Notepad++ you can set \1 as the replacement to achieve this.)

As the last field is not followed by another field header, it also will never match.

Please note that the regex expects at least one * at the beginning and at least one * at the end of every field header line.

Another hint for Notepad++: please uncheck the ". matches newline" option to get the result you want.

Try it at https://regex101.com/r/5kc4m6/1
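As a rough check outside Notepad++, the same expression can be run with Python's re module. This is only a sketch: the sample log is an assumption made for illustration, it uses \n line endings, and re.MULTILINE is needed so that ^ and $ work per line.

```python
import re

# Hypothetical sample log using \n line endings only.
log = (
    "********** Field #1 **********\n"
    "Content inside Field #1\n"
    "More content\n"
    "\n"
    "********** Field #2 **********\n"
    "Content inside Field #2\n"
    "More content\n"
    "\n"
    "********** The last field will always remain unchanged **********\n"
    "Unchanged content from last field\n"
)

# The expression from this answer, unchanged.
pattern = re.compile(
    r"(^\*+.*\*+$\n)(?:.*\n)+?(?=^\*+.*\*+$\n)",
    re.MULTILINE,
)

# Keep only group 1 (the header line) for every match.
headers_only = pattern.sub(r"\1", log)
print(headers_only)
```

With this sample, the blank line before each following header is removed together with the content. Using r"\1\n" as the replacement instead would keep one empty line under each cleared header.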
