How to exclude multiple lines between separators in regex?

I was working with some logs that use a separator line for each information field, e.g.:

********** Field #1 **********
Content inside Field #1
More content

********** Field #2 **********
Content inside Field #2
More content

...

********** The last field will always remain unchanged **********
Unchanged content from last field

Periodically, I have to delete all the content from the respective fields and manually provide the new data that will occupy that space. The problem is that the logs are far too long to select and delete all of that content by hand, so I wrote a regex in Notepad++ Find/Replace that detects the end of a separator (its last *) and the subsequent lines ending in \r\n, up to the point where it bumps into another *.

Here is my expression:

(?<=\*)([^\*]+\r\n)(?=\*)

How it works:

  • First part, the lookbehind (?<=\*): requires the match to start right after an asterisk, in practice the last * of a separator (it asserts, it does not capture);
  • Second part, the group ([^\*]+\r\n): captures everything that is not an asterisk, line breaks included, and must end with a line break (at least I believe this is the correct interpretation);
  • Third part, the lookahead (?=\*): requires the next character to be a *, i.e. the beginning of the next separator.
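To see what the expression above actually matches, here is a minimal Python sketch that runs it against the sample log with Windows (\r\n) line endings. It assumes Python's re module treats this particular pattern the same way Notepad++ does (both accept the fixed-width lookbehind (?<=\*)); the names SAMPLE, ASKER_PATTERN and matches are only for illustration.

```python
import re

# Sample log from the question, assuming Windows (\r\n) line endings.
SAMPLE = (
    "********** Field #1 **********\r\n"
    "Content inside Field #1\r\n"
    "More content\r\n"
    "\r\n"
    "********** Field #2 **********\r\n"
    "Content inside Field #2\r\n"
    "More content\r\n"
    "\r\n"
    "********** The last field will always remain unchanged **********\r\n"
    "Unchanged content from last field\r\n"
)

# The asker's expression: start right after a '*', take a run of non-'*'
# characters (newlines included) that ends in \r\n, and stop before the next '*'.
ASKER_PATTERN = re.compile(r"(?<=\*)([^\*]+\r\n)(?=\*)")

# With one capturing group, findall returns the group text of each match.
matches = ASKER_PATTERN.findall(SAMPLE)
for m in matches:
    print(repr(m))
```

On this sample only the bodies of Field #1 and Field #2 are matched: the last field's text is not followed by another *, so the lookahead (?=\*) rejects it.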

As you may have seen in the log example, the last field must stay unchanged, no matter what. So I am struggling to match the exact place after the last field. I tried putting some unique text from the last field's content inside the negated character class [^\*] in group 2, but with no success.

Currently, my solution works well with all fields, but I want it to respect the condition that the last field must stay the same, so that I can use Replace All without changing the last field. Is there any way to build on the existing solution and improve it? If not, is there a different solution for this case?

Thank you so much in advance for any help.

Update: no content field can contain * (stars/asterisks), and the number of * can vary from field to field. They are used only to separate the different pieces of information inside the log file.

My intention is to use this rule and replace the matched content with \n\n in Find/Replace. It should produce something like this:

********** Field #1 **********

********** Field #2 **********

...

********** The last field will always remain unchanged **********
Unchanged content from last field

You could match a line that starts with an asterisk (the separator line) and then forget what has been matched so far.

Then match all the lines to delete, i.e. the lines that do not start with an asterisk:

^\*.*\R\K.*(?:\R(?!\*).*)*\R(?=\*)

The pattern matches:模式匹配:

  • ^ Start of a line (in Notepad++, ^ matches at the start of every line)
  • \*.*\R Match * followed by the rest of the line and a newline
  • \K Forget what has been matched so far
  • .* Match the whole next line
  • (?:\R(?!\*).*)* Optionally repeat, matching all following lines that do not start with an asterisk
  • \R Match a newline
  • (?=\*) Positive lookahead, assert * to the right
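Python's re module supports neither \K nor \R, so the pattern cannot be pasted into Python as-is. The following is a minimal sketch of the same idea under those constraints: the separator line is captured in group 1 and put back in the replacement instead of being dropped with \K, and \R is replaced by a plain \n (assuming LF line endings). The helper clear_fields and its new_content parameter are hypothetical names for illustration.

```python
import re

# Python version of ^\*.*\R\K.*(?:\R(?!\*).*)*\R(?=\*):
#   - \K is emulated by capturing the separator line in group 1 and re-inserting it;
#   - \R is replaced by \n (assumes LF line endings).
FIELD_BODY = re.compile(
    r"(^\*.*\n)"              # group 1: a line starting with '*' (the separator) plus its newline
    r"(.*(?:\n(?!\*).*)*\n)"  # group 2: following lines that do not start with '*', plus the last newline
    r"(?=\*)",                # only if another separator line follows
    re.MULTILINE,             # let ^ match at every line start, as in Notepad++
)


def clear_fields(text: str, new_content: str = "\n") -> str:
    """Hypothetical helper: replace every field body that is followed by another separator."""
    return FIELD_BODY.sub(lambda m: m.group(1) + new_content, text)


SAMPLE = (
    "********** Field #1 **********\n"
    "Content inside Field #1\n"
    "More content\n"
    "\n"
    "********** Field #2 **********\n"
    "Content inside Field #2\n"
    "More content\n"
    "\n"
    "********** The last field will always remain unchanged **********\n"
    "Unchanged content from last field\n"
)

print(clear_fields(SAMPLE))
```

Using a function as the replacement keeps any backslashes in new_content literal, which a template string such as r"\1..." would not.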

Regex demo

Then replace the match with your new content followed by a newline.

I would try it with this regular expression:我会用这个正则表达式试试:

(^\*+.*\*+$\n)(?:.*\n)+?(?=^\*+.*\*+$\n)

This puts the first field header line (e.g. ** Field #1 **) into group 1, including its \n (please add a \r if necessary, so every \n becomes \r\n), then matches all content, newlines included (again only with \n here), until the next field header follows. The next field header itself is not part of the match.

So you can replace this expression with group 1, and if you repeat this for every match you should be left with only the field headers. (Hint: in Notepad++ you can set \1 as the replacement to achieve this.)

As the last field is not followed by another field header, it will never match.

Please note that the regex expects at least one * at the beginning and at least one * at the end of every field header line.
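Here is a minimal Python sketch of this second expression, assuming LF line endings. Note that in Python, $ with re.MULTILINE only matches before \n, not before \r, so simply adding \r in front of each \n would not work for CRLF text here; that is a Python detail and may differ from Notepad++. The names HEADER_AND_BODY and result are illustrative.

```python
import re

# Answer 2's expression, assuming LF (\n) line endings.
HEADER_AND_BODY = re.compile(
    r"(^\*+.*\*+$\n)"      # group 1: a header line that starts and ends with '*'
    r"(?:.*\n)+?"          # as few whole lines as possible...
    r"(?=^\*+.*\*+$\n)",   # ...up to the next header line (not consumed)
    re.MULTILINE,          # ^ and $ match at line boundaries
)

SAMPLE = (
    "********** Field #1 **********\n"
    "Content inside Field #1\n"
    "More content\n"
    "\n"
    "********** Field #2 **********\n"
    "Content inside Field #2\n"
    "More content\n"
    "\n"
    "********** The last field will always remain unchanged **********\n"
    "Unchanged content from last field\n"
)

# Replacing each match with group 1 keeps only the header; one re.sub call covers all matches.
result = HEADER_AND_BODY.sub(r"\1", SAMPLE)
print(result)
```

Unlike the first answer, this replacement leaves no blank line after each header; append "\n" to the replacement if you want one.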

Another hint for Notepad++: please uncheck the ". matches newline" option to get the result you want.
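The reason for that hint can be checked with a small Python experiment, using re.DOTALL as a stand-in for Notepad++'s ". matches newline" option (an assumption: the two are analogous but not the same engine). With DOTALL, the .* inside the header part can run across several lines, so the first match pulls Field #1's content into group 1 and that content survives the replacement.

```python
import re

PATTERN = r"(^\*+.*\*+$\n)(?:.*\n)+?(?=^\*+.*\*+$\n)"

SAMPLE = (
    "********** Field #1 **********\n"
    "Content inside Field #1\n"
    "More content\n"
    "\n"
    "********** Field #2 **********\n"
    "Content inside Field #2\n"
    "More content\n"
    "\n"
    "********** The last field will always remain unchanged **********\n"
    "Unchanged content from last field\n"
)

# '.' stops at line breaks: each match covers exactly one header and its body.
line_mode = re.sub(PATTERN, r"\1", SAMPLE, flags=re.MULTILINE)

# '.' also matches '\n': the greedy .* in group 1 can span several header lines.
dot_all = re.sub(PATTERN, r"\1", SAMPLE, flags=re.MULTILINE | re.DOTALL)

print(line_mode == dot_all)
print(dot_all)
```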

Try it at https://regex101.com/r/5kc4m6/1
