Regex that follows a pattern, except between braces

Question

I am having a tough time figuring out a clean regex (in a JavaScript implementation) that will capture as much of a line as it can while following a pattern, where anything inside braces doesn't need to follow the pattern. I'm not sure of the best way to explain that except by example:

For example: let's say the pattern is that the line must start with a 0 and end with a 0 anywhere, but only allow a sequence of 1, 2 or 3 in between, so I use ^(0[123]+0) . This should match the first part of the strings:


    0213123123130
    012312312312303123123
    01231230123123031230
    etc.
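
As a quick check of this baseline, the pattern can be run against the sample lines in JavaScript. This is only a sketch: the names `basePattern` and `samples` are made up for illustration.

```javascript
// Baseline pattern from the question: a 0, one or more of 1/2/3, then a 0.
const basePattern = /^(0[123]+0)/;

// Sample lines from the question (hypothetical variable name).
const samples = [
  '0213123123130',
  '012312312312303123123',
  '01231230123123031230',
];

// exec returns null when there is no match, so guard before reading group 1.
const captured = samples.map((line) => {
  const match = basePattern.exec(line);
  return match ? match[1] : null;
});

console.log(captured);
```

Each capture stops at the first 0 that follows the leading 0, which is why only the first part of the longer lines is returned.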

But I want to be able to insert {gibberish} between braces into the line and have the regex allow it to disrupt the pattern, i.e., ignore the curly braces and everything inside them, but still capture the full string including the {gibberish} . So this would capture everything in bold:


    01232231{whatever 3 gArBaGe? I want.}121{foo}2310312{bar}3120123

And a 0 inside the braces should not end the capture prematurely, even if the pattern is otherwise correct:


    01213123123123{21310030123012301}31231230123

EDIT: Now, I know I could maybe do something like ^0[123]*?(?:{.*})*?[123]*?0 , but that only works if there is a single set of braces, and now I have to duplicate my [123] pattern. As that [123] pattern gets more complex, having it appear more than once in the regex starts getting really incomprehensible. Something like "the best regex trick" seemed promising, but I couldn't figure out how to apply it here. Using crazy lookarounds seems like the only way now, but I would hope there's a cleaner way.

Answer 1

Since you've specified that you want the whole match including the garbage, you can use ^0([123]+(?:{[^}]*}[123]*)*)0 and use $1 to get the part between the 0s, or $0 to get everything that matched (in JavaScript, these are match[1] and match[0] of the result array).

https://regex101.com/r/iFSabs/3

Here's the rundown on how the regex works:

^ anchors the match to start at the beginning of the line
0 matches a literal zero character
([123]+(?:{[^}]*}[123]*)*) is a capturing group that captures everything inside of it.
- [123]+ matches one or more instances of 1 , 2 , or 3
- (?:{[^}]*}[123]*)* is a non-capturing group, i.e. it'll be part of the match, but won't have a $# of its own for use in a replacement or in the match result.
  - {[^}]*} matches a literal { followed by any number of non- } characters, followed by }
  - [123]* matches zero or more instances of 1 , 2 , or 3
  - Then this whole non-capturing group can be matched 0 or more times.

The process behind this regex is known as unrolling the loop. http://www.softec.lu/site/RegularExpressions/UnrollingTheLoop gives a good description of it (quoted here with a few typo fixes):

The unrolling the loop technique is based on the hypothesis that in most cases, you [know] in a [repeated] alternation which case should be the most usual and which one is exceptional. We will call the first one the normal case and the second one the special case. The general syntax of the unrolling the loop technique could then be written as:

normal* ( special normal* )*

Which could mean something like: match the normal case; if you find a special case, match it, then match the normal case again. [You'll] notice that part of this syntax could [potentially] lead to a super-linear match.
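
One way to apply this shape without repeating a complicated sub-pattern by hand is to build the regex from string fragments. The following is a minimal sketch and not part of the quoted article: `buildUnrolled`, `normal`, and `special` are hypothetical names, and it assumes both fragments are already valid regex source. It uses `normal+` before the first special case, mirroring the regex in this answer rather than the general `normal*` form.

```javascript
// Hypothetical helper: assembles  ^0( normal+ (?: special normal* )* )0
// from two regex-source fragments, so `normal` is written only once.
function buildUnrolled(normal, special) {
  return new RegExp(`^0((?:${normal})+(?:(?:${special})(?:${normal})*)*)0`);
}

const normal = '[123]';         // the "normal" case: one pattern character
const special = '\\{[^}]*\\}';  // the "special" case: a {braced} section

const unrolled = buildUnrolled(normal, special);

const line = '01232231{whatever 3 gArBaGe? I want.}121{foo}2310312{bar}3120123';
const match = unrolled.exec(line);

// match[0] is the whole match, match[1] is the part between the two zeros.
console.log(match ? match[0] : null);
console.log(match ? match[1] : null);
```

If the [123] part grows more complex, only the `normal` string needs to change.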

Example using RegExp#test and RegExp#exec:

    const strings = [
      '0213123123130',
      '012312312312303123123',
      '01231230123123031230',
      '01213123123123{21310030123012301}31231230123',
      '01212121{hello 0}121312',
      '012321212211231{whatever 3 gArBaGe? I want.}1212313123120123',
      '012321212211231{whatever 3 gArBaGe? I want.}121231{extra garbage}3123120123',
    ];

    const regex = /^0([123]+(?:{[^}]*}[123]*)*)0/;

    console.log('tests');
    console.log(strings.map(string => `'${string}': ${regex.test(string)}`));

    console.log('matches');
    let matches = strings
      .map((string) => regex.exec(string))
      .map((match) => (match ? match[1] : undefined));
    console.log(matches);

Robo Robok's answer is the one I'd go with if you only want to keep the non-braced part, although I'd use a slightly different regex ( {[^}]*} ) for a bit more performance.

Answer 2

How about the other way around? Checking the string with the curly tags removed:

const string = '012321212211231{whatever 3 gArBaGe? I want.}1212313123120123{foo}123';
const stringWithoutTags = string.replace(/\{.*?\}/g, '');

const result = /^(0[123]+0)/.test(stringWithoutTags);
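
The same idea can be wrapped in a small function. This sketch is an illustration rather than the answerer's code: `followsPattern` is a hypothetical name, and it swaps in the negated class `[^}]*` in place of `.*?`. Note that this approach only reports whether the stripped line fits the pattern; it does not return the original text with the braced sections included.

```javascript
// Hypothetical helper: strip every {...} section, then test what is left.
function followsPattern(line) {
  // [^}]* runs up to the first closing brace without lazy backtracking.
  const withoutTags = line.replace(/\{[^}]*\}/g, '');
  return /^(0[123]+0)/.test(withoutTags);
}

console.log(followsPattern('01213123123123{21310030123012301}31231230123')); // true
console.log(followsPattern('01212121{hello 0}121312')); // false: no closing 0 outside the braces
```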

Answer 3

You say you need to capture everything, including the gibberish, so I think a simple pattern like this should work:

^(0(?:[123]|{.+?})+0)

That allows a string starting with 0, followed by any of your pattern characters (1, 2, or 3) or one of the { gibberish } sections, and allows that to repeat to handle multiple gibberish sections. Finally, it must end with a 0.

https://regex101.com/r/K4teGY/2
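
A short sketch of this pattern in JavaScript, with the braces escaped for clarity (`alternation` and `strict` are hypothetical names). One behavior worth knowing: because `.+?` is allowed to match a closing brace when the engine backtracks, a stray character between two braced sections can be absorbed into one larger section, whereas a negated class such as `[^}]*` rejects the line.

```javascript
// The alternation pattern with escaped braces.
const alternation = /^(0(?:[123]|\{.+?\})+0)/;

// For comparison: the same shape, but a braced section may not contain "}".
const strict = /^(0(?:[123]|\{[^}]*\})+0)/;

const clean = '01213123123123{21310030123012301}31231230123';
const stray = '01{a}x{b}10'; // "x" sits outside the braces

console.log(alternation.exec(clean)[1]);
console.log(alternation.test(stray)); // true: "{a}x{b}" is taken as one section
console.log(strict.test(stray));      // false: "x" is neither 1/2/3 nor braced
```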

Answer 4

You might use

^0[123]*(?:{[^{}]*}[123]*)*0

^ Start of string
0 Match a zero
[123]* Match 0+ times either 1, 2 or 3
(?: Non-capturing group
- {[^{}]*}[123]* Match from an opening { till the closing } , followed by 0+ times either 1, 2 or 3
)* Close the group and repeat 0+ times
0 Match a zero
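
To see how this differs from a pattern that uses `[^}]*` and `[123]+`, here is a small comparison sketch. The variable names are hypothetical, and the first regex is the one from Answer 1 with the braces escaped.

```javascript
const answer1 = /^0([123]+(?:\{[^}]*\}[123]*)*)0/;
const answer4 = /^0[123]*(?:\{[^{}]*\}[123]*)*0/;

// An opening brace inside a braced section: allowed by [^}], not by [^{}].
const nestedOpen = '01{a{b}20';
console.log(answer1.test(nestedOpen)); // true
console.log(answer4.test(nestedOpen)); // false

// A braced section directly after the leading 0: needs [123]* rather than [123]+.
const braceFirst = '0{x}10';
console.log(answer1.test(braceFirst)); // false
console.log(answer4.test(braceFirst)); // true

// Both give the same full match on the question's own sample.
const sample = '01213123123123{21310030123012301}31231230123';
console.log(answer4.exec(sample)[0]);
```

Which behavior is preferable depends on whether nested opening braces and a brace right after the first 0 should count as valid input; the question does not say.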

Regex demo

Answer 1
4 Accepted 2020-05-26 01:48:49

Answer 2
1 2020-05-26 01:43:06

Answer 3
1 2020-05-26 02:02:18

Answer 4
1 2020-05-26 07:30:14