
Unable to match regex for any character except ' and "

I've written a regex to match against the string

{{AB.group.one}}:"eighth",{{AB.group.TWO}}:"third",{{attr1111}}:"fourth","fifth":{{attr_22_2qq2}},"sixth":{{AB.group.three}},{{ab.group.fourth}}:"seventh","ninth":{{attr1111}}}

Regex:

/[^'"]({{2}[a-zA-Z0-9$_].*?}{2})[^'"]/gi

Breaking down the regex above:

  • [^'"] : Start with one character which is neither ' nor " .
  • ({{2}[a-zA-Z0-9$_].*?}{2}) : Exactly two { characters, then one character from the set a-zA-Z0-9$_ , then as few further characters as possible, and finally exactly two } characters.
  • [^'"] : End with one character which is neither ' nor " .

The matches listed below are the captured groups, not the full matches. I'll perform my operations on the captured groups, so for simplicity we can treat them as our matches.

Expected matches:

  • {{AB.group.one}}
  • {{AB.group.TWO}}
  • {{attr1111}}
  • {{attr_22_2qq2}}
  • {{AB.group.three}}
  • {{ab.group.fourth}}
  • {{attr1111}}}

Resultant matches:

  • {{AB.group.TWO}}
  • {{attr1111}}
  • {{attr_22_2qq2}}
  • {{AB.group.three}}
  • {{attr1111}}}
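
The result can be reproduced with a few lines. This is a minimal sketch, not part of the original question: it runs the pattern above over the input string and collects capture group 1 with String.prototype.matchAll. The variable names are illustrative.

```javascript
// Input string copied from the question.
const input = '{{AB.group.one}}:"eighth",{{AB.group.TWO}}:"third",{{attr1111}}:"fourth","fifth":{{attr_22_2qq2}},"sixth":{{AB.group.three}},{{ab.group.fourth}}:"seventh","ninth":{{attr1111}}}';

// Original pattern: one non-quote character is consumed on each side.
const original = /[^'"]({{2}[a-zA-Z0-9$_].*?}{2})[^'"]/gi;

// Collect capture group 1 from every match.
const captured = [...input.matchAll(original)].map(m => m[1]);

console.log(captured.length);                          // number of captured groups
console.log(captured.includes('{{AB.group.one}}'));    // is the first placeholder found?
console.log(captured.includes('{{ab.group.fourth}}')); // is the sixth placeholder found?
```

Five groups are captured, and the two placeholders named in the question are the ones missing.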

As the lists above show, {{AB.group.one}} and {{ab.group.fourth}} do not match. I want them to match as well.


I think I know the reasons why they aren't matching.

{{AB.group.one}} doesn't match because [^'"] requires one character other than ' and " before the braces, and at the start of the string there is none. If I replace [^'"] with [^'"]* , it works, but in that case "{{AB.group.one}}" matches as well.

So the problem statement is: match any character (if there is one) before {{ and after }} , but that character can't be ' or " .

{{ab.group.fourth}} doesn't match because the character preceding it, i.e. the comma, is already part of another match. This is just my speculation; the reason could be something else. But if I include any character between {{AB.group.three}}, and {{ab.group.fourth}} (e.g. {{AB.group.three}}, {{ab.group.fourth}} ), then the pattern matches. I have no idea how to fix this.
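
The speculation about the comma can be checked by printing the full match next to the captured group. A minimal sketch, using a shortened input and a hypothetical helper named fullMatches (neither is from the original post):

```javascript
// Same pattern as in the question: one non-quote character consumed on each side.
const consuming = /[^'"]({{2}[a-zA-Z0-9$_].*?}{2})[^'"]/gi;

// Return the full match text (index 0) of every match, neighbours included.
function fullMatches(text) {
  return [...text.matchAll(consuming)].map(m => m[0]);
}

// Two placeholders separated by a single comma.
const adjacent = '"sixth":{{AB.group.three}},{{ab.group.fourth}}:"seventh"';
// Same input with an extra space after the comma.
const spaced = '"sixth":{{AB.group.three}}, {{ab.group.fourth}}:"seventh"';

console.log(fullMatches(adjacent)); // [ ':{{AB.group.three}},' ]
console.log(fullMatches(spaced));   // [ ':{{AB.group.three}},', ' {{ab.group.fourth}}:' ]
```

In the first input the comma is consumed as the trailing character of the first match, so no character is left to serve as the leading [^'"] of the second placeholder. With the extra space, the space takes that role and the second placeholder matches.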

Please help me solve these two problems. Thank you.

Here is a regex-based approach which seems to work. First, we strip off all double-quoted terms, then replace islands of commas/colons with a single comma separator. Finally, we split on the comma to generate an array of terms.

    var input = "{{AB.group.one}}:\"eighth\",{{AB.group.TWO}}:\"third\",{{attr1111}}:\"fourth\",\"fifth\":{{attr_22_2qq2}},\"sixth\":{{AB.group.three}},{{ab.group.fourth}}:\"seventh\",\"ninth\":{{attr1111}}},\"blah\":\"stuff\",{{one}}:{{two}}";

    // Strip double-quoted terms, collapse runs of , and : into one comma, then split.
    var terms = input.replace(/\".*?\"/g, "")
                     .replace(/[,:]+/g, ",")
                     .split(",");
    console.log(terms);
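
The same three steps can be wrapped in a small helper. This is a sketch only: extractTerms is a hypothetical name, and the final filter step is an addition that drops the empty strings left over when the input starts or ends with a quoted term.

```javascript
// Hypothetical helper wrapping the strip / collapse / split steps.
function extractTerms(text) {
  return text
    .replace(/".*?"/g, '')   // 1. remove every double-quoted term
    .replace(/[,:]+/g, ',')  // 2. collapse runs of , and : into one comma
    .split(',')              // 3. split into individual terms
    .filter(Boolean);        // addition: drop empty strings at the edges
}

// Illustrative input that begins and ends with a quoted term.
const sample = '"a":{{x}},{{y}}:"b"';
console.log(extractTerms(sample)); // [ '{{x}}', '{{y}}' ]
```

Note that this approach assumes the unquoted terms themselves contain no commas or colons.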

You were actually really close with what you had.

    let input = '{{AB.group.one}}:"eighth",{{AB.group.TWO}}:"third",{{attr1111}}:"fourth","fifth":{{attr_22_2qq2}},"sixth":{{AB.group.three}},{{ab.group.fourth}}:"seventh","ninth":{{attr1111}}}'
    let regex = /(?<!['"])({{2}[a-zA-Z0-9$_].*?}{2})(?!['"])/gi;
    console.log(input.match(regex))

(?<!['"]) is a negative lookbehind. It asserts that the character immediately before the match is not ' or " . Because a lookbehind is zero-width, it consumes nothing, and it also succeeds at the very start of the string, where there is no preceding character. That addresses both problems: {{AB.group.one}} matches at the start of the input, and the , before {{ab.group.fourth}} is no longer swallowed by the previous match. An optional positive lookbehind such as (?<=[^'"]?) would not do the job, because it can always match zero characters and therefore never rejects a quote.

(?!['"]) is a negative lookahead. It checks the character immediately after the expression to ensure that it's not ' or " (or that there is no character after the expression), again without consuming it.
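
The quote handling can be checked on a mixed input. A minimal sketch, assuming negative lookarounds (?<!['"]) and (?!['"]) around the capture group and a made-up input string in which some placeholders are quoted; it needs a runtime that supports lookbehind.

```javascript
// Made-up input: {{b}} is double-quoted, {{c}} is single-quoted, the rest are bare.
const mixed = `{{a}}:"{{b}}",'{{c}}':{{d}},{{e}}`;

// Negative lookbehind / lookahead: reject a quote on either side without consuming it.
const unquoted = /(?<!['"])({{2}[a-zA-Z0-9$_].*?}{2})(?!['"])/g;

// With the g flag, match() returns the full matches; the lookarounds add nothing to them.
const found = mixed.match(unquoted);
console.log(found); // [ '{{a}}', '{{d}}', '{{e}}' ]
```

The quoted placeholders are skipped, while the one at the start of the string and the two separated by a single comma are all found.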

Another option, since lookbehinds aren't supported in every browser:

    let input = '{{AB.group.one}}:"eighth",{{AB.group.TWO}}:"third",{{attr1111}}:"fourth","fifth":{{attr_22_2qq2}},"sixth":{{AB.group.three}},{{ab.group.fourth}}:"seventh","ninth":{{attr1111}}}'
    let regex = /(?:^|[^'"])({{2}[a-zA-Z0-9$_].*?}{2})(?!['"])/gi
    console.log([...input.matchAll(regex)].map(reg => reg[1]))

String.match() loses reference to capture groups when the global flag is passed, so it only returns the full match. Since you're already creating a capture group with ({{2}[a-zA-Z0-9$_].*?}{2}) , the pattern can consume the preceding character in a non-capturing group and you can read the capture group instead of the full match.

(?:^|[^'"]) is a non-capturing group that matches either the start of the string or one character that is not ' or " . The check after the expression stays a negative lookahead, (?!['"]) , so the following character is not consumed and remains available as the leading character of the next match. That is what lets {{ab.group.fourth}} match right after {{AB.group.three}}, .

With String.matchAll, the first element of the array created for each match is the entire match, the second element is the first capturing group, and so on. So the mapping over [...input.matchAll(regex)] just collects the capturing group from each match.
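
The difference between the two methods is easy to see side by side. A small sketch with a made-up input and a simplified pattern (both are assumptions for illustration, not taken from the answer):

```javascript
// Made-up input: each placeholder is preceded by an = sign.
const text = 'a={{x}};b={{y}}';

// The = is part of the full match; the placeholder is capture group 1.
const pattern = /=({{2}\w+}{2})/g;

// match() with the g flag: full matches only, capture groups are dropped.
const viaMatch = text.match(pattern);

// matchAll(): one array per match, index 0 = full match, index 1 = group 1.
const viaMatchAll = [...text.matchAll(pattern)].map(m => m[1]);

console.log(viaMatch);    // [ '={{x}}', '={{y}}' ]
console.log(viaMatchAll); // [ '{{x}}', '{{y}}' ]
```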
