Regex - match any text between some delimiters
I am trying to capture a string of the form [[....]] (including the brackets), where .... can be anything (including non-printable characters) except ]].
Here is the source to match against:
var myString = 'blablablabla[["<strong>LA DEFENSE 4 TEMPS ( La Rotonde )</strong><br />Centre commercial LES 4 TEMPS",
48.89141725,
2.23478235,
"4T"],
["<strong>ANGERS</strong><br />Centre commercial GEANT",
48.89141725,
2.23478235,
"4T"]]blablablabla'
I tried the pattern [^\]]+ to match all characters except a double bracket. The problem is that I do not know how to use this approach with a bracket that immediately follows the first one, as in [^\]\]]+.
Is there a solution using positive/negative lookahead or a word boundary?
(\[\[[^\](?=\])]+)
Any help, please?
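A minimal JavaScript sketch of why the first attempt falls short (the sample string is a shortened, hypothetical stand-in for the real data): a negated character class excludes single characters, so [^\]]+ stops at the first ] it meets rather than at ]].

```javascript
// Shortened, hypothetical stand-in for the real source string.
const sample = 'blabla[["A",\n 1,\n "4T"],\n ["B",\n 2,\n "4T"]]blabla';

// [^\]]+ excludes ANY single "]", so the match stops at the "]" that
// closes the first inner array, long before the real "]]" delimiter.
const attempt = sample.match(/\[\[[^\]]+/)[0];
console.log(JSON.stringify(attempt));

// A class lists single characters, so [^\]\]] is just [^\]] with a
// duplicate entry: it cannot express "anything except the sequence ]]".
const sameClass = /\[\[[^\]\]]+/.exec(sample)[0] === attempt;
console.log(sameClass); // true
```

This is why the answers below switch to a different construction instead of a negated class.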
In JavaScript, the best way to match any text between delimiters that consist of more than one character is the [^] / [\s\S] / [\d\D] / [\w\W] construct with a lazy quantifier (*? matches 0 or more occurrences and +? matches 1 or more occurrences of the preceding subpattern, but as few as possible while still returning a valid match).
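The sketch below, assuming a hypothetical input with two [[...]] blocks, checks that the four "any character" classes behave identically with a lazy quantifier, and shows what goes wrong with a greedy one.

```javascript
// Hypothetical input: two [[...]] blocks, the first spanning a newline.
const input = 'x[[first\nblock]]y[[second]]z';

// Four equivalent "any character" classes, each with a lazy *?.
const variants = [
  /\[\[[^]*?]]/g,    // JavaScript-only [^]
  /\[\[[\s\S]*?]]/g,
  /\[\[[\d\D]*?]]/g,
  /\[\[[\w\W]*?]]/g,
];
const results = variants.map((re) => input.match(re));
console.log(JSON.stringify(results[0]));

// A greedy * runs to the LAST "]]" and swallows both blocks at once.
const greedy = input.match(/\[\[[\s\S]*]]/g);
console.log(greedy.length); // 1
```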
While the [^] construct, which matches any character including a newline, is JavaScript-specific, [\s\S] and its variants are mostly cross-platform constructs that work in PCRE, .NET, Python, Java, etc. The [...] in this case is a character class that contains two opposite shorthand classes. Since \s matches all whitespace characters and \S matches all non-whitespace characters, [\s\S] matches any character in any input.
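To make the [\s\S] reasoning concrete, this sketch tests every UTF-16 code unit against [\s\S] and against . (with no flags), then shows the effect on a hypothetical multi-line [[...]] block.

```javascript
// Count the UTF-16 code units (0x0000..0xFFFF) each pattern fails to match.
const anyClass = /[\s\S]/; // union of two opposite shorthand classes
const dot = /./;           // no "s" flag: excludes line terminators
let missedByClass = 0;
let missedByDot = 0;
for (let cu = 0; cu <= 0xffff; cu++) {
  const ch = String.fromCharCode(cu);
  if (!anyClass.test(ch)) missedByClass++;
  if (!dot.test(ch)) missedByDot++;
}
console.log(missedByClass); // 0
console.log(missedByDot);   // 4 (\n, \r, \u2028, \u2029)

// Consequence for a block that spans a line break:
const block = '[[line one\nline two]]';
console.log(/\[\[.*?]]/.test(block));                   // false
console.log(/\[\[[\s\S]*?]]/.exec(block)[0] === block); // true
```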
I'd recommend avoiding (.|\n): this construct causes more backtracking steps to occur and slows the regex search down.
So, you can use
\[\[[\d\D]*?]]
See JS regex demo
Here is a code snippet:
var re = /\[\[[\d\D]*?]]/g; // [[, then as few characters of any kind as possible, then ]]
var str = 'blablablabla[["<strong>LA DEFENSE 4 TEMPS ( La Rotonde )</strong><br />Centre commercial LES 4 TEMPS",\n 48.89141725,\n 2.23478235,\n "4T"],\n ["<strong>ANGERS</strong><br />Centre commercial GEANT",\n 48.89141725,\n 2.23478235,\n "4T"]]blablablabla';
var m;
while ((m = re.exec(str)) !== null) { // with the g flag, exec resumes from re.lastIndex
  console.log(m[0]); // the whole [[...]] block, newlines included
}
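The same lazy approach generalizes to arbitrary multi-character delimiters. The sketch below is a hypothetical helper (extractBetween and escapeRegExp are illustrative names, not a library API) that escapes the delimiters so they are matched literally and joins them with [\s\S]*?.

```javascript
// Hypothetical helper: escape regex metacharacters so `s` matches literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: every substring running from `open` to the nearest
// following `close`, delimiters included, via the lazy [\s\S]*? construct.
function extractBetween(str, open, close) {
  const re = new RegExp(escapeRegExp(open) + '[\\s\\S]*?' + escapeRegExp(close), 'g');
  return str.match(re) || []; // match() returns null when nothing is found
}

console.log(extractBetween('a[[one\n]]b[[two]]c', '[[', ']]'));
console.log(extractBetween('x<!--c1-->y<!--\nc2-->', '<!--', '-->'));
```

Escaping the delimiters matters here because [[ and ]] are regex metacharacters; building the pattern by hand, as in the snippet above, avoids the helper entirely when the delimiters are fixed.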
UPDATE
In this case, since the delimiters are different and consist of just 2 characters each, you can use an "unrolled" technique: match all characters other than ] (the first character of the closing delimiter), then 0 or more sequences of a single ] followed by 1 or more characters other than ], and finally the whole closing delimiter ]].
\[\[[^\]]*(?:][^\]]+)*]]
See regex demo
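This regex is really fast because it works linearly: each character is consumed once by a negated class, instead of the engine testing for the closing ]] at every position as the lazy [\d\D]*? version does.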
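A quick way to convince yourself that the unrolled pattern finds the same matches as the lazy one is to run both over a few inputs. This sketch uses hypothetical test strings, including a lone ] inside a block, an empty block, a stray extra ], and an unterminated block.

```javascript
// The unrolled pattern from this answer and the lazy one from above.
const unrolled = /\[\[[^\]]*(?:][^\]]+)*]]/g;
const lazy = /\[\[[\s\S]*?]]/g;

// Hypothetical inputs covering the tricky cases.
const cases = [
  'x[["a"],\n ["b"]]y', // lone "]" between inner arrays
  '[[]]',               // empty block
  '[[a]b]]]',           // lone "]" inside, extra "]" after
  '[[one]] and [[two]]',
  '[[never closed]',    // no closing "]]"
];

for (const s of cases) {
  const u = s.match(unrolled) || [];
  const l = s.match(lazy) || [];
  console.log(JSON.stringify(u), JSON.stringify(u) === JSON.stringify(l));
}
```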
PS: I also want to note that in a JS regex you do not need to escape ] outside of a character class (as long as the u flag is not used), but it must always be escaped inside a character class.
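Both halves of the PS can be checked directly. This minimal sketch (the patterns are illustrative) shows a bare ] accepted outside a class in a plain regex, rejected once the u flag is added, and silently misread inside a class, where [] is JavaScript's empty class.

```javascript
// Outside a class, a bare "]" is accepted when the u flag is absent...
const plain = /\[\[a]]/.test('[[a]]'); // true

// ...but Unicode mode (u flag) rejects a lone "]" as a syntax error.
let unicodeError = null;
try {
  new RegExp('\\[\\[a]]', 'u');
} catch (e) {
  unicodeError = e.name; // "SyntaxError"
}

// Inside a class, "[]" is JavaScript's empty class (it matches nothing),
// so /[]]/ means "empty class, then ]" and can never match.
const unescapedInClass = /[]]/.test(']'); // false
const escapedInClass = /[\]]/.test(']');  // true

console.log(plain, unicodeError, unescapedInClass, escapedInClass);
```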
Try this:
\[\[(.|\n)*?\]\]
https://regex101.com/r/gR5oJ3/1
It should match anything between and including [[ and ]]. The main issue was dealing with newlines, and the (.|\n) part matches newlines as well as everything . matches (in JavaScript, . still excludes \r, \u2028 and \u2029).
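The \r caveat matters for text with Windows line endings. This sketch, using a hypothetical CRLF input, compares the (.|\n) pattern with the [\s\S] class from the first answer.

```javascript
// Hypothetical input using Windows line endings (\r\n).
const crlf = '[[line one\r\nline two]]';

// "." skips \r and (.|\n) has no branch for it, so no match is found.
const alternation = /\[\[(.|\n)*?\]\]/.exec(crlf);
// [\s\S] covers \r too, so the whole block is matched.
const anyChar = /\[\[[\s\S]*?\]\]/.exec(crlf);

console.log(alternation);         // null
console.log(anyChar[0] === crlf); // true
```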