Scala: Regular Expression pattern match with curly braces?
So I am creating a WML-like language for my assignment, and as a first step I am supposed to create regular expressions to recognize the following:
//single = "{"
//double = "{{"
//triple = "{{{"
Here is my code for the second one:
val double = "\\{\\{\\b".r
And my test is:
println(double.findAllIn("{{ s{{ { {{{ {{ {{x").toArray.mkString(" "))
But it doesn't print anything! It's supposed to print the first, second, fifth and sixth tokens. I have tried every single combination of \\b and \\B, and even \\{{2,2} instead of \\{\\{, but it's still not working. Any help?
As a side question, if I wanted it to match just the first and fifth tokens, what would I need to do?
I tested your code (Scala 2.12.2 REPL), and contrary to your "it doesn't print anything" statement, it actually prints the "{{" occurrence from the "{{x" substring.
This is because x is a word character and \\b matches the position between the second { and x. Keep in mind that { isn't a word character, unlike x.
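To see this behaviour concretely, here is a minimal sketch that runs the question's pattern against the question's test string and reports where the matches start. The object and value names (WordBoundaryDemo, boundaryStarts, nonBoundaryStarts) are made up for illustration; only the pattern and the input come from the question.

```scala
import scala.util.matching.Regex

object WordBoundaryDemo {
  // Test string from the question; "{{x" starts at offset 16.
  val input = "{{ s{{ { {{{ {{ {{x"

  // Pattern from the question: two braces followed by a word boundary.
  val withBoundary: Regex = "\\{\\{\\b".r
  // Same pattern with \B (a position that is NOT a word boundary).
  val withNonBoundary: Regex = "\\{\\{\\B".r

  // Start offsets of every match in the input.
  val boundaryStarts: List[Int] =
    withBoundary.findAllMatchIn(input).map(_.start).toList
  val nonBoundaryStarts: List[Int] =
    withNonBoundary.findAllMatchIn(input).map(_.start).toList

  def main(args: Array[String]): Unit = {
    println("x".matches("\\w")) // true: x is a word character
    println("{".matches("\\w")) // false: { is not a word character
    println(boundaryStarts)     // List(16): only the "{{" in "{{x"
    println(nonBoundaryStarts)  // List(0, 4, 9, 13): also hits inside "{{{"
  }
}
```

Note that \\B does not help either: it accepts the first two braces of "{{{" (offset 9) and rejects "{{x".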
As per this tutorial:
It matches at a position that is called a "word boundary". This match is zero-length.
There are three different positions that qualify as word boundaries:
1) Before the first character in the string, if the first character is a word character
...
As for a solution, it depends on the precise definition, but lookarounds seemed to work for me:
"(?<!\\{)\\{{2}(?!\\{)".r
It matched the "first, second, fifth and 6th token". The expression says: match "{{" that is neither preceded nor followed by "{".
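As a quick check of that claim, the following sketch lists the start offsets of the matches in the question's test string. The names LookaroundDemo and starts are hypothetical; the pattern is the one given above.

```scala
import scala.util.matching.Regex

object LookaroundDemo {
  // Test string from the question. Token start offsets:
  // "{{" 0, "s{{" 3, "{" 7, "{{{" 9, "{{" 13, "{{x" 16.
  val input = "{{ s{{ { {{{ {{ {{x"

  // "{{" that is neither preceded nor followed by another "{".
  val double: Regex = "(?<!\\{)\\{{2}(?!\\{)".r

  val starts: List[Int] = double.findAllMatchIn(input).map(_.start).toList

  def main(args: Array[String]): Unit = {
    // Offsets 0, 4, 13 and 16 fall in the 1st, 2nd, 5th and 6th tokens.
    println(starts) // List(0, 4, 13, 16)
    println(double.findAllIn(input).mkString(" ")) // {{ {{ {{ {{
  }
}
```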
For the side question:
"(?<![^ ])\\{\\{(?![^ ])".r //match `{{` surrounded by spaces or the start/end of the input
Or, depending on your interpretation of "space":
"(?<!\\S)\\{\\{(?!\\S)".r
Both matched the 1st and 5th tokens. I avoided plain positive lookarounds because I wanted to take line beginnings and endings (boundaries) into account automatically. The double negation by ! and [^ ] has the effect of implicitly including ^ and $. Alternatively, you could use:
"(?<=^|\\s)\\{\\{(?=\\s|$)".r
You can read about lookarounds here. Basically, they treat a symbol or expression as a boundary; simply put, they match stuff but don't include it in the matched string itself.
Some examples of lookarounds:
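The three side-question patterns above can be compared directly on the question's test string. This is only a sketch; the names SideQuestionDemo, patterns and startsByPattern are invented for the example.

```scala
import scala.util.matching.Regex

object SideQuestionDemo {
  // Test string from the question; tokens 1 and 5 start at offsets 0 and 13.
  val input = "{{ s{{ { {{{ {{ {{x"

  // The three variants suggested for the side question.
  val patterns: List[Regex] = List(
    "(?<![^ ])\\{\\{(?![^ ])".r,    // double negation with a literal space
    "(?<!\\S)\\{\\{(?!\\S)".r,      // double negation with any whitespace
    "(?<=^|\\s)\\{\\{(?=\\s|$)".r   // positive lookarounds with explicit ^ and $
  )

  // Start offsets of the matches found by each pattern.
  val startsByPattern: List[List[Int]] =
    patterns.map(_.findAllMatchIn(input).map(_.start).toList)

  def main(args: Array[String]): Unit = {
    // Each pattern should report List(0, 13) for this input.
    startsByPattern.foreach(println)
  }
}
```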
(?<=z)aaa matches "aaa" that is preceded by z
(?<!z)aaa matches "aaa" that is not preceded by z
aaa(?=z) matches "aaa" that is followed by z
aaa(?!z) matches "aaa" that is not followed by z
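The four cases can be tried out with a small sketch like the one below. The sample inputs ("zaaa", "xaaa", "aaaz", "aaax") and the name LookaroundExamples are assumptions chosen for illustration.

```scala
object LookaroundExamples {
  // (pattern, input where it should match, input where it should not)
  val cases: List[(String, String, String)] = List(
    ("(?<=z)aaa", "zaaa", "xaaa"), // positive lookbehind
    ("(?<!z)aaa", "xaaa", "zaaa"), // negative lookbehind
    ("aaa(?=z)", "aaaz", "aaax"),  // positive lookahead
    ("aaa(?!z)", "aaax", "aaaz")   // negative lookahead
  )

  // For each pattern: the first match in the "hit" input and in the "miss" input.
  val results: List[(String, Option[String], Option[String])] =
    cases.map { case (p, hit, miss) =>
      (p, p.r.findFirstIn(hit), p.r.findFirstIn(miss))
    }

  def main(args: Array[String]): Unit = {
    // The matched text is always just "aaa": the z (or its absence) is
    // checked by the lookaround but never becomes part of the match.
    results.foreach { case (p, hit, miss) =>
      println(s"$p -> hit: $hit, miss: $miss")
    }
  }
}
```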
P.S. Just to make your life easier, Scala has triple-quoted strings (""") that avoid the double escaping, so instead of:
"(?<!\\S)\\{\\{(?!\\S)".r
you can just write:
"""(?<!\S)\{\{(?!\S)""".r
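A short sketch to confirm that the two spellings denote the same regular expression; RawStringDemo and its value names are hypothetical.

```scala
object RawStringDemo {
  // Ordinary string literal: every backslash has to be doubled.
  val escaped: String = "(?<!\\S)\\{\\{(?!\\S)"
  // Triple-quoted literal: backslashes are taken literally.
  val raw: String = """(?<!\S)\{\{(?!\S)"""

  // Both literals should contain exactly the same characters.
  val sameSource: Boolean = escaped == raw

  // Test string from the question.
  val input = "{{ s{{ { {{{ {{ {{x"
  val escapedMatches: List[String] = escaped.r.findAllIn(input).toList
  val rawMatches: List[String] = raw.r.findAllIn(input).toList

  def main(args: Array[String]): Unit = {
    println(sameSource)                   // true
    println(escapedMatches == rawMatches) // true
  }
}
```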