Scala: Regex that matches everything up to a certain character

I want my regex to print everything before a { or {{ (not including them).

What I have so far is:

class ExpressionParser extends RegexParsers {

    val regExpr = """^.*?((?=\{{2})|(?=\{)|$)""".r //not sure about the "$". Added it because test case 1 wasn't printing. see below
    def program: Parser[Any] = regExpr
}

And here are my tests:

object Test {
    def main(args: Array[String]): Unit = {

        val p = new ExpressionParser()
        val test = p.parseAll(p.program, "tests go here") // doesn't print anything
        if(test.successful) println(test.get)

// replace "tests go here" with each of these

        //"This is plain text so should always print") // this isn't printing so make checks for { optional
        //"abc {{"
        //"abc  de{ fg{{{ hi"
        //"abc } {{ {{ de{' fg{{{ hi")
    }
}

I want it to print:

//This is plain text so should always print
//abc 
//abc  de
//abc {

Only the first test prints. Why?

Thanks!

Scroll down to the edit for the answer I added after the poster made the question more specific.

I've never heard of an ExpressionParser built into the Scala API, but if you want to get everything up to a certain point, or between two things, you can use:

(?s)(.*)

So to get everything before the letter 'a' you would use:

(?s)(.*)a

Code example:

  val regex2 = """(?s)(.*)a""".r

  val str1 = "somethinga"
  str1 match {
    case regex2(left) => println(left)
  }

This will print "something" (without the quotes).
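Two details of that pattern are easy to miss, so here is a small sketch of them. A pattern used in a `case` has to match the whole string, so `(?s)(.*)a` only matches input that ends in 'a', and the greedy `.*` captures everything up to the last 'a'. The object name, the helper names and the lazy variant below are mine, for illustration only.

```scala
object GreedyVsLazy {
  // Greedy: the capture runs up to the LAST 'a', and the string must end in 'a'.
  val greedy = """(?s)(.*)a""".r
  // Lazy variant (an assumption, not from the answer): the capture stops at the
  // FIRST 'a'; the trailing .* lets the pattern cover the rest of the string.
  val lazyFirst = """(?s)(.*?)a.*""".r

  def beforeLastA(s: String): Option[String] = s match {
    case greedy(left) => Some(left)
    case _            => None // avoids a MatchError when the pattern does not match
  }

  def beforeFirstA(s: String): Option[String] = s match {
    case lazyFirst(left) => Some(left)
    case _               => None
  }

  def main(args: Array[String]): Unit = {
    println(beforeLastA("somethinga")) // Some(something)
    println(beforeLastA("banana"))     // Some(banan)
    println(beforeFirstA("banana"))    // Some(b)
    println(beforeLastA("xyz"))        // None
  }
}
```

Adding the `case _` branch is a design choice: without it, an input that does not match throws a `MatchError` at runtime.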

Edit: Since you have now updated your question to show you are using RegexParsers, here is a solution using that, though it is quite over-the-top and unnecessary if this is all you are using RegexParsers for.

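import scala.util.parsing.combinator.RegexParsers

class ExpressionParser extends RegexParsers {
  // text followed by a "{" (the lookahead keeps the brace out of the match), else the whole line
  def remover: Parser[String] = """.*(?=\{)|.*""".r
}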

In main:

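val p = new ExpressionParser()
// parse rather than parseAll: the lookahead leaves the trailing "{" unconsumed,
// and parseAll only succeeds when the whole input has been consumed
val test = p.parse(p.remover, "tests go here{")
if (test.successful) println(test.get) // prints "tests go here"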

I was able to figure this out by reading the RegexParsers documentation here: https://github.com/scala/scala-parser-combinators and https://github.com/scala/scala-parser-combinators/blob/1.1.x/docs/Getting_Started.md

As for an explanation, in case the documentation still doesn't make sense: this is solved using a lookahead group, which checks that the text right after the preceding part of the pattern matches the group's pattern, but leaves that text out of the result.

Therefore, once you hit a {, it will match the expression of everything up to the { and return that.
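This also suggests why only the first test in the question prints. As far as I can tell from the library source, a regex parser in RegexParsers matches a prefix of the remaining input, and parseAll additionally requires that nothing is left over afterwards. The sketch below imitates the prefix step with plain `scala.util.matching.Regex` (no parser library needed) and shows what is left over. `PrefixDemo` and `prefixAndRest` are hypothetical names of mine.

```scala
object PrefixDemo {
  // Same pattern as regExpr in the question.
  val regExpr = """^.*?((?=\{{2})|(?=\{)|$)""".r

  // Hypothetical helper: the matched prefix plus whatever input remains after it.
  def prefixAndRest(s: String): Option[(String, String)] =
    regExpr.findPrefixOf(s).map(m => (m, s.substring(m.length)))

  def main(args: Array[String]): Unit = {
    val tests = List(
      "This is plain text so should always print",
      "abc {{",
      "abc  de{ fg{{{ hi"
    )
    // Only the first input leaves an empty remainder.
    tests.foreach(s => println(prefixAndRest(s)))
  }
}
```

So the regex itself finds the wanted prefix in every case. It is the unconsumed remainder, starting at the brace, that makes a whole-input parse fail.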

Now, the reason for the | is that the regex will initially try to match "everything followed by a {", but if the input has no {, that match fails. Therefore, we must use an "or" (|) to say: if there isn't a {, just use everything.
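A quick way to see the effect of the alternation is to run both forms against input with and without a brace. This is only a sketch using `findPrefixOf` from the standard library. The third pattern, with a lazy `.*?`, is my own variant and not part of the answer: it stops at the first { instead of the last one, which seems closer to what the question asks for.

```scala
object AlternationDemo {
  val lookaheadOnly = """.*(?=\{)""".r      // no fallback
  val withFallback  = """.*(?=\{)|.*""".r   // pattern from the answer
  val lazyFallback  = """.*?(?=\{)|.*""".r  // assumption: lazy variant, stops at the first "{"

  def main(args: Array[String]): Unit = {
    println(lookaheadOnly.findPrefixOf("plain text"))       // None
    println(withFallback.findPrefixOf("plain text"))        // Some(plain text)
    println(withFallback.findPrefixOf("abc  de{ fg{{{ hi")) // Some(abc  de{ fg{{)
    println(lazyFallback.findPrefixOf("abc  de{ fg{{{ hi")) // Some(abc  de)
  }
}
```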

The reason we can't just add a ? at the end of the lookahead group (the left part of the |) to make the lookahead optional is that it wouldn't actually remove the { from the result. You can try it out with this regex if you want:

.*(?=\{)?
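If you would rather not try it by hand, here is a minimal sketch of that experiment. My understanding is that the greedy `.*` first consumes the whole input, and the optional lookahead is then allowed to match nothing, so the brace stays in the result. `OptionalLookaheadDemo` is just a name I picked.

```scala
object OptionalLookaheadDemo {
  // Lookahead made optional with "?": it no longer constrains where .* stops.
  val optionalLookahead = """.*(?=\{)?""".r

  def main(args: Array[String]): Unit = {
    println(optionalLookahead.findPrefixOf("tests go here{")) // Some(tests go here{)
  }
}
```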
