How to restrict nested markup in Regex and Parser combinators?

Question

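I want to implement a simple Wiki-like markup parser as an exercise in using Scala parser combinators.

I would like to solve this bit by bit, so here is what I want to achieve in the first version: a simple inline literal markup.

For example, if the input string is: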

This is a sytax test ``code here`` . Hello ``World``

The output string should be:

This is a sytax test <code>code here</code> . Hello <code>World</code>

I tried to solve this using RegexParsers, and this is what I have so far:

import scala.util.parsing.combinator._
import scala.util.parsing.input._

object TestParser extends RegexParsers
{   
    override val skipWhitespace = false

    def toHTML(s: String) = "<code>" + s.drop(2).dropRight(2) + "</code>"

    val words = """(.)""".r
    val literal = """\B``(.)*``\B""".r ^^ toHTML

    val markup = (literal | words)*

    def run(s: String) = parseAll(markup, s) match {
        case Success(xs, next) => xs.mkString
        case _ => "fail"
    }
}

println (TestParser.run("This is a sytax test ``code here`` . Hello ``World``"))

With this code, a simpler input that contains only one <code> markup works correctly, for example:

This is a sytax test ``code here``.

becomes

This is a sytax test <code>code here</code>.

But when I run it with the example above, it produces

This is a sytax test <code>code here`` . Hello ``World</code>

I think this is because the regular expression I use:

"""\B``(.)*``\B""".r

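allows any character between the `` pair, including another ``.

How should I restrict it so that no `` can be nested inside the pair, and so fix this problem?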

Answer 1

Here is some documentation on non-greedy matching:

http://www.exampledepot.com/egs/java.util.regex/Greedy.html

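Basically, the match starts at the first `` and extends as far as it can, so it ends at the `` after World at the end of the string.

By putting a ? after your *, you tell it to find the shortest match instead of the longest one.

Another option is to use [^`]* (anything except `), which forces the match to stop earlier.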
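To make the difference concrete, here is a minimal sketch that runs the three pattern styles against the input from the question. The object name GreedyDemo and the helper matches are made up for illustration, and the \B anchors from the question are left out to keep the comparison simple.

```scala
import scala.util.matching.Regex

// Hypothetical demo object; the names are illustrative only.
object GreedyDemo {
  val input = "This is a sytax test ``code here`` . Hello ``World``"

  // Greedy: .* runs to the end of the line, then backtracks to the LAST ``.
  val greedy: Regex = """``.*``""".r
  // Reluctant: .*? stops at the FIRST closing `` it can reach.
  val reluctant: Regex = """``.*?``""".r
  // Negated class: the body may not contain a backtick at all.
  val negated: Regex = """``[^`]*``""".r

  // Collect every non-overlapping match of the pattern in the input.
  def matches(re: Regex): List[String] = re.findAllIn(input).toList
}
```

The greedy pattern yields a single match that spans both literals, while the reluctant and negated-class patterns each yield the two literals separately. The two fixes differ slightly: the negated class rejects a lone backtick inside the literal, whereas the reluctant pattern allows one.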

Answer 2

After some trial and error, I found that the following regular expression seems to work:

"""``(.)*?``"""
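For reference, this is roughly how that pattern might slot back into the parser from the question. It is a sketch, not a tested drop-in: the object name FixedParser is invented here, the \B anchors are dropped as in the regex above, and on Scala 2.11 and later it assumes the separate scala-parser-combinators module is on the classpath.

```scala
import scala.util.parsing.combinator._

// Hypothetical name; otherwise mirrors the structure of TestParser in the question.
object FixedParser extends RegexParsers {
  // Whitespace is significant here, so do not let the parser skip it.
  override val skipWhitespace = false

  // Strip the surrounding `` pair and wrap the body in <code> tags.
  def toHTML(s: String): String = "<code>" + s.drop(2).dropRight(2) + "</code>"

  // Reluctant quantifier: stop at the first closing `` instead of the last.
  val literal: Parser[String] = """``.*?``""".r ^^ toHTML
  // Fallback: consume any single character unchanged.
  val words: Parser[String] = """.""".r

  val markup: Parser[List[String]] = rep(literal | words)

  def run(s: String): String = parseAll(markup, s) match {
    case Success(xs, _) => xs.mkString
    case _              => "fail"
  }
}
```

Because literal is tried before words at every position, each `` pair is converted as soon as it is seen, and an unmatched `` simply falls through to words one character at a time.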

Answer 3

I don't know much about regex parsers, but you could use a simple one-liner:

def addTags(s: String) =
  """(``.*?``)""".r replaceAllIn (
                    s, m => "<code>" + m.group(0).replace("``", "") + "</code>")

Test:

scala> addTags("This is a sytax test ``code here`` . Hello ``World``")
res0: String = This is a sytax test <code>code here</code> . Hello <code>World</code>
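One caveat with this approach: the string returned by the function passed to replaceAllIn is still treated as a replacement pattern, so a $ or \ inside the literal text can be misread as a group reference or an escape. A possible variation, sketched below under the hypothetical name AddTags, captures the body in a group and passes the result through Regex.quoteReplacement.

```scala
import scala.util.matching.Regex

// Hypothetical wrapper object; the function keeps the name used in the answer.
object AddTags {
  // Group 1 captures the text between the `` pair (reluctant match).
  private val literal: Regex = """``(.*?)``""".r

  def addTags(s: String): String =
    literal.replaceAllIn(
      s,
      // quoteReplacement escapes $ and \ so they are inserted literally.
      m => Regex.quoteReplacement("<code>" + m.group(1) + "</code>")
    )
}
```

Using the capture group also avoids the extra replace call that strips the `` delimiters from the whole match.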


3 answers

Answer 1
2, accepted, 2011-12-04 03:53:47

Answer 2
0, 2011-12-04 02:49:47

Answer 3
0, 2011-12-04 04:22:30
