String ending with character in parboiled2, when the string can contain that character

I've come across a tricky problem while writing a parboiled2 parser: I need to match a portion of a line that is a string whose end is marked by a : character. This would be easy enough, except that the string itself can contain the : character.

At the moment I have the rule below, which treats the string as a group of colon-terminated strings and concatenates them. However, it also consumes the trailing :, which I don't want, because the trailing : is not part of the string itself.

def address = rule { capture(oneOrMore(zeroOrMore(noneOf(":")) ~ ":")) }

I feel like I should be using &(":") somewhere in here, but I'm struggling to work that in while still matching the interstitial : characters.

Example successful matches (as part of a longer string):

  • localhost: -> localhost
  • 1::: -> 1::
  • ::: -> ::

Mismatches:

  • :

Any suggestions would be welcome, even if it's "you can't do this", so I can stop racking my brains.


The context for this is parsing the bind setting in an HAProxy configuration file. Given the following (simplified) case classes, some examples of valid strings are:

case class Bind(endpoint: Endpoint, params: Seq[String])
case class Endpoint(address: Option[String], port: Option[Int])
  • bind :80 -> Bind(Endpoint(None, Some(80)), Seq())
  • bind localhost:80 -> Bind(Endpoint(Some("localhost"), Some(80)), Seq())
  • bind localhost -> Bind(Endpoint(Some("localhost"), None), Seq())
  • bind :80 param1 -> Bind(Endpoint(None, Some(80)), Seq("param1"))

In other words, if there is a string, it needs to be terminated before the final :, since that colon indicates that a port follows. The endpoint rule looks something like this:

def endpoint = rule { optional(address) ~ optional(':' ~ int) ~> Endpoint }

Ultimately, the matchable string for the endpoint is terminated by either a space or the end of the line, so one option would be to just capture everything up to the space and then parse that string separately, but I was hoping to do it within the main parser.
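For comparison, the fallback of capturing the whole token and splitting it separately could look roughly like the plain-Scala sketch below. It does not use parboiled2. The helper name splitEndpoint is hypothetical, and the handling of a token whose final segment is not a number is an assumption, since the question does not specify it.

```scala
import scala.util.Try

case class Endpoint(address: Option[String], port: Option[Int])

// Hypothetical helper (not part of parboiled2): split an already-captured
// endpoint token such as "localhost:80" at its LAST ':'.
def splitEndpoint(token: String): Endpoint = {
  val idx = token.lastIndexOf(':')
  if (idx < 0) {
    // No colon at all: the whole token is the address, there is no port.
    Endpoint(Some(token).filter(_.nonEmpty), None)
  } else {
    val addr = token.substring(0, idx)
    val rest = token.substring(idx + 1)
    // Only treat the suffix as a port if it is a non-empty run of digits.
    val port =
      if (rest.nonEmpty && rest.forall(_.isDigit)) Try(rest.toInt).toOption
      else None
    port match {
      case Some(p) => Endpoint(Some(addr).filter(_.nonEmpty), Some(p))
      // Assumption: without a numeric suffix, keep the whole token as the address.
      case None    => Endpoint(Some(token), None)
    }
  }
}
```

Because the split happens at the last colon, an address that itself contains colons (for example 1:: in 1:::80) stays intact.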

I think that the following should work for your problem description:

def noColons = rule { zeroOrMore(noneOf(":")) }
def colonWithNext = rule { ':' ~ &(noColons ~ ':') }
def address = rule { capture(oneOrMore(noColons).separatedBy(colonWithNext)) ~ ':' }

The problem with your code was the usage of the ~ combinator. An expression like A ~ B matches only if A matches first and B then matches the input that follows, so it will mismatch at B if what B needs has already been consumed as part of A. There is no backtracking involved here; a parboiled2 parser only backtracks for alternatives.

So, in this case you have to make sure to consume a ':' only if another one follows it.
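To make the "consume a ':' only if another one follows" idea concrete, here is a hand-written plain-Scala model of that lookahead logic. It is only an illustrative sketch, not parboiled2 code. The function name matchAddress is hypothetical, and the sketch assumes the input is just the endpoint token, with no handling of spaces or end of line.

```scala
// Models: non-colon runs separated by ':' (each such ':' consumed only when
// another ':' follows later), then a final terminating ':'.
// Returns the captured address and the index just past the terminating ':',
// or None when the input contains no terminating ':'.
def matchAddress(input: String): Option[(String, Int)] = {
  // Skip a (possibly empty) run of characters that are not ':'.
  def skipNonColons(from: Int): Int = {
    var i = from
    while (i < input.length && input.charAt(i) != ':') i += 1
    i
  }
  // Lookahead: is there another ':' at or after `from`? Consumes nothing.
  def colonAhead(from: Int): Boolean = skipNonColons(from) < input.length

  var pos = skipNonColons(0)
  // Consume a ':' as a separator only if the lookahead finds another ':'.
  while (pos < input.length && input.charAt(pos) == ':' && colonAhead(pos + 1)) {
    pos = skipNonColons(pos + 1)
  }
  // The ':' we stopped at (if any) is the terminator and is not captured.
  if (pos < input.length && input.charAt(pos) == ':')
    Some((input.substring(0, pos), pos + 1))
  else None
}
```

In this model a lone : yields an empty address rather than a mismatch, so a caller that wants to reject it would need an extra non-empty check on the captured string.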
