Ignoring an arbitrary prefix in a parser combinator

After getting fed up with regexes, I have been trying to use Scala's parser combinator library as a more intuitive replacement. However, I've run into a problem when I want to search a string for a pattern and ignore whatever comes before it. For example, if I want to check whether a string contains the word "octopus", I can do something like:

val r = "octopus".r
r.findFirstIn("www.octopus.com")

This correctly gives Some(octopus).

However, when I try the same thing with parser combinators:

import scala.util.parsing.combinator._
object OctopusParser extends RegexParsers {

  def any = regex(".".r)*
  def str = any ~> "octopus" <~ any

  def parse(s: String) = parseAll(str, s) 
}

OctopusParser.parse("www.octopus.com")

the parse fails:

scala> OctopusParser.parse("www.octopus.com")
res0: OctopusParser.ParseResult[String] = 
[1.16] failure: `octopus' expected but end of source found

www.octopus.com

Is there a good way to accomplish this? From playing around, it seems that any is swallowing too much of the input.

The problem is that your 'any' parser is greedy: it matches the whole line, so by the time the "octopus" literal in 'str' is tried there is no input left. Unlike a regex engine, the * repetition combinator does not backtrack and hand characters back when a later parser fails.

You might want to try something like:

import scala.util.parsing.combinator._

object OctopusParser extends RegexParsers {

  def prefix = regex("""[^\.]*\.""".r) // Match a run of non-dot characters followed by one dot, exactly once
  def postfix = regex("""\..*""".r)* // Match whatever remains, starting at the next dot (e.g. ".com")
  def str = prefix ~> "octopus" <~ postfix

  def parse(s: String) = parseAll(str, s)
}

which then gives me:

scala> OctopusParser.parse("www.octopus.com")
res0: OctopusParser.ParseResult[String] = [1.13] parsed: octopus

You may need to play around with 'prefix' to match the range of input you are expecting, and you might want to use the '?' lazy marker inside the regex (for example .*? instead of .*) if it is being too greedy.
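To make the '?' suggestion concrete, here is a minimal sketch of two ways to skip an arbitrary prefix rather than one tied to dots. It is not part of the code above: the object name PrefixSkippingParser and the names lazyPrefix, guardedPrefix, rest, viaRegex, viaGuard and contains are all made up for illustration, and it assumes the scala-parser-combinators module is on the classpath. The first variant uses a lazy regex with a lookahead; the second stays within combinators and uses not(...) to stop the repetition just before "octopus".

```scala
import scala.util.parsing.combinator._

// Hypothetical helper object; every name in it is illustrative only.
object PrefixSkippingParser extends RegexParsers {
  // Treat whitespace as ordinary input instead of silently skipping it.
  override def skipWhitespace: Boolean = false

  // Variant 1: a lazy regex. `.*?` consumes as little as possible and the
  // lookahead makes it stop right before the first "octopus".
  def lazyPrefix: Parser[String] = regex("""(?s).*?(?=octopus)""".r)

  // Variant 2: combinators only. Consume one character at a time, but only
  // while the remaining input does not start with "octopus".
  def guardedPrefix: Parser[List[String]] =
    rep(not(literal("octopus")) ~> regex("""(?s).""".r))

  // Everything after the match; may be empty.
  def rest: Parser[String] = regex("""(?s).*""".r)

  def viaRegex: Parser[String] = lazyPrefix ~> literal("octopus") <~ rest
  def viaGuard: Parser[String] = guardedPrefix ~> literal("octopus") <~ rest

  // True when the whole string parses, i.e. it contains "octopus".
  def contains(p: Parser[String], s: String): Boolean =
    parseAll(p, s).successful
}
```

With this sketch, PrefixSkippingParser.contains(PrefixSkippingParser.viaRegex, "www.octopus.com") should succeed and the same call on "www.squid.com" should fail; viaGuard is expected to behave the same way. The guarded variant is slower because it advances one character per step, but it works with any parser as the stop condition, not just a literal.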
