How can I ignore non-matching preceding text when using Scala's parser combinators?

I really like parser combinators but I'm not happy with the solution I've come up with to extract data when I don't care about the text before the relevant text.

Consider this small parser to get monetary amounts:

import scala.util.parsing.combinator._

case class Amount(number: Double, currency: String)

object MyParser extends JavaTokenParsers {
  def number = floatingPointNumber ^^ (_.toDouble)
  def currency = """\w+""".r ^? ({
    case "USD" => "USD"
    case "EUR" => "EUR"
  }, "Unknown currency code: " + _)

  def amount = (number ~ currency) ^^ {
    case num ~ curr => Amount(num, curr)
  } | currency ~ number ^^ {
    case curr ~ num => Amount(num, curr)
  }

  def junk = """\S+""".r
  def amountNested: Parser[Any] = amount | junk ~> amountNested
}

As you can see, I can get Amounts back easily if I give the parser a string that begins with valid data:

scala> MyParser.parse(MyParser.amount, "101.41 EUR")
res7: MyParser.ParseResult[Amount] = [1.11] parsed: Amount(101.41,EUR)

scala> MyParser.parse(MyParser.amount, "EUR 102.13")
res8: MyParser.ParseResult[Amount] = [1.11] parsed: Amount(102.13,EUR)

However, it fails when there is non-matching text before it:

scala> MyParser.parse(MyParser.amount, "I have 101.41 EUR")
res9: MyParser.ParseResult[Amount] = 
[1.2] failure: Unknown currency code: I

I have 101.41 EUR
 ^

My solution is the amountNested parser, which recursively tries to find an Amount. This works, but it gives a ParseResult[Any]:

scala> MyParser.parse(MyParser.amountNested, "I have 101.41 EUR")
res10: MyParser.ParseResult[Any] = [1.18] parsed: Amount(101.41,EUR)

This loss of type information (which can be 'retrieved' using pattern matching, of course) seems unfortunate, because any success will contain an Amount.

Is there a way to keep searching my input ("I have 101.41 EUR") until I either find a match or run out of input, without resorting to a Parser[Any]?

Looking at the ScalaDocs it seems like the * method on Parser might help but all I get are failures or infinite loops when I try things like:

def amount2 = ("""\S+""".r *) ~> amount
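A likely reason this attempt fails: `\S+` also matches the tokens "101.41" and "EUR", and repetition in these combinators is greedy and does not backtrack, so by the time amount runs there is no input left. One way around that, sketched below, is to skip a junk token only where amount itself would not match, using the library's not combinator as a lookahead. The object name LookaheadParser and the explicit result types are assumptions added for illustration; the number, currency, amount and junk parsers are the ones from the question.

```scala
import scala.util.parsing.combinator._

case class Amount(number: Double, currency: String)

// Hypothetical object name; the first four parsers are copied from the question.
object LookaheadParser extends JavaTokenParsers {
  def number: Parser[Double] = floatingPointNumber ^^ (_.toDouble)

  def currency: Parser[String] = """\w+""".r ^? ({
    case "USD" => "USD"
    case "EUR" => "EUR"
  }, "Unknown currency code: " + _)

  def amount: Parser[Amount] =
    (number ~ currency) ^^ { case num ~ curr => Amount(num, curr) } |
    (currency ~ number) ^^ { case curr ~ num => Amount(num, curr) }

  def junk: Parser[String] = """\S+""".r

  // not(amount) succeeds, consuming nothing, only where amount would fail,
  // so the repetition stops just before the first token that starts an amount.
  def amount2: Parser[Amount] = rep(not(amount) ~> junk) ~> amount
}
```

With this definition, `LookaheadParser.parse(LookaheadParser.amount2, "I have 101.41 EUR")` should succeed with an Amount, while input containing no amount at all fails once the junk tokens run out. Note that amount is tried twice at the matching position (once as the lookahead, once for real), which is usually acceptable for short inputs.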

If you declare amountNested as Parser[Amount], everything type-checks:

def amountNested: Parser[Amount] = amount | junk ~> amountNested
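To see why the narrower type is sound, here is a self-contained sketch of the question's parser with that one-line change. A recursive method needs an explicit result type, and Parser[Amount] is accurate because `~>` discards the junk token and keeps only the result on its right. The object name TypedParser and the find helper are assumptions added for illustration, not part of the original code.

```scala
import scala.util.parsing.combinator._

case class Amount(number: Double, currency: String)

// Hypothetical object name; the parsers are the ones from the question.
object TypedParser extends JavaTokenParsers {
  def number: Parser[Double] = floatingPointNumber ^^ (_.toDouble)

  def currency: Parser[String] = """\w+""".r ^? ({
    case "USD" => "USD"
    case "EUR" => "EUR"
  }, "Unknown currency code: " + _)

  def amount: Parser[Amount] =
    (number ~ currency) ^^ { case num ~ curr => Amount(num, curr) } |
    (currency ~ number) ^^ { case curr ~ num => Amount(num, curr) }

  def junk: Parser[String] = """\S+""".r

  // Both alternatives yield an Amount: `amount` directly, and
  // `junk ~> amountNested` because `~>` drops the junk token.
  def amountNested: Parser[Amount] = amount | junk ~> amountNested

  // Hypothetical helper: returns the first amount found, if any.
  def find(input: String): Option[Amount] =
    parse(amountNested, input) match {
      case Success(found, _) => Some(found)
      case _                 => None
    }
}
```

Calling `TypedParser.find("I have 101.41 EUR")` then gives back an Option[Amount] with no cast or extra pattern match on Any needed at the call site.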
