
How to implement “unescape” in Scala?

This is a follow-up to my previous question

Thanks to the answers I realized that the escape function is actually a flatMap with argument f:Char => Seq[Char] to map escaped characters to escaping sequences (see the answers).
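
For reference, the flatMap formulation can be sketched as follows. The escapeMap table and its three entries are assumed for illustration; they are not taken from the earlier answers.

```scala
// Assumed escape table for illustration only.
val escapeMap: Map[Char, String] =
  Map('&' -> "&amp;", '<' -> "&lt;", '>' -> "&gt;")

// Each character maps to a sequence of characters: either its escape
// sequence or the character itself. flatMap concatenates the results.
def escape(s: String): String =
  s.flatMap(c => escapeMap.getOrElse(c, c.toString))

println(escape("a < b & c"))
```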

Now I wonder how to implement unescape as the reverse operation of escape. I guess it should be a reverse of flatMap with an argument f:Seq[Char] => Char. Does that make sense? How would you suggest implementing unescape?

I guess it should be a reverse of flatMap with a function f:Seq[Char] => Char. Does that make sense?

Not really. What should your inverse function f:Seq[Char] => Char return on "abc"? A function of that type has to accept any sequence of characters and return a single character. You could try using PartialFunction[Seq[Char], Char] instead, but you'll run into other problems: do you apply it to every subsequence of your input?

The more general solution would be to use foldLeft with the accumulator type containing both the built-up part of the result and the escaping sequence, something like (untested):

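def unescape(str: String): String = {
  // Accumulator: (text unescaped so far, name of the escape sequence being read, if any)
  val result = str.foldLeft[(String, Option[String])](("", None)) { case ((acc, escapedAcc), c) =>
    (c, escapedAcc) match {
      case ('&', None) =>
        // '&' outside an escape sequence starts a new one
        (acc, Some(""))
      case (_, None) =>
        // ordinary character outside an escape sequence
        (acc + c, None)
      case ('&', Some(_)) =>
        throw new IllegalArgumentException("nested escape sequences")
      case (';', Some(escapedAcc1)) =>
        // ';' ends the sequence; unknown names make the Map lookup throw NoSuchElementException
        (acc + unescapeMap(escapedAcc1), None)
      case (_, Some(escapedAcc1)) =>
        // still inside the sequence: keep collecting its name
        (acc, Some(escapedAcc1 + c))
    }
  }

  result match {
    case (unescaped, None) =>
      unescaped
    case (_, Some(_)) =>
      // the input ended before the closing ';'
      throw new IllegalArgumentException("unfinished escape sequence")
  }
}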

val unescapeMap = Map("amp" -> "&", "lt" -> "<", ...)

(It's much more efficient to use StringBuilders for the accumulators, but this is simpler to understand.)
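
A rough sketch of what the StringBuilder variant could look like, also untested against real input. The names unescapeSB and entities are made up for this example, and the table holds only the two entries spelled out in the answer's map.

```scala
// Hypothetical table for this sketch; extend it as needed.
val entities: Map[String, String] = Map("amp" -> "&", "lt" -> "<")

def unescapeSB(str: String): String = {
  val out = new StringBuilder     // unescaped text built so far
  val entity = new StringBuilder  // name of the escape sequence being read
  var inEntity = false

  for (c <- str) {
    if (!inEntity) {
      if (c == '&') inEntity = true
      else out += c
    } else c match {
      case '&' =>
        throw new IllegalArgumentException("nested escape sequences")
      case ';' =>
        // Unknown names throw NoSuchElementException, like a plain Map lookup.
        out ++= entities(entity.toString)
        entity.clear()
        inEntity = false
      case _ =>
        entity += c
    }
  }

  if (inEntity) throw new IllegalArgumentException("unfinished escape sequence")
  out.toString
}

println(unescapeSB("a &lt; b &amp; c"))
```

The state is the same as in the fold (output so far plus an optional pending name); it just lives in mutable builders instead of being rebuilt for every character.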

But for this specific case you could just split the string on &, then split each part except the first on ;, and get the parts you want this way.
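
One way the split-based idea might look, as a sketch only. unescapeBySplit and entities are names invented for this example, and the table again holds just two entries.

```scala
// Hypothetical table for this sketch; extend it as needed.
val entities: Map[String, String] = Map("amp" -> "&", "lt" -> "<")

def unescapeBySplit(str: String): String = {
  // limit -1 keeps trailing empty strings, so a trailing "&" is not silently dropped
  val parts = str.split("&", -1)
  val rest = parts.tail.map { part =>
    // Every part after the first must look like "name;remaining text".
    part.split(";", 2) match {
      case Array(name, tail) => entities(name) + tail
      case _ => throw new IllegalArgumentException("unfinished escape sequence")
    }
  }
  parts.head + rest.mkString
}

println(unescapeBySplit("a &lt; b &amp; c"))
```

Unlike the fold, this version cannot tell a nested & from a missing ;. Both are reported as an unfinished escape sequence.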

This seems to be a follow-up to my own answer to the question that this one follows up on... Use scala.xml.Utility.unescape:

val sb = new StringBuilder
scala.xml.Utility.unescape("amp", sb)
println(sb.toString) // prints &

or if you just want to unescape once and throw away the StringBuilder instance:

scala.xml.Utility.unescape("amp", new StringBuilder).toString // returns "&"

This only parses individual escapes. You'll have to build a parser for entire XML strings around it yourself (the accepted answer seems to provide that part, but it reinvents the scala.xml.Utility wheel), or use something else from scala.xml instead.
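
A minimal sketch of such a wrapper, assuming the scala-xml module is on the classpath. unescapeAll and the regular expression are made up for this example, and it relies on Utility.unescape returning null for names that are not predefined entities, which is worth checking against the scala-xml version in use.

```scala
import scala.util.matching.Regex
import scala.xml.Utility

// Matches "&name;" and captures the name.
val entity: Regex = "&([^&;]+);".r

def unescapeAll(str: String): String =
  entity.replaceAllIn(str, m => {
    val sb = Utility.unescape(m.group(1), new StringBuilder)
    // Assumption: null signals an unknown entity; keep the original text then.
    val replacement = if (sb == null) m.matched else sb.toString
    // quoteReplacement stops '$' and '\' from being read as group references.
    Regex.quoteReplacement(replacement)
  })

println(unescapeAll("a &lt; b &amp; c"))
```

This is more lenient than the fold-based version: a stray & or an unfinished sequence is left in place rather than rejected.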
