How to implement "unescape" in Scala?
This is a follow-up to my previous question.
Thanks to the answers, I realized that the escape function is actually a flatMap with an argument f: Char => Seq[Char] that maps escaped characters to escaping sequences (see the answers).
Now I wonder how to implement unescape as the reverse operation of escape. I guess it should be the reverse of flatMap, with an argument f: Seq[Char] => Char. Does that make sense? How would you suggest implementing unescape?
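For context, the escape direction described in the question could be sketched roughly as below. The table contents and the names escapeMap and escape are assumptions made for illustration; the original question does not show them.

```scala
// Hypothetical escape table; the entries are illustrative only.
val escapeMap: Map[Char, String] =
  Map('&' -> "&amp;", '<' -> "&lt;", '>' -> "&gt;")

// Each character maps to a (possibly longer) sequence of characters;
// flatMap concatenates those sequences into the result string.
def escape(str: String): String =
  str.flatMap(c => escapeMap.getOrElse(c, c.toString))
```

Because one input character can expand to several output characters, the inverse has to consume a variable number of characters at a time, which is why a plain per-element function does not fit.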
"I guess it should be a reverse to flatMap with a function f: Seq[Char] => Char. Does it make sense?"
Not really. What should your inverse function f: Seq[Char] => Char return on "abc"? It would have to apply to any sequence of characters and return a single character. You could try using PartialFunction[Seq[Char], Char] instead, but you'll run into other problems. Do you apply it to every subsequence of your input?
The more general solution would be to use foldLeft with an accumulator type containing both the built-up part of the result and the escape sequence currently being read, something like (untested):
    def unescape(str: String): String = {
      // Accumulator: (result so far, Some(entity name being read) or None outside an escape)
      val result = str.foldLeft[(String, Option[String])](("", None)) { case ((acc, escapedAcc), c) =>
        (c, escapedAcc) match {
          case ('&', None) =>
            (acc, Some(""))                 // start of an escape sequence
          case (_, None) =>
            (acc + c, None)                 // ordinary character
          case ('&', Some(_)) =>
            throw new IllegalArgumentException("nested escape sequences")
          case (';', Some(escapedAcc1)) =>
            (acc + unescapeMap(escapedAcc1), None)   // end of the sequence: look it up
          case (_, Some(escapedAcc1)) =>
            (acc, Some(escapedAcc1 + c))    // still inside the sequence
        }
      }
      result match {
        case (escaped, None) =>
          escaped
        case (_, Some(_)) =>
          throw new IllegalArgumentException("unfinished escape sequence")
      }
    }

    val unescapeMap = Map("amp" -> "&", "lt" -> "<") // ... plus any further entities
(It's much more efficient to use StringBuilders for the accumulators, but this is simpler to understand.)
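A StringBuilder-based variant might look roughly like the sketch below. It swaps the fold for a plain loop over mutable state, so it is not the same code as above. The names unescapeSb and entities, the table contents, and the error on unknown entities are assumptions for illustration.

```scala
// Illustrative entity table (an assumption); extend as needed.
val entities: Map[String, String] =
  Map("amp" -> "&", "lt" -> "<", "gt" -> ">")

def unescapeSb(str: String): String = {
  val out = new StringBuilder     // unescaped output built so far
  val entity = new StringBuilder  // name of the entity currently being read
  var inEntity = false            // true between '&' and the closing ';'
  for (c <- str) {
    if (!inEntity) {
      if (c == '&') inEntity = true else out.append(c)
    } else c match {
      case '&' =>
        throw new IllegalArgumentException("nested escape sequences")
      case ';' =>
        val name = entity.toString
        val replacement: String = entities.getOrElse(
          name, throw new IllegalArgumentException(s"unknown entity: $name"))
        out.append(replacement)
        entity.clear()
        inEntity = false
      case _ =>
        entity.append(c)
    }
  }
  if (inEntity) throw new IllegalArgumentException("unfinished escape sequence")
  out.toString
}
```

The two builders play the roles of the two accumulator components in the fold; appending to them avoids copying the whole string on every character.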
But for this specific case you could just split the string on &, then split each part except the first on ;, and get the parts you want this way.
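The split-based idea could be sketched as follows. This is only one possible reading of it: the names unescapeBySplit and entityTable, the table contents, and the choice to throw on malformed input are all assumptions.

```scala
// Illustrative entity table (an assumption); extend as needed.
val entityTable: Map[String, String] =
  Map("amp" -> "&", "lt" -> "<", "gt" -> ">")

def unescapeBySplit(str: String): String = {
  // Limit -1 keeps trailing empty strings, so a dangling "&" is not silently dropped.
  val parts = str.split("&", -1)
  // Every part after the first started right after an '&':
  // it must look like "name;rest of the text".
  val rest = parts.tail.map { part =>
    part.split(";", 2) match {
      case Array(name, text) =>
        val replacement: String = entityTable.getOrElse(
          name, throw new IllegalArgumentException(s"unknown entity: $name"))
        replacement + text
      case _ =>
        throw new IllegalArgumentException("unfinished escape sequence")
    }
  }
  parts.head + rest.mkString
}
```

Unlike the fold, this version reports a second & inside an unfinished sequence as an unfinished sequence, not as nesting, since the outer split has already cut the string there.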
This seems to be a follow-up to my own answer to the question that this one follows up on. Use scala.xml.Utility.unescape:
    val sb = new StringBuilder
    scala.xml.Utility.unescape("amp", sb)
    println(sb.toString) // prints &
or, if you just want to unescape once and throw away the StringBuilder instance:

    scala.xml.Utility.unescape("amp", new StringBuilder).toString // returns "&"
This just parses individual escapes; you'll have to build a parser for entire XML strings around it yourself (the accepted answer seems to provide that bit, but reinvents the scala.xml.Utility wheel), or use something from scala.xml instead.