What are the problems with an ADT encoding that associates types with data constructors? (Such as Scala.)

In Scala, algebraic data types are encoded as sealed one-level type hierarchies. Example:

-- Haskell
data Positioning a = Append
                   | AppendIf (a -> Bool)
                   | Explicit ([a] -> [a]) 
// Scala
sealed trait Positioning[A]
case object Append extends Positioning[Nothing]
case class AppendIf[A](condition: A => Boolean) extends Positioning[A]
case class Explicit[A](f: Seq[A] => Seq[A]) extends Positioning[A]

With case classes and case objects, Scala generates a bunch of things like equals, hashCode, unapply (used by pattern matching) etc. that bring us many of the key properties and features of traditional ADTs.

There is one key difference though: in Scala, "data constructors" have their own types. Compare the following two, for example (copied from the respective REPLs).

// Scala

scala> :t Append
Append.type

scala> :t AppendIf[Int](Function const true)
AppendIf[Int]

-- Haskell

haskell> :t Append
Append :: Positioning a

haskell> :t AppendIf (const True)
AppendIf (const True) :: Positioning a

I have always considered the Scala variation to be on the advantageous side.

After all, there is no loss of type information. AppendIf[Int], for instance, is a subtype of Positioning[Int].

scala> val subtypeProof = implicitly[AppendIf[Int] <:< Positioning[Int]]
subtypeProof: <:<[AppendIf[Int],Positioning[Int]] = <function1>

In fact, you get an additional compile-time invariant about the value. (Could we call this a limited version of dependent typing?)

This can be put to good use: once you know what data constructor was used to create a value, the corresponding type can be propagated through the rest of the flow to add more type safety. For example, Play JSON, which uses this Scala encoding, will only allow you to extract fields from JsObject, not from any arbitrary JsValue.

scala> import play.api.libs.json._
import play.api.libs.json._

scala> val obj = Json.obj("key" -> 3)
obj: play.api.libs.json.JsObject = {"key":3}

scala> obj.fields
res0: Seq[(String, play.api.libs.json.JsValue)] = ArrayBuffer((key,3))

scala> val arr = Json.arr(3, 4)
arr: play.api.libs.json.JsArray = [3,4]

scala> arr.fields
<console>:15: error: value fields is not a member of play.api.libs.json.JsArray
              arr.fields
                  ^

scala> val jsons = Set(obj, arr)
jsons: scala.collection.immutable.Set[Product with Serializable with play.api.libs.json.JsValue] = Set({"key":3}, [3,4])
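
The same idea can be sketched without Play JSON. The miniature ADT below (Js, JsNum, JsArr, JsObj, keys and keysOf are all hypothetical names, not Play's API) is only meant to show how a function can demand a specific constructor type, and what is lost once a value is widened to the base type.

```scala
object ConstructorTypes {
  // Hypothetical miniature JSON ADT; each case is a public type of its own.
  sealed trait Js
  final case class JsNum(value: Int) extends Js
  final case class JsArr(items: List[Js]) extends Js
  final case class JsObj(fields: Map[String, Js]) extends Js

  // Accepts only objects, so no runtime check or Option is needed here.
  def keys(obj: JsObj): Set[String] = obj.fields.keySet

  val obj: JsObj = JsObj(Map("key" -> JsNum(3)))
  val arr: JsArr = JsArr(List(JsNum(3), JsNum(4)))

  val objKeys: Set[String] = keys(obj)
  // keys(arr) // does not compile: found JsArr, required JsObj

  // Once a value is widened to Js, the constructor has to be recovered
  // by pattern matching, and the result becomes partial again.
  def keysOf(js: Js): Option[Set[String]] = js match {
    case o: JsObj => Some(keys(o))
    case _        => None
  }
}
```

As long as the static type stays JsObj, keys is total; keysOf shows the Option that reappears as soon as only the base type Js is known.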

In Haskell, fields would probably have type JsValue -> Set (String, JsValue), which means it will fail at runtime for a JsArray etc. This problem also manifests in the form of the well-known partial record accessors.

The view that Scala's treatment of data constructors is wrong has been expressed numerous times: on Twitter, mailing lists, IRC, SO etc. Unfortunately I don't have links to any of those, except for a couple: this answer by Travis Brown, and Argonaut, a purely functional JSON library for Scala.

Argonaut consciously takes the Haskell approach (by making the case classes private and providing data constructors manually). You can see that the problem I mentioned with the Haskell encoding exists with Argonaut as well. (Except it uses Option to indicate partiality.)

scala> import argonaut._, Argonaut._
import argonaut._
import Argonaut._

scala> val obj = Json.obj("k" := 3)
obj: argonaut.Json = {"k":3}

scala> obj.obj.map(_.toList)
res6: Option[List[(argonaut.Json.JsonField, argonaut.Json)]] = Some(List((k,3)))

scala> val arr = Json.array(jNumber(3), jNumber(4))
arr: argonaut.Json = [3,4]

scala> arr.obj.map(_.toList)
res7: Option[List[(argonaut.Json.JsonField, argonaut.Json)]] = None
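
For readers unfamiliar with that style, here is a minimal sketch of a Haskell-like encoding in Scala: private case classes plus hand-written constructors that return the base type. It is not Argonaut's actual source; the names Json, JNumber, JArray, JObject, jNumber, jArray, jObject and fields are assumptions made up for illustration.

```scala
object HaskellStyle {
  sealed abstract class Json {
    // A partial accessor made total with Option, in the spirit of the
    // `obj` call in the REPL session above (hypothetical implementation).
    def fields: Option[List[(String, Json)]] = this match {
      case JObject(fs) => Some(fs)
      case _           => None
    }
  }

  // The cases are private, so their types cannot be named by client code.
  private final case class JNumber(value: Int) extends Json
  private final case class JArray(items: List[Json]) extends Json
  private final case class JObject(fs: List[(String, Json)]) extends Json

  // Hand-written "data constructors" that always return the base type.
  def jNumber(n: Int): Json = JNumber(n)
  def jArray(items: Json*): Json = JArray(items.toList)
  def jObject(fs: (String, Json)*): Json = JObject(fs.toList)

  val obj: Json = jObject("k" -> jNumber(3))
  val arr: Json = jArray(jNumber(3), jNumber(4))
}
```

Every value has static type Json, so inference never lands on a case type, but fields has to report failure at runtime through Option.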

I have been pondering this for quite some time, but still do not understand what makes Scala's encoding wrong. Sure, it hampers type inference at times, but that does not seem like a strong enough reason to decree it wrong. What am I missing?

To the best of my knowledge, there are two reasons why Scala's idiomatic encoding of case classes can be bad: type inference, and type specificity. The former is a matter of syntactic convenience, while the latter is a matter of increased scope of reasoning.

The subtyping issue is relatively easy to illustrate:

val x = Some(42)

The type of x turns out to be Some[Int], which is probably not what you wanted. You can generate similar issues in other, more problematic areas:

sealed trait ADT
case class Case1(x: Int) extends ADT
case class Case2(x: String) extends ADT

val xs = List(Case1(42), Case1(12))

The type of xs is List[Case1]. This is basically guaranteed to be not what you want. In order to get around this issue, containers like List need to be covariant in their type parameter. Unfortunately, covariance introduces a whole bucket of issues, and in fact degrades the soundness of certain constructs (e.g. Scalaz compromises on its Monad type and several monad transformers by allowing covariant containers, despite the fact that it is unsound to do so).
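
To make the interaction with invariant containers concrete, here is a small sketch. Box and describe are hypothetical names introduced only for this illustration; the ADT is the one defined above.

```scala
object InvariantContainer {
  sealed trait ADT
  final case class Case1(x: Int) extends ADT
  final case class Case2(x: String) extends ADT

  // Hypothetical invariant container: a Box[Case1] is not a Box[ADT].
  final case class Box[A](value: A)

  def describe(box: Box[ADT]): String = box.value match {
    case Case1(x) => s"Case1($x)"
    case Case2(x) => s"Case2($x)"
  }

  val inferred = Box(Case1(42)) // inferred as Box[Case1]
  // describe(inferred)         // does not compile: Box[Case1] is not a Box[ADT]

  // Two ways to steer inference back to the ADT type.
  val annotated: Box[ADT] = Box(Case1(42)) // expected type widens the element
  val viaAscription = Box(Case1(42): ADT)  // ascription on the element itself

  val result: String = describe(annotated)
}
```

The failure only shows up when the value is bound without an expected type; passing Box(Case1(42)) directly to describe compiles, because the parameter type drives inference.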

So, encoding ADTs in this fashion has a somewhat viral effect on your code. Not only do you need to deal with subtyping in the ADT itself, but every container you ever write needs to take into account the fact that you're landing on subtypes of your ADT at inopportune moments.

The second reason not to encode your ADTs using public case classes is to avoid cluttering up your type space with "non-types". From a certain perspective, ADT cases are not really types: they are data. If you reason about ADTs in this fashion (which is not wrong!), then having first-class types for each of your ADT cases increases the set of things you need to carry in your mind to reason about your code.

For example, consider the ADT algebra from above. If you want to reason about code which uses this ADT, you need to be constantly thinking about "well, what if this type is Case1?" That's just not a question anyone really needs to ask, since Case1 is data. It's a tag for a particular coproduct case. That's all.

Personally, I don't care much about any of the above. I mean, the unsoundness issues with covariance are real, but I generally just prefer to make my containers invariant and instruct my users to "suck it up and annotate your types". It's inconvenient and it's dumb, but I find it preferable to the alternative, which is a lot of boilerplate folds and "lower-case" data constructors.
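
For reference, the alternative looks roughly like the sketch below, applied to the ADT from above. The fold signature and the lower-case constructor names case1 and case2 are assumptions chosen for illustration, not taken from any particular library.

```scala
object FoldEncoding {
  sealed abstract class ADT {
    // The boilerplate fold: one handler per case, replacing external
    // pattern matching on the (now hidden) case classes.
    def fold[B](onCase1: Int => B, onCase2: String => B): B = this match {
      case Case1(x) => onCase1(x)
      case Case2(x) => onCase2(x)
    }
  }

  private final case class Case1(x: Int) extends ADT
  private final case class Case2(x: String) extends ADT

  // "Lower-case" data constructors: they return ADT, never Case1 or Case2.
  def case1(x: Int): ADT = Case1(x)
  def case2(x: String): ADT = Case2(x)

  val xs = List(case1(42), case1(12)) // inferred as List[ADT]

  val rendered: List[String] =
    xs.map(_.fold(i => s"int $i", s => s"string $s"))
}
```

Inference now lands on List[ADT] without annotations, at the price of writing the fold and the constructors by hand for every ADT.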

As a wildcard, a third potential disadvantage to this sort of type specificity is that it encourages (or rather, allows) a more "object-oriented" style where you put case-specific functions on the individual ADT types. I think there is very little question that mixing your metaphors (case classes vs. subtype polymorphism) in this way is a recipe for bad. However, whether or not this outcome is the fault of typed cases is sort of an open question.
