What do Haskell (data) constructors construct?

Haskell enables one to construct algebraic data types using type constructors and data constructors. For example,

data Circle = Circle Float Float Float

and we are told that this data constructor (the Circle on the right) is a function that constructs a circle when given data, e.g. x, y, and radius.

Circle :: Float -> Float -> Float -> Circle 

My questions are:

  1. What is actually constructed by this function, specifically?

  2. Can we define the constructor function?

I've seen Smart Constructors but they just seem to be extra functions that eventually call the regular constructors.

Coming from an OO background, constructors, of course, have imperative specifications. In Haskell, they seem to be system-defined.

In Haskell, without considering the underlying implementation, a data constructor creates a value, essentially by fiat. “‘Let there be a Circle’, said the programmer, and there was a Circle.” Asking what Circle 1 2 3 creates is akin to asking what the literal 1 creates in Python or Java.

A nullary constructor is closer to what you usually think of as a literal. The Bool type is literally defined as

data Bool = False | True

where True and False are data constructors, not literals defined by Haskell grammar.

The data type declaration is also the definition of the constructor: since there isn't really anything to a value beyond the constructor name and its arguments, simply stating them is the definition. You create a value of type Circle by applying the data constructor Circle to 3 arguments, and that's it.
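As a minimal sketch of that point, the snippet below builds a Circle and then takes it apart again by pattern matching. The derived Show instance and the names unitCircle and area are assumptions added for illustration; they are not part of the question.

```haskell
-- The declaration is the whole definition of the constructor.
data Circle = Circle Float Float Float
  deriving Show

-- Applying the constructor to three arguments yields a value.
unitCircle :: Circle
unitCircle = Circle 0 0 1

-- Pattern matching on the same constructor recovers the arguments.
-- (area is a hypothetical helper, not something from the question.)
area :: Circle -> Float
area (Circle _ _ r) = pi * r * r
```

Nothing runs "inside" Circle here: the value is just the constructor name together with the three Floats it was applied to.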

A so-called "smart constructor" is just a function that calls a data constructor, with perhaps some other logic to restrict which instances can be created. For example, consider a simple wrapper around Integer:

newtype PosInteger = PosInt Integer

The constructor is PosInt; a smart constructor might look like

mkPosInt :: Integer -> PosInteger
mkPosInt n | n > 0 = PosInt n
           | otherwise = error "Argument must be positive"

With mkPosInt, there is no way to create a PosInteger value with a non-positive argument, because only positive arguments actually reach the data constructor. A smart constructor makes the most sense when it, and not the data constructor, is exported by a module, so that a typical user cannot create arbitrary instances (because the data constructor is not visible outside the module).
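One possible variation, sketched here under assumptions, reports failure with Maybe instead of calling error. The names mkPosIntMaybe and fromPosInt and the derived instances are hypothetical additions; the export list shown in the comment is how the constructor would typically be hidden in a real module.

```haskell
-- In a real module the export list would name the type without its
-- constructor, for example:
--   module PosInteger (PosInteger, mkPosIntMaybe, fromPosInt) where
newtype PosInteger = PosInt Integer
  deriving (Eq, Show)

-- Total variant of mkPosInt: returns Nothing instead of calling error.
mkPosIntMaybe :: Integer -> Maybe PosInteger
mkPosIntMaybe n
  | n > 0     = Just (PosInt n)
  | otherwise = Nothing

-- Hypothetical accessor, needed by clients once PosInt itself is hidden.
fromPosInt :: PosInteger -> Integer
fromPosInt (PosInt n) = n
```

Returning Maybe pushes the "what if it is not positive" decision to the caller, which is usually easier to handle than a runtime exception.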

Good question. As you know, given the definition:

data Foo = A | B Int

this defines a type with a (nullary) type constructor Foo and two data constructors, A and B.

Each of these data constructors, when fully applied (to no arguments in the case of A and to a single Int argument in the case of B) constructs a value of type Foo. So, when I write:

a :: Foo
a = A

b :: Foo
b = B 10

the names a and b are bound to two values of type Foo.

So, data constructors for type Foo construct values of type Foo.

What are values of type Foo? Well, first of all, they are different from values of any other type. Second, they are wholly defined by their data constructors. There is a distinct value of type Foo, different from all other values of Foo, for each combination of a data constructor with a set of distinct arguments passed to that data constructor. That is, two values of type Foo are identical if and only if they were constructed with the same data constructor given identical sets of arguments. ("Identical" here means something different from "equality", which may not necessarily be defined for a given type Foo, but let's not get into that.)

That's also what makes data constructors different from functions in Haskell. If I have a function:

bar :: Int -> Bool

It's possible that bar 1 and bar 2 might be exactly the same value. For example, if bar is defined by:

bar n = n > 0

then it's obvious that bar 1 and bar 2 (and bar 3) are identically True. Whether the value of bar is the same for different values of its arguments will depend on the function definition.

In contrast, if Bar is a constructor:

data BarType = Bar Int

then it's never going to be the case that Bar 1 and Bar 2 are the same value. By definition, they will be different values (of type BarType).
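The contrast can be checked with a small sketch. It assumes a derived Eq instance, which compares the constructor and its arguments structurally and so serves as a stand-in for the notion of "identical" used above; the names sameResult and sameValue are hypothetical.

```haskell
data BarType = Bar Int
  deriving (Eq, Show)

bar :: Int -> Bool
bar n = n > 0

-- A function may map different arguments to the same value.
sameResult :: Bool
sameResult = bar 1 == bar 2      -- True

-- A constructor applied to different arguments gives different values.
sameValue :: Bool
sameValue = Bar 1 == Bar 2       -- False
```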

By the way, the idea that constructors are just a special kind of function is a common viewpoint. I personally think this is inaccurate and causes confusion. While it's true that constructors can often be used as if they are functions (specifically, they behave very much like functions when used in expressions), I don't think this view stands up under much scrutiny: constructors are represented differently in the surface syntax of the language (with capitalized identifiers), can be used in contexts (like pattern matching) where functions cannot be used, are represented differently in compiled code, etc.
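Both roles can be seen side by side in a short sketch using the Foo type from above. The derived instances and the names wrapped and payload are assumptions made for the example.

```haskell
data Foo = A | B Int
  deriving (Eq, Show)

-- In an expression, B can be passed around like a function of type Int -> Foo.
wrapped :: [Foo]
wrapped = map B [1, 2, 3]

-- In a pattern, A and B take a value apart; an ordinary function name
-- cannot appear in this position.
payload :: Foo -> Maybe Int
payload A     = Nothing
payload (B n) = Just n
```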

So, when you ask "can we define the constructor function", the answer is "no", because there is no constructor function. Instead, a constructor like A or B or Bar or Circle is what it is: something different from a function (that sometimes behaves like a function with some special additional properties) which is capable of constructing a value of whatever type the data constructor belongs to.

This makes Haskell constructors very different from OO constructors, but that's not surprising since Haskell values are very different from OO objects. In an OO language, you can typically provide a constructor function that does some processing in building the object, so in Python you might write:

class Bar:
    def __init__(self, n):
        self.value = n > 0

and then after:

bar1 = Bar(1)
bar2 = Bar(2)

we have two distinct objects bar1 and bar2 (which would satisfy bar1 != bar2) that have been configured with the same field values and are in some sense "equal". This is sort of halfway between the situation above with bar 1 and bar 2 creating two identical values (namely True) and the situation with Bar 1 and Bar 2 creating two distinct values that, by definition, can't possibly be the "same" in any sense.

You can never have this situation with Haskell constructors. Instead of thinking of a Haskell constructor as running some underlying function to "construct" an object which might involve some cool processing and deriving of field values, you should instead think of a Haskell constructor as a passive tag attached to a value (which may also contain zero or more other values, depending on the arity of the constructor).

So, in your example, Circle 10 20 5 doesn't "construct" an object of type Circle by running some function. It directly creates a tagged object that, in memory, will look something like:

<Circle tag>
<Float value 10>
<Float value 20>
<Float value 5>

(or you can at least pretend that's what it looks like in memory).

The closest you can come to OO constructors in Haskell is using smart constructors. As you note, eventually a smart constructor just calls a regular constructor, because that's the only way to create a value of a given type. No matter what kind of bizarre smart constructor you build to create a Circle, the value it constructs will need to look like:

<Circle tag>
<some Float value>
<another Float value>
<a final Float value>

which you'll need to construct with a plain old Circle constructor call. There's nothing else the smart constructor could return that would still be a Circle. That's just how Haskell works.
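For illustration, here is a sketch of one such smart constructor. The name mkCircle, the choice to reject negative radii, and the derived instances are all assumptions; the point is only that the successful branch has to end in a plain Circle application.

```haskell
data Circle = Circle Float Float Float
  deriving (Eq, Show)

-- Hypothetical smart constructor: rejects negative radii, and otherwise
-- ends by applying the plain Circle constructor.
mkCircle :: Float -> Float -> Float -> Maybe Circle
mkCircle x y r
  | r < 0     = Nothing
  | otherwise = Just (Circle x y r)
```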

Does that help?

I'm going to answer this in a somewhat roundabout way, with an example that I hope illustrates my point, which is that Haskell decouples several distinct ideas that are coupled in OOP under the concept of a “class”. Understanding this will help you translate your experience from OOP into Haskell with less difficulty. The example in OOP pseudocode:

class Person {

    private int id;
    private String name;

    public Person(int id, String name) {
        if (id == 0)
            throw new InvalidIdException();
        if (name == "")
            throw new InvalidNameException();

        this.name = name;
        this.id = id;
    }

    public int getId() { return this.id; }

    public String getName() { return this.name; }

    public void setName(String name) { this.name = name; }

}

In Haskell:

module Person
  ( Person
  , mkPerson
  , getId
  , getName
  , setName
  ) where

data Person = Person
  { personId :: Int
  , personName :: String
  }

mkPerson :: Int -> String -> Either String Person
mkPerson id name
  | id == 0 = Left "invalid id"
  | name == "" = Left "invalid name"
  | otherwise = Right (Person id name)

getId :: Person -> Int
getId = personId

getName :: Person -> String
getName = personName

setName :: String -> Person -> Either String Person
setName name person = mkPerson (personId person) name

Notice:

  • The Person class has been translated to a module which happens to export a data type by the same name: types (for domain representation and invariants) are decoupled from modules (for namespacing and code organisation).

  • The fields id and name, which are specified as private in the class definition, are translated to ordinary (public) fields on the data definition, since in Haskell they're made private by omitting them from the export list of the Person module: definitions and visibility are decoupled.

  • The constructor has been translated into two parts: one (the Person data constructor) that simply initialises the fields, and another (mkPerson) that performs validation. Allocation and initialisation are decoupled from validation. Since the Person type is exported, but its constructor is not, mkPerson is the only way for clients to construct a Person; it's an “abstract data type”.

  • The public interface has been translated to functions that are exported by the Person module, and the setName function that previously mutated the Person object has become a function that returns a new instance of the Person data type that happens to share the old ID. The OOP code has a bug: it should include a check in setName for the name != "" invariant; the Haskell code can avoid this by using the mkPerson smart constructor to ensure that all Person values are valid by construction. So state transitions and validation are also decoupled: you only need to check invariants when constructing a value, because it can't change thereafter.

So as for your actual questions:

  1. What is actually constructed by this function, specifically?

A constructor of a data type allocates space for the tag and fields of a value, sets the tag to indicate which constructor was used to create the value, and initialises the fields to the arguments of the constructor. You can't override it because the process is completely mechanical and there's no reason (in normal safe code) to do so. It's an internal detail of the language and runtime.

  2. Can we define the constructor function?

No. If you want to perform additional validation to enforce invariants, you should use a “smart constructor” function which calls the lower-level data constructor. Because Haskell values are immutable by default, values can be made correct by construction; that is, when you don't have mutation, you don't need to enforce that all state transitions are correct, only that all states themselves are constructed correctly. And often you can arrange your types so that smart constructors aren't even necessary.
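As one hedged example of arranging types that way, the non-empty-name invariant from the Person example could be carried by the type itself, using NonEmpty from base. The Name type and the helpers ada and nameToString are hypothetical and not part of the code above.

```haskell
import Data.List.NonEmpty (NonEmpty (..), toList)

-- A name that cannot be empty: NonEmpty always holds at least one element.
newtype Name = Name (NonEmpty Char)
  deriving (Eq, Show)

-- The only way to build a Name is to supply a first character.
ada :: Name
ada = Name ('A' :| "da")

-- Convert back to an ordinary String when needed.
nameToString :: Name -> String
nameToString (Name cs) = toList cs
```

With this representation there is no empty Name to reject, so the plain data constructor can be exported safely and no validating function is needed for that invariant.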

The only thing you can change about the generated data constructor “function” is making its type signature more restrictive using GADTs, to help enforce more invariants at compile-time. And as a side note, GADTs also let you do existential quantification, which lets you carry around encapsulated/type-erased information at runtime, much like an OOP vtable; so this is another thing that's decoupled in Haskell but coupled in typical OOP languages.
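A small sketch of both uses follows, assuming GHC with the GADTs extension enabled. The Expr and Showable types and every name in the block are invented for the example.

```haskell
{-# LANGUAGE GADTs #-}

-- Each constructor's signature is written out, and its result type is
-- more specific than a plain "Expr a".
data Expr a where
  IntLit  :: Int  -> Expr Int
  BoolLit :: Bool -> Expr Bool
  Add     :: Expr Int -> Expr Int -> Expr Int
  If      :: Expr Bool -> Expr a -> Expr a -> Expr a

-- Ill-typed terms such as Add (BoolLit True) (IntLit 1) are rejected at
-- compile time, so eval needs no runtime checks.
eval :: Expr a -> a
eval (IntLit n)  = n
eval (BoolLit b) = b
eval (Add x y)   = eval x + eval y
eval (If c t e)  = if eval c then eval t else eval e

-- Existential quantification: the element type is hidden, and only the
-- Show dictionary travels with the value.
data Showable where
  MkShowable :: Show a => a -> Showable

describeOne :: Showable -> String
describeOne (MkShowable x) = show x

describe :: [Showable] -> [String]
describe = map describeOne
```

The constructors are still generated by the compiler; the declaration only narrows the types at which they can be applied.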

Long story short (too late), you can do all the same things, you just arrange them differently, because Haskell provides the various features of OOP classes under separate orthogonal language features.
