
Clarifying Data Constructor in Haskell

In the following:

data DataType a = Data a | Datum 

I understand that data constructors are value-level functions. What we do above is define their types. They can be functions of any arity, or constants. That's fine. I'm OK with saying Datum constructs Datum. What is not explicit and clear to me here is the difference between the constructor function and what it produces. Please let me know if I am getting it right:

1 - a) Basically, is writing Data a defining both a data structure and its constructor function (as in Scala or Java, where the class and the constructor usually have the same name)?

2 - b) So if I unpack and make an analogy: with Data a we are defining both a structure of object (a data structure; I don't want to say class, because I think a class already implies a type, but maybe we could) and the constructor function (data constructor/value constructor), and the latter returns an object of that structure. Finally, the type of that structure of object is given by the type constructor. An object structure, in a sense, is just a tag surrounding a bunch of values of some type. Is my understanding correct?

3 - c) Can I formally say:

  • Data constructors that are nullary represent constant values -> they return the constant value itself, whose type is given by the type constructor at the definition site.

  • Data constructors that take an argument represent classes of values, where the class is a tag? -> They return an infinite number of objects of that class, whose type is given by the type constructor at the definition site.

What is not explicit and clear to me here is the difference between the constructor function and what it produces.

I'm having trouble following your question, but I think you are complicating things. I would suggest not thinking too deeply about the "constructor" terminology.

But hopefully the following helps:

Starting simple:

data DataType = Data Int | Datum

The above reads "Declare a new type named DataType, which has the possible values Datum or Data <some_number> (e.g. Data 42)".

So, for example, Datum is a value of type DataType.
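To make that concrete, here is a minimal sketch that can be loaded into GHCi. The names noPayload, withPayload and wrapped are made up for illustration, and the deriving clause is added only so the values can be printed and compared.

```haskell
-- A monomorphic sum type with two constructors.
data DataType = Data Int | Datum
  deriving (Show, Eq)

-- Both of these are ordinary values of type DataType.
noPayload :: DataType
noPayload = Datum

withPayload :: DataType
withPayload = Data 42

-- 'Data' on its own is a function from Int to DataType,
-- so it can be mapped over a list like any other function.
wrapped :: [DataType]
wrapped = map Data [1, 2, 3]
```

Asking GHCi for :type Data and :type Datum on this declaration should report Int -> DataType and DataType respectively.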

Going back to your example with a type parameter, I want to point out what the syntax is doing:

     
data DataType a = Data a | Datum 
     ^        ^        ^           These things appear in type signatures (type level)
                  ^        ^       These things appear in code (value level stuff)

There's a bit of punning happening here, so in the data declaration you might see "Data Int", which mixes type-level and value-level stuff in a way that you wouldn't see in code. In code you'd see e.g. Data 42 or Data someVal.
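As a small illustration of where each name is allowed to appear, consider the sketch below. The function fromInt is hypothetical and its treatment of 0 is arbitrary; the point is only that the signature mentions types while the equations mention constructors.

```haskell
data DataType a = Data a | Datum
  deriving (Show, Eq)

-- Type level: 'DataType' and 'Int' appear in the signature.
-- Value level: 'Data' and 'Datum' appear in the equations.
fromInt :: Int -> DataType Int
fromInt 0 = Datum
fromInt n = Data n
```

Writing Data Int in an expression, or DataType 42 in a signature, would be rejected by the compiler, because each name lives in only one of the two levels.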

I hope that helps a little...

Another way of writing this:

data DataType a = Data a | Datum

Is with generalised algebraic data type (GADT) syntax, using the GADTSyntax extension, which lets us specify the types of the constructors explicitly:

{-# LANGUAGE GADTSyntax #-}

data DataType a where
  Data  :: a -> DataType a
  Datum ::      DataType a

(The GADTs extension would work too; it would also allow us to specify constructors with different type arguments in the result, like DataType Int vs. DataType Bool, but that's a more advanced topic, and we don't need that functionality here.)

These are exactly the types you would see in GHCi if you asked for the types of the constructor functions with :type / :t:

> :{
| data DataType a where
|   Data  :: a -> DataType a
|   Datum ::      DataType a
| :}

> :type Data
Data :: a -> DataType a

> :t Datum
Datum :: DataType a

With ExplicitForAll we can also specify the scope of the type variables explicitly. Giving them different names makes it clearer that the a in the data definition is a separate variable from the a in the constructor definitions:

data DataType a where
  Data  :: forall b. b -> DataType b
  Datum :: forall c.      DataType c

Some more examples of this notation with standard Prelude types:

data Either a b where
  Left  :: forall a b. a -> Either a b
  Right :: forall a b. b -> Either a b

data Maybe a where
  Nothing :: Maybe a
  Just    :: a -> Maybe a

data Bool where
  False :: Bool
  True  :: Bool

data Ordering where
  LT, EQ, GT :: Ordering  -- Shorthand for repeated ‘:: Ordering’

I understand that data constructors are value-level functions. What we do above is define their types. They can be functions of any arity, or constants. That's fine. I'm OK with saying Datum constructs Datum. What is not explicit and clear to me here is the difference between the constructor function and what it produces.

Datum and Data are both “constructors” of DataType a values; neither Datum nor Data is a type! These are just “tags” that select between the possible varieties of a DataType a value.

What is produced is always a value of type DataType a for a given a; the constructor selects which “shape” it takes.

A rough analogue of this is a union in languages like C or C++, plus an enumeration for the “tag”. In pseudocode:

enum Tag {
  DataTag,
  DatumTag,
}

// A single anonymous field.
struct DataFields<A> {
  A field1;  
}

// No fields.
struct DatumFields<A> {};

// A union of the possible field types.
union Fields<A> {
  DataFields<A>  data;
  DatumFields<A> datum;
}

// A pair of a tag with the fields for that tag.
struct DataType<A> {
  Tag       tag;
  Fields<A> fields;
}

The constructors are then just functions returning a value with the appropriate tag and fields. Pseudocode:

<A> DataType<A> newData(A x) {
  DataType<A> result;
  result.tag = DataTag;
  result.fields.data.field1 = x;
  return result;
}

<A> DataType<A> newDatum() {
  DataType<A> result;
  result.tag = DatumTag;
  // No fields.
  return result;
}

Unions are unsafe, since the tag and fields can get out of sync, but sum types are safe because they couple the two together.

A pattern-match like this in Haskell:

case someDT of
  Datum  -> f
  Data x -> g x

Is a combination of testing the tag and extracting the fields. Again, in pseudocode:

if (someDT.tag == DatumTag) {
  f();
} else if (someDT.tag == DataTag) {
  var x = someDT.fields.data.field1;
  g(x);
}

Again, the two steps are coupled in Haskell to ensure that you can only ever access the fields after you have checked the tag by pattern-matching.
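The case expression above can be packaged as a reusable function, in the same spirit as maybe from the Prelude. This is only a sketch: the names dataType and describe are invented here and are not part of any library.

```haskell
data DataType a = Data a | Datum

-- Hypothetical eliminator: one result for each possible tag.
dataType :: b -> (a -> b) -> DataType a -> b
dataType f g someDT =
  case someDT of
    Datum  -> f      -- tag is Datum: there are no fields to extract
    Data x -> g x    -- tag is Data: x is bound to the single field

-- Example use: the field is only reachable inside the Data branch.
describe :: DataType Int -> String
describe = dataType "empty" (\x -> "holds " ++ show x)
```

There is no way to write the second branch so that it reads a field out of a Datum value, which is the safety property that the tagged-union pseudocode lacks.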

So, in answer to your questions:

1 - a) Basically, is writing Data a defining both a data structure and its constructor function (as in Scala or Java, where the class and the constructor usually have the same name)?

Data a in your original code is not defining a data structure, in that Data is not a separate type from DataType a; it's just one of the possible tags that a DataType a value may have. Internally, a value of type DataType Int is one of the following:

  • The tag for Data (in GHC, a pointer to an “info table” for the constructor), and a reference to a value of type Int.

     x = Data (1 :: Int) :: DataType Int

            +----------+----------------+     +---------+----------------+
     x ---->| Data tag | pointer to Int |---->| Int tag | unboxed Int# 1 |
            +----------+----------------+     +---------+----------------+

  • The tag for Datum, and no other fields.

     y = Datum :: DataType Int

             +-----------+
     y ----> | Datum tag |
             +-----------+

In a language with unions, the size of a union is the maximum of the sizes of all its alternatives, since the type must support representing any of the alternatives under mutation. In Haskell, since values are immutable, they don't require any extra “padding”, because they can't be changed.

It's a similar situation for standard data types, e.g., a product or sum type:

(x :: X, y :: Y) :: (X, Y)
  +---------+--------------+--------------+
  | (,) tag | pointer to X | pointer to Y |
  +---------+--------------+--------------+

Left (m :: M) :: Either M N
  +-----------+--------------+
  | Left tag  | pointer to M |
  +-----------+--------------+

Right (n :: N) :: Either M N
  +-----------+--------------+
  | Right tag | pointer to N |
  +-----------+--------------+

2 - b) So if I unpack and make an analogy: with Data a we are defining both a structure of object (a data structure; I don't want to say class, because I think a class already implies a type, but maybe we could) and the constructor function (data constructor/value constructor), and the latter returns an object of that structure. Finally, the type of that structure of object is given by the type constructor. An object structure, in a sense, is just a tag surrounding a bunch of values of some type. Is my understanding correct?

This is sort of correct, but again, the constructors Data and Datum aren't “data structures” by themselves. They're just the names used to introduce (construct) and eliminate (match) values of type DataType a, for some type a that is chosen by the caller of the constructors to fill in the forall.

The declaration data DataType a = Data a | Datum says:

  • If some term e has type T, then the term Data e has type DataType T

  • Inversely, if some value of type DataType T matches the pattern Data x, then x has type T in the scope of the match (case branch or function equation)

  • The term Datum has type DataType T for any type T
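Each of these three rules can be seen in a few lines of code. The sketch below uses made-up names (wrappedChar, unwrapOr, emptyInt, emptyString) and picks Char, Int and String as arbitrary choices for T.

```haskell
data DataType a = Data a | Datum
  deriving (Show, Eq)

-- Rule 1: 'c' has type Char, so Data 'c' has type DataType Char.
wrappedChar :: DataType Char
wrappedChar = Data 'c'

-- Rule 2: matching a DataType Char against Data x gives x :: Char.
unwrapOr :: Char -> DataType Char -> Char
unwrapOr _        (Data x) = x
unwrapOr fallback Datum    = fallback

-- Rule 3: the same Datum constructor is used at two different types.
emptyInt :: DataType Int
emptyInt = Datum

emptyString :: DataType String
emptyString = Datum
```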

3 - c) Can I formally say:

Data constructors that are nullary represent constant values -> they return the constant value itself, whose type is given by the type constructor at the definition site.

Data constructors that take an argument represent classes of values, where the class is a tag? -> They return an infinite number of objects of that class, whose type is given by the type constructor at the definition site.

Not exactly. A type constructor like DataType :: Type -> Type, Maybe :: Type -> Type, or Either :: Type -> Type -> Type, or [] :: Type -> Type (list), or a polymorphic data type, represents an “infinite” family of concrete types (Maybe Int, Maybe Char, Maybe (String -> String), …) but only in the same way that id :: forall a. a -> a represents an “infinite” family of functions (id :: Int -> Int, id :: Char -> Char, id :: String -> String, …).

That is, the type a here is a parameter filled in with an argument value given by the caller. Usually this is implicit, through type inference, but you can specify it explicitly with the TypeApplications extension:

-- Akin to: \ (a :: Type) -> \ (x :: a) -> x
id        :: forall a. a   -> a
id x = x

id @Int   ::           Int -> Int
id @Int 1 ::                  Int

Data           :: forall a. a    -> DataType a
Data @Char     ::           Char -> DataType Char
Data @Char 'x' ::                   DataType Char

The data constructors of each instantiation don't really have anything to do with each other. There's nothing in common between the instantiations Data :: Int -> DataType Int and Data :: Char -> DataType Char, apart from the fact that they share the same tag name.
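The signatures above are written as annotations rather than as a loadable module, so here is a small compilable sketch of the same idea. The binding names inferred, explicit and emptyBool are hypothetical, and the deriving clause is only there to allow comparison.

```haskell
{-# LANGUAGE TypeApplications #-}

data DataType a = Data a | Datum
  deriving (Show, Eq)

-- The type argument is inferred from the value argument 'x'.
inferred :: DataType Char
inferred = Data 'x'

-- The type argument is supplied explicitly with @.
explicit :: DataType Char
explicit = Data @Char 'x'

-- A nullary constructor still takes the type argument.
emptyBool :: DataType Bool
emptyBool = Datum @Bool
```

The first two bindings denote the same value; the explicit form only spells out the choice that inference would otherwise make.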

Another way of thinking about this in Java terms is with the visitor pattern. DataType would be represented as a function that accepts a “DataType visitor”, and then the constructors don't correspond to separate data types; they're just the methods of the visitor, which accept the fields and return some result. Writing the equivalent code in Java is a worthwhile exercise, but here it is in Haskell:

{-# LANGUAGE RankNTypes #-}
-- (Allows passing polymorphic functions as arguments.)

type DataType a
  = forall r.    -- A visitor with a generic result type
  r              -- With one “method” for the ‘Datum’ case (no fields)
  -> (a -> r)    -- And one for the ‘Data’ case (one field)
  -> r           -- Returning the result

newData :: a -> DataType a
newData field = \ _visitDatum visitData -> visitData field

newDatum :: DataType a
newDatum = \ visitDatum _visitData -> visitDatum

Pattern-matching is simply running the visitor:

matchDT :: DataType a -> b -> (a -> b) -> b
matchDT dt visitDatum visitData = dt visitDatum visitData
-- Or: matchDT dt = dt
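-- Or: matchDT = id
-- (this last form relies on deep subsumption, so GHC 9.0 and later
-- are expected to reject it unless DeepSubsumption is enabled)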

-- case someDT of { Datum -> f; Data x -> g x }
-- f :: r
-- g :: a -> r
-- someDT :: DataType a
--        :: forall r. r -> (a -> r) -> r

someDT f (\ x -> g x)

Similarly, in Haskell, data constructors are just the ways of introducing and eliminating values of a user-defined type.
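One way to convince yourself that the visitor encoding carries the same information as the data declaration is to convert in both directions. The sketch below wraps the visitor in a newtype, which is a choice made here to keep type checking simple rather than something the encoding requires; Visitor, toVisitor and fromVisitor are hypothetical names.

```haskell
{-# LANGUAGE RankNTypes #-}

-- The ordinary sum type.
data DataType a = Data a | Datum
  deriving (Show, Eq)

-- The visitor encoding, wrapped in a newtype (an assumption of this sketch).
newtype Visitor a = Visitor
  { runVisitor :: forall r. r -> (a -> r) -> r }

-- Each constructor becomes a choice of which "method" to call.
toVisitor :: DataType a -> Visitor a
toVisitor Datum    = Visitor (\visitDatum _visitData -> visitDatum)
toVisitor (Data x) = Visitor (\_visitDatum visitData -> visitData x)

-- Running the visitor with the real constructors rebuilds the value.
fromVisitor :: Visitor a -> DataType a
fromVisitor v = runVisitor v Datum Data
```

Passing Datum and Data themselves as the two visitor methods gives back the original value, which is why the constructors can be read as nothing more than the two ways of answering the visitor.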
