Type constraints on dimensionality of vectors in F# and Haskell (Dependent Types)

I'm new to F# and Haskell and am implementing a project in order to determine which language I would prefer to devote more time to.

I have numerous situations where I expect a given numerical type to have given dimensions based on parameters given to a top-level function (i.e., at runtime). For example, in this F# snippet, I have

type DataStreamItem = LinearAlgebra.Vector<float32>

type Ball =
    {R : float32;
     X : DataStreamItem}

and I expect all instances of type DataStreamItem to have D dimensions.

My question is in the interests of algorithm development and debugging, since such shape-mismatch bugs can be a headache to pin down but should be a non-issue once the algorithm is up and running:

Is there a way, in either F# or Haskell, to constrain DataStreamItem and/or Ball to have dimensions of D? Or do I need to resort to pattern matching on every calculation?

If the latter is the case, are there any good, lightweight paradigms to catch such constraint violations as soon as they occur (and that can be removed when performance is critical)?

Edit:

To clarify the sense in which D is constrained:

D is defined such that if you expressed the algorithm of the function main(DataStream) as a computation graph, all of the intermediate calculations would depend on the dimension D for the execution of main(DataStream). The simplest example I can think of would be a dot product of M with DataStreamItem: the dimension of DataStream would determine the creation of the dimension parameters of M.

Another Edit:

A week later, I found the following blog post outlining precisely what I was looking for in dependent types in Haskell:

https://blog.jle.im/entry/practical-dependent-types-in-haskell-1.html

And Another: This reddit thread contains some discussion on Dependent Types in Haskell and contains a link to the quite interesting dissertation proposal of R. Eisenberg.

Neither Haskell's nor F#'s type system is rich enough to (directly) express statements of the sort "N nested instances of a recursive type T, where N is between 2 and 6" or "a string of characters exactly 6 long". Not in those exact terms, at least.

I mean, sure, you can always express such a 6-long string type as type String6 = String6 of char*char*char*char*char*char or some variant of the sort (which technically should be enough for your particular example with vectors, unless you're not telling us the whole example), but you can't say something like type String6 = s:string{s.Length=6} and, more importantly, you can't define functions of the form concat: String<n> -> String<m> -> String<n+m>, where n and m represent string lengths.

But you're not the first person asking this question. This research direction does exist, and is called "dependent types", and I can express the gist of it most generally as "having higher-order, more powerful operations on types" (as opposed to just union and intersection, as we have in ML languages). Notice how in the example above I parametrize the type String with a number, not another type, and then do arithmetic on that number.

The most prominent language prototypes (that I know of) in this direction are Agda, Idris, F*, and Coq (not really the full deal AFAIK). Check them out, but beware: this is kind of the edge of tomorrow, and I wouldn't advise starting a big project based on those languages.

(edit: apparently you can do certain tricks in Haskell to simulate dependent types, but it's not very convenient, and you have to enable UndecidableInstances)
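To make the concat: String<n> -> String<m> -> String<n+m> idea concrete, here is a minimal Haskell sketch of a length-indexed list. It is only an illustration, not something taken from the answer above: the names Nat, Vec, Add, append, dot and toList are all made up for the example, and it uses the DataKinds, GADTs and TypeFamilies extensions (this particular sketch happens not to need UndecidableInstances).

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}

import Data.Kind (Type)

-- Peano naturals (hypothetical helper type); DataKinds promotes
-- the constructors Z and S to the type level as 'Z and 'S.
data Nat = Z | S Nat

-- A list whose length is part of its type.
data Vec (n :: Nat) (a :: Type) where
  Nil  :: Vec 'Z a
  (:>) :: a -> Vec n a -> Vec ('S n) a
infixr 5 :>

-- Type-level addition, defined by recursion on the first argument.
type family Add (n :: Nat) (m :: Nat) :: Nat where
  Add 'Z     m = m
  Add ('S n) m = 'S (Add n m)

-- The analogue of concat: String<n> -> String<m> -> String<n+m>.
append :: Vec n a -> Vec m a -> Vec (Add n m) a
append Nil       ys = ys
append (x :> xs) ys = x :> append xs ys

-- The dot product only type-checks when both lengths agree.
dot :: Num a => Vec n a -> Vec n a -> a
dot Nil       Nil       = 0
dot (x :> xs) (y :> ys) = x * y + dot xs ys

-- Forget the length index again.
toList :: Vec n a -> [a]
toList Nil       = []
toList (x :> xs) = x : toList xs
```

With these definitions, dot (1 :> 2 :> Nil) (3 :> Nil) is rejected by the compiler because the two length indices differ, which is the kind of shape-mismatch bug the question is about.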

Alternatively, you could go with a weaker solution of doing the checks at runtime. The general gist is: wrap your vector types in a plain wrapper, don't allow direct construction of it, but provide constructor functions instead, and make those constructor functions ensure the desired property (i.e. length). Something like:

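type Stream4 = private Stream4 of DataStreamItem
   with
      // Returns None when the vector is not exactly 4 long.
      static member create (item: DataStreamItem) =
         if item.Length = 4 then Some (Stream4 item)
         else None

      // Alternatively, fail fast instead of returning an option:
      static member createOrFail (item: DataStreamItem) =
         if item.Length <> 4 then failwith "Expected a 4-long vector."
         Stream4 item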

Here is a fuller explanation of the approach from Scott Wlaschin: constrained strings.
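Since the question asks about both languages, the same runtime-checked wrapper can be sketched in Haskell. This is a hedged illustration only: a plain list of Float stands in for DataStreamItem, and the names mkStream4, unStream4 and dot4 are invented for the example.

```haskell
-- Hypothetical Haskell counterpart of the F# Stream4 wrapper.
-- In a real project this would live in its own module whose export
-- list omits the data constructor, for example:
--   module Stream4 (Stream4, mkStream4, unStream4, dot4) where
-- so that mkStream4 is the only way to build a value.

newtype Stream4 = Stream4 [Float]
  deriving (Show, Eq)

-- Smart constructor: the length is checked once, at run time.
mkStream4 :: [Float] -> Maybe Stream4
mkStream4 xs
  | length xs == 4 = Just (Stream4 xs)
  | otherwise      = Nothing

-- Unwrap back to the underlying list.
unStream4 :: Stream4 -> [Float]
unStream4 (Stream4 xs) = xs

-- Both arguments were validated on construction, so zipWith
-- cannot silently truncate one of them here.
dot4 :: Stream4 -> Stream4 -> Float
dot4 (Stream4 xs) (Stream4 ys) = sum (zipWith (*) xs ys)
```

The check is paid once per vector rather than on every calculation, which fits the requirement that the checks be cheap enough to leave in or easy to remove.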

So if I understood correctly, you're actually not doing any type-level arithmetic, you just have a "length tag" that's shared in a chain of function calls.

This has long been possible to do in Haskell; one way that I consider quite elegant is to annotate your arrays with a standard fixed-length type of the desired length:

newtype FixVect v s = FixVect { getFixVect :: VU.Vector s }

To ensure the correct length, you only provide (polymorphic) smart constructors that construct from the fixed-length type. This is perfectly safe, though the actual dimension number is nowhere mentioned!

class VectorSpace v => FiniteDimensional v where
  asFixVect :: v -> FixVect v (Scalar v)

instance FiniteDimensional Float where
  asFixVect s = FixVect $ VU.singleton s
instance (FiniteDimensional a, FiniteDimensional b, Scalar a ~ Scalar b)
      => FiniteDimensional (a,b) where
  asFixVect (a,b) = case (asFixVect a, asFixVect b) of
        (FixVect av, FixVect bv) -> FixVect $ av<>bv

Constructing the unboxed vector from tuples like this is really inefficient; however, this doesn't mean you can't write efficient programs with this paradigm. If the dimension always stays constant, you only need to wrap and unwrap once, and can do all the critical operations through safe yet runtime-unchecked zips, folds and LA combinations.

Regardless, this approach isn't really widely used. Perhaps the single constant dimension is in fact too limiting for most relevant operations, and if you need to unwrap to tuples often it's way too inefficient. Another approach that is taking off these days is to actually tag the vectors with type-level numbers. Such numbers have become available in a usable form with the introduction of data kinds in GHC-7.4. Up until now, they're still rather unwieldy and not fit for proper arithmetic, but the upcoming 8.0 will greatly improve many aspects of this dependently-typed programming in Haskell.

A library that offers efficient length-indexed arrays is linear.
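As a rough sketch of what tagging vectors with type-level numbers can look like, the following uses GHC.TypeLits from base. It is not the answer author's code: Vec, mkVec, dot and selfDot are hypothetical names, a list stands in for a real array type, and someNatVal is used to lift a dimension D that is only known at run time into a type.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, SomeNat (..), natVal, someNatVal)

-- A list tagged with its intended length n (a phantom type-level number).
newtype Vec (n :: Nat) a = Vec [a]
  deriving (Show, Eq)

-- Smart constructor: compares the actual length with the tag once.
mkVec :: forall n a. KnownNat n => [a] -> Maybe (Vec n a)
mkVec xs
  | fromIntegral (length xs) == natVal (Proxy :: Proxy n) = Just (Vec xs)
  | otherwise                                             = Nothing

-- Only vectors carrying the same tag can be combined.
dot :: Num a => Vec n a -> Vec n a -> a
dot (Vec xs) (Vec ys) = sum (zipWith (*) xs ys)

-- Lift a run-time dimension d to a type n, then work with Vec n.
-- someNatVal returns Nothing for negative numbers.
selfDot :: Integer -> [Double] -> Maybe Double
selfDot d xs = case someNatVal d of
  Just (SomeNat (_ :: Proxy n)) -> do
    v <- mkVec xs :: Maybe (Vec n Double)
    pure (dot v v)
  Nothing -> Nothing
```

Inside the case branch every Vec n shares the same n, so a mismatch between two vectors is a compile-time error, while the comparison against the run-time value of D happens only in mkVec.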
