
Differences between Storable and Unboxed Vectors

Originally asked 2016-10-21 12:26:31. Tags: haskell, haskell-vector

So ... I've been using unboxed vectors (from the vector package) by default, without giving it much consideration. vector-th-unbox makes creating instances for them a breeze, so why not.

Now I ran into a case where I cannot derive those instances automatically: a data type with phantom type parameters (as in Vector (s :: Nat) a, where s encodes the length).

This made me think about the differences between Storable and Unboxed vectors. Things I figured out on my own:

Unboxed will store e.g. tuples as separate vectors, which gives better cache locality because no bandwidth is wasted when only one of the components is needed.
Storable will still be compiled to simple (and probably efficient) readArray#s that return unboxed values (as is evident from reading Core).
Storable allows direct pointer access, which enables interoperability with foreign code. Unboxed doesn't.
[edit] Storable instances are actually easier to write by hand than Unbox ones (which require Vector and MVector instances).
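The first and third points above can be illustrated with a small sketch. It assumes only the vector package; the function names sumFirsts and firstViaPtr are made up for this example and are not part of any library.

```haskell
import qualified Data.Vector.Storable as S
import qualified Data.Vector.Unboxed as U
import Foreign.Storable (peekElemOff)

-- An unboxed vector of pairs is represented as a pair of unboxed vectors,
-- so U.unzip is O(1) and summing the first components never reads the Doubles.
sumFirsts :: U.Vector (Int, Double) -> Int
sumFirsts = U.sum . fst . U.unzip

-- A storable vector exposes its buffer as a raw pointer, the same kind of
-- pointer one would pass to a foreign function. The vector must be non-empty,
-- because peekElemOff performs no bounds check.
firstViaPtr :: S.Vector Int -> IO Int
firstViaPtr v = S.unsafeWith v $ \p -> peekElemOff p 0
```

There is no counterpart to S.unsafeWith in Data.Vector.Unboxed, which is the "Unboxed doesn't" part of the list.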

That alone doesn't make it evident to me why Unboxed even exists; there seems to be little benefit to it. Am I missing something?

2 answers

Cribbed from https://haskell-lang.org/library/vector

Storable and unboxed vectors both store their data in a byte array, avoiding pointer indirection. This is more memory efficient and allows better usage of caches. The distinction between storable and unboxed vectors is subtle:

Storable vectors require data which is an instance of the Storable type class. This data is stored in malloc'ed memory, which is pinned (the garbage collector can't move it around). This can lead to memory fragmentation, but allows the data to be shared over the C FFI.
Unboxed vectors require data which is an instance of the Prim type class. This data is stored in GC-managed unpinned memory, which helps avoid memory fragmentation. However, this data cannot be shared over the C FFI.

Both the Storable and Prim typeclasses provide a way to store a value as bytes, and to load bytes into a value. The distinction is what type of bytearray is used.
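To make "store a value as bytes, and load bytes into a value" concrete, here is a minimal sketch of a hand-written Storable instance. The type V2 and the helper roundTrip are hypothetical names invented for this example; the layout (two adjacent Doubles) is an assumption of the sketch, not something the vector package prescribes.

```haskell
import qualified Data.Vector.Storable as S
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (Storable (..))

-- Hypothetical example type: two doubles laid out next to each other.
data V2 = V2 !Double !Double
  deriving (Eq, Show)

instance Storable V2 where
  sizeOf _    = 2 * sizeOf (undefined :: Double)
  alignment _ = alignment (undefined :: Double)
  -- Load bytes into a value: read the two components from consecutive slots.
  peek p = do
    let q = castPtr p :: Ptr Double
    x <- peekElemOff q 0
    y <- peekElemOff q 1
    pure (V2 x y)
  -- Store a value as bytes: write the two components to consecutive slots.
  poke p (V2 x y) = do
    let q = castPtr p :: Ptr Double
    pokeElemOff q 0 x
    pokeElemOff q 1 y

-- With the instance in place, V2 can be an element of a storable vector.
roundTrip :: [V2] -> [V2]
roundTrip = S.toList . S.fromList
```

Four small methods are enough here, which is consistent with the question's remark that Storable instances are easier to write by hand than Unbox ones.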

As usual, the only true measure of performance will be benchmarking. However, as a general guideline:

If you don't need to pass values to a C FFI, and you have a Prim instance, use unboxed vectors.
If you have a Storable instance, use a storable vector.
Otherwise, use a boxed vector.

There are also other issues to consider, such as the fact that boxed vectors are instances of Functor while storable and unboxed vectors are not.
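In practice the Functor difference mostly shows up as a choice between fmap and the module's own map. The sketch below uses made-up function names and assumes only the vector package. Unboxed and storable vectors cannot be Functor instances because their element type carries a class constraint (Unbox or Storable), and Functor does not allow one.

```haskell
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U

-- Boxed vectors are Functors, so fmap works directly.
doubledBoxed :: V.Vector Int -> V.Vector Int
doubledBoxed = fmap (* 2)

-- Unboxed vectors are not Functors; use the constrained U.map instead.
doubledUnboxed :: U.Vector Int -> U.Vector Int
doubledUnboxed = U.map (* 2)
```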

Another difference is memory overhead:

As per my measurements:

Data.Vector.Storable.Vector Int has 64 bytes of overhead.
Data.Vector.Unboxed.Vector Int has 48 bytes of overhead.

Source: