
Why is Haskell's default string implementation a linked list of chars?

Original question: 2012-12-13 17:41:41 · Tags: string, performance, haskell, linked-list

It is well known that Haskell's default String implementation is inefficient in terms of both speed and memory. As far as I know, [] lists in general are implemented in Haskell as singly-linked lists, and for most small/simple data types (e.g. Int) that doesn't seem like a very good idea, but for String it seems like total overkill. Some of the opinions on this matter include:

Real World Haskell

On simple benchmarks like this, even programs written in interpreted languages such as Python can outperform Haskell code that uses String by an order of magnitude.

Efficient String Implementation in Haskell

Since a String is just [Char], that is a linked list of Char, it means Strings have poor locality of reference, and again means that Strings are fairly large in memory, at a minimum it's N * (21bits + Mbits) where N is the length of the string and M is the size of a pointer (...). Strings are much less likely to be able to be optimized to loops, etc. by the compiler.
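To put the quoted lower bound in concrete terms, here is a small sketch that simply evaluates N * (21 bits + M bits). The function name `minStringBits` and the sample length of 1000 are made up for illustration, and the result is only the bound stated in the quote, not a measurement of what any particular compiler allocates.

```haskell
-- Lower bound from the quoted formula: N * (21 bits + M bits).
-- 21 bits is enough for a Unicode code point; m is the pointer size in bits.
-- (minStringBits is a hypothetical helper, not a library function.)
minStringBits :: Integer -> Integer -> Integer
minStringBits n m = n * (21 + m)

main :: IO ()
main = do
  let n    = 1000                 -- hypothetical string length
      bits = minStringBits n 64   -- assuming 64-bit pointers
  putStrLn ("linked list, at least: " ++ show (bits `div` 8) ++ " bytes")  -- 10625
  putStrLn ("packed, 1 byte/char:   " ++ show n ++ " bytes")               -- 1000
```

Under these assumptions the linked representation needs at least about ten times the space of a packed one-byte-per-character buffer.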

I know that Haskell has ByteStrings (and Arrays) in several nice flavors and that they can do the job nicely, but I would expect the default implementation to be the most efficient one.

TL;DR: Why is Haskell's default String implementation a singly-linked list even though it is terribly inefficient and rarely used for real world applications (except for the really simple ones)? Are there historical reasons? Is it easier to implement?

3 Answers

Why is Haskell's default String implementation a singly-linked list

Because singly-linked lists:

support induction via pattern matching
have useful properties, such as Monad, Functor
are properly parametrically polymorphic
are naturally lazy

and so String as [Char] (Unicode code points) means a string type that fits the language goals (as of 1990), and essentially comes "for free" with the list library.
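The four list properties named in this answer can be seen directly on String values. The following is a minimal sketch; the function names (`countSpaces`, `shout`, `doubleEach`, `everyOther`, `firstFive`) are invented for illustration, and only `toUpper`, `fmap`, `>>=`, `take` and `cycle` come from the standard libraries.

```haskell
import Data.Char (toUpper)

-- Induction via pattern matching: one case for the empty string,
-- one for a head character followed by the rest.
countSpaces :: String -> Int
countSpaces []       = 0
countSpaces (c : cs) = (if c == ' ' then 1 else 0) + countSpaces cs

-- Functor: fmap over a String maps over its characters.
shout :: String -> String
shout = fmap toUpper

-- Monad: bind replaces each character with a (possibly empty) string.
doubleEach :: String -> String
doubleEach s = s >>= \c -> [c, c]

-- Parametric polymorphism: written once for any element type,
-- then reused unchanged for Char.
everyOther :: [a] -> [a]
everyOther (x : _ : rest) = x : everyOther rest
everyOther xs             = xs

-- Laziness: only the first five characters of an infinite string are built.
firstFive :: String
firstFive = take 5 (cycle "ab")

main :: IO ()
main = do
  print (countSpaces "a b c")    -- 2
  putStrLn (shout "hello")       -- HELLO
  putStrLn (doubleEach "abc")    -- aabbcc
  putStrLn (everyOther "abcde")  -- ace
  putStrLn firstFive             -- ababa
```

None of these definitions needed any string-specific machinery, which is the sense in which String comes "for free" with lists.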

In summary, historically the language designers were more interested in well-designed core data types than in the modern problems of text processing, so we have an elegant, easy to understand, easy to teach String type that isn't quite a Unicode text chunk and isn't a dense, packed, strict data type.

Efficiency is only one axis to measure an abstraction on. While lists are pretty inefficient for text-y operations, they are darn convenient in that there are a lot of list operations implemented polymorphically that have useful interpretations when specialized to [Char], so you get a lot of reuse both in the library implementation and in the user's brain.
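As a rough illustration of that reuse, the sketch below applies ordinary polymorphic list functions from the Prelude and Data.List to a String. The wrapper names (`digitsOf`, `distinctLetters`, `numbered`) and the sample text are hypothetical; the library functions themselves are not string-specific.

```haskell
import Data.Char (isDigit)
import Data.List (isPrefixOf, nub, sort)

-- filter works on any list; here it extracts the digits of a string.
digitsOf :: String -> String
digitsOf = filter isDigit

-- nub and sort are generic list functions; together they give the
-- distinct characters of a string in order.
distinctLetters :: String -> String
distinctLetters = sort . nub

-- zip pairs each character with its position.
numbered :: String -> [(Int, Char)]
numbered = zip [1 ..]

main :: IO ()
main = do
  let s = "hello world 42"
  putStrLn (reverse s)                    -- 24 dlrow olleh
  putStrLn (digitsOf s)                   -- 42
  print (length s)                        -- 14
  print ("hello" `isPrefixOf` s)          -- True
  putStrLn (distinctLetters "mississippi") -- imps
  print (numbered "abc")                  -- [(1,'a'),(2,'b'),(3,'c')]
```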

It's not clear that, were the language being designed today from scratch with our current level of experience, the same decision would be made; however, it's not always possible to make decisions perfectly before experience is available.

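At this point, it's probably historical: the optimizations that make things like ByteString so efficient are recent, whereas [Char] predates them all by many years.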