Why is Haskell's default string implementation a linked list of chars?
It is well known that Haskell's default String implementation is inefficient in both speed and memory. As far as I know, [] lists in general are implemented in Haskell as singly-linked lists, and for most small/simple data types (e.g. Int) that doesn't seem like a very good idea, but for String it seems like total overkill. Some of the opinions on this matter include:
On simple benchmarks like this, even programs written in interpreted languages such as Python can outperform Haskell code that uses String by an order of magnitude.
Efficient String Implementation in Haskell
Since a String is just [Char], that is a linked list of Char, it means Strings have poor locality of reference, and again means that Strings are fairly large in memory, at a minimum it's N * (21bits + Mbits) where N is the length of the string and M is the size of a pointer (...). Strings are much less likely to be able to be optimized to loops, etc. by the compiler.
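To make the quoted lower bound concrete, here is a minimal sketch that just evaluates N * (21 bits + M bits) and compares it with one byte per character. The function names are hypothetical, and the packed figure assumes ASCII-only text; it is an illustration of the arithmetic, not a measurement of any particular compiler.

```haskell
-- Lower bound from the quote: N * (21 bits + M bits), where N is the
-- string length and M is the pointer size in bits.
-- (Function names are made up for this illustration.)
minStringBits :: Integer -> Integer -> Integer
minStringBits n pointerBits = n * (21 + pointerBits)

-- The same text stored densely at one byte per character.
-- Assumption: the text is ASCII-only.
packedAsciiBits :: Integer -> Integer
packedAsciiBits n = n * 8
```

For a 1000-character string with 64-bit pointers, `minStringBits 1000 64` gives 85000 bits (10625 bytes), against 8000 bits (1000 bytes) for the packed form.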
I know that Haskell has ByteStrings (and Arrays) in several nice flavors and that they can do the job nicely, but I would expect the default implementation to be the most efficient one.
TL;DR: Why is Haskell's default String implementation a singly-linked list even though it is terribly inefficient and rarely used for real-world applications (except for the really simple ones)? Are there historical reasons? Is it easier to implement?
Why is Haskell's default String implementation a singly-linked list
Because singly-linked lists support:
and so String as [Char] (Unicode code points) means a string type that fits the language goals (as of 1990), and essentially comes "for free" with the list library.
In summary, historically the language designers were more interested in well-designed core data types than in the modern problems of text processing, so we have an elegant, easy to understand, easy to teach String type that isn't quite a Unicode text chunk and isn't a dense, packed, strict data type.
Efficiency is only one axis to measure an abstraction on. While lists are pretty inefficient for text-y operations, they are darn convenient in that there are a lot of list operations implemented polymorphically that have useful interpretations when specialized to [Char], so you get a lot of reuse both in the library implementation and in the user's brain.
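As a small illustration of that reuse, the sketch below writes two functions once for arbitrary lists and then applies them to String with no extra work. The names `isPalindrome`, `frequencies`, and `shout` are invented for this example; only `reverse`, `map`, `group`, `sort`, and `toUpper` come from the standard libraries.

```haskell
import Data.Char (toUpper)
import Data.List (group, sort)

-- Written once for any list whose elements can be compared for equality.
isPalindrome :: Eq a => [a] -> Bool
isPalindrome xs = xs == reverse xs

-- Occurrence counts for any orderable element type.
frequencies :: Ord a => [a] -> [(a, Int)]
frequencies xs = [ (x, length g) | g@(x:_) <- group (sort xs) ]

-- A String-specific helper built from the generic list `map`.
shout :: String -> String
shout = map toUpper
```

The same `isPalindrome` accepts both `"level"` and `[1, 2, 1]`, and `frequencies "hello"` counts characters exactly as it would count numbers. A packed string type has to provide its own versions of such operations.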
It's not clear that, were the language being designed today from scratch with our current level of experience, the same decision would be made; however, it's not always possible to make decisions perfectly before experience is available.
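At this point, it's probably historical: the optimizations that make things like ByteString so efficient are recent, whereas [Char] predates them all by many years.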
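For comparison, here is a minimal sketch of the same operation on a String and on a packed ByteString, assuming the `bytestring` package that ships with GHC. The two function names are hypothetical. Note that `Data.ByteString.Char8` truncates each Char to 8 bits, so this variant only suits ASCII or Latin-1 data.

```haskell
import qualified Data.ByteString.Char8 as BC

-- Count lines in text held as a linked list of Char.
countLinesString :: String -> Int
countLinesString = length . lines

-- The same count over a packed, strict byte array.
-- Assumption: the input is ASCII/Latin-1, since Char8 keeps 8 bits per Char.
countLinesBytes :: BC.ByteString -> Int
countLinesBytes = length . BC.lines
```

The code looks nearly identical, but the ByteString version needs its own `lines` from its own module, whereas the String version reuses the ordinary list functions.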