
What makes Iteratees worth the complexity?

First, I understand the how of iteratees, well enough that I could probably write a simplistic and buggy implementation without referring back to any existing ones.

What I'd really like to know is why people seem to find them so fascinating, or under what circumstances their benefits justify their complexity. Comparing them to lazy I/O there is a very clear benefit, but that seems an awful lot like a straw man to me. I never felt comfortable about lazy I/O in the first place, and I avoid it except for the occasional hGetContents or readFile, mostly in very simple programs.

In real-world scenarios I generally use traditional I/O interfaces with control abstractions appropriate to the task. In that context I just don't see the benefit of iteratees, or to what task they are an appropriate control abstraction. Most of the time they seem more like unnecessary complexity or even a counterproductive inversion of control.

I've read a fair number of articles about them and sources that make use of them, but have not yet found a compelling example that actually made me think anything along the lines of "oh, yeah, I'd have used them there too." Maybe I just haven't read the right ones. Or perhaps there is a yet-to-be-devised interface, simpler than any I've yet seen, that would make them feel less like a Swiss Army Chainsaw.

Am I just suffering from not-invented-here syndrome or is my unease well-founded? Or is it perhaps something else entirely?

As to why people find them so fascinating, I think it's because they're such a simple idea. The recent discussion on Haskell-cafe about a denotational semantics for iteratees devolved into a consensus that they're so simple they're barely worth describing. The phrase "little more than a glorified left-fold with a pause button" sticks out to me from that thread. People who like Haskell tend to be fond of simple, elegant structures, so the iteratee idea is likely very appealing.

For me, the chief benefits of iteratees are:

  1. Composability. Not only can iteratees be composed, but enumerators can too. This is very powerful.
  2. Safe resource usage. Resources (memory and handles mostly) cannot escape their local scope. Compare to strict I/O, where it's easier to create space leaks by not cleaning up.
  3. Efficiency. Iteratees can be highly efficient, competitive with or better than both lazy I/O and strict I/O.

I have found that iteratees provide the greatest benefits when working with single logical data that comes from multiple sources. This is when the composability is most helpful, and resource management with strict I/O most annoying (e.g. nested allocas or brackets).

For an example, in a work-in-progress audio editor, a single logical chunk of sound data is a set of offsets into multiple audio files. I can process that single chunk of sound by doing something like this (from memory, but I think this is right):

enumSound :: MonadIO m => Sound -> Enumerator s m a
enumSound snd = foldr (>=>) enumEof . map enumFile $ sndFiles snd

This seems clear, concise, and elegant to me, much more so than the equivalent strict I/O. Iteratees are also powerful enough to incorporate any processing I want to do, including writing output, so I find this very nice. If I used lazy I/O I could get something as elegant, but the extra care needed to make sure resources are consumed and GC'd would outweigh the advantages IMO.

I also like that you need to explicitly retain data in iteratees, which avoids the notorious mean xs = sum xs / length xs space leak.
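To make the space leak concrete, here is a minimal sketch in plain Haskell without any iteratee library. The names meanTwoPass, meanOnePass and Acc are made up for illustration. The two-pass version holds on to the whole list because length still needs it after sum has walked it; the one-pass version carries the running total and count in a single strict left fold, which is the discipline an iteratee pushes you toward.

```haskell
import Data.List (foldl')

-- Two-pass version: sum walks the list, then length walks it again,
-- so the whole list has to stay in memory until the second pass is done.
meanTwoPass :: [Double] -> Double
meanTwoPass xs = sum xs / fromIntegral (length xs)

-- Strict accumulator holding the running total and the element count.
data Acc = Acc !Double !Int

-- One-pass version: a single strict left fold carries total and count
-- together, so each element can be collected once it has been consumed.
meanOnePass :: [Double] -> Double
meanOnePass xs = finish (foldl' step (Acc 0 0) xs)
  where
    step (Acc total n) x = Acc (total + x) (n + 1)
    finish (Acc total n) = total / fromIntegral n
```

Note that the one-liner quoted above is shorthand: as written it does not typecheck, since length returns an Int, hence the fromIntegral here.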

Of course, I don't use iteratees for everything. As an alternative I really like the with* idiom, but when you have multiple resources that need to be nested that gets complex very quickly.

Essentially, it's about doing IO in a functional style, correctly and efficiently. That's all, really.

Correct and efficient are easy enough using quasi-imperative style with strict IO. Functional style is easy with lazy IO, but it's technically cheating (using unsafeInterleaveIO under the hood) and can have issues with resource management and efficiency.

In very, very general terms, a lot of pure functional code follows a pattern of taking some data, recursively expanding it into smaller pieces, transforming the pieces in some fashion, then recombining them into a final result. The structure may be implicit (in the call graph of the program) or an explicit data structure being traversed.

But this falls apart when IO is involved. Say your initial data is a file handle, the "recursively expand" step is reading a line from it, and you can't read the entire file into memory at once. This forces the entire read-transform-recombine process to be done for each line before reading the next one, so instead of the clean "unfold, map, fold" structure, the steps get mashed together into explicitly recursive monadic functions using strict IO.
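As a rough illustration of that mashed-together shape, here is a minimal sketch using only System.IO from base. The function name sumLineLengths and the choice of task (summing line lengths) are made up for the example. Reading, transforming (length) and recombining (+) all live inside one hand-written loop.

```haskell
import System.IO

-- Sum the lengths of all lines in a file, one line at a time.
-- The "unfold" (hGetLine), "map" (length) and "fold" (+) steps are
-- fused into a single explicitly recursive IO loop.
sumLineLengths :: FilePath -> IO Int
sumLineLengths path = withFile path ReadMode (go 0)
  where
    go :: Int -> Handle -> IO Int
    go acc h = do
      eof <- hIsEOF h
      if eof
        then return acc
        else do
          line <- hGetLine h
          let acc' = acc + length line
          -- Force the accumulator so thunks do not pile up.
          acc' `seq` go acc' h
```

This is correct and runs in constant space, but changing the source, the transformation or the accumulation means editing the same loop.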

Iteratees provide an alternative structure to solve the same problem. The "transform and recombine" steps are extracted and, instead of being functions, are changed into a data structure representing the current state of the computation. The "recursively expand" step is given the responsibility of obtaining the data and feeding it to an (otherwise passive) iteratee.
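A toy model may help here. The sketch below is pure and deliberately simplified; it is not the API of the iteratee package or any other library, and the names Stream, Iteratee, foldI, enumList and run are chosen just for this example. Real implementations are monadic and track leftover input, which this one ignores.

```haskell
-- Input handed to an iteratee: a chunk of elements, or end of stream.
data Stream a = Chunk [a] | EOF

-- An iteratee is a value describing the current state of a computation:
-- either finished with a result, or waiting for more input.
data Iteratee a b
  = Done b
  | Cont (Stream a -> Iteratee a b)

-- A left fold as an iteratee: it keeps asking for input until EOF.
foldI :: (b -> a -> b) -> b -> Iteratee a b
foldI f = go
  where
    go acc = Cont step
      where
        step (Chunk xs) = let acc' = foldl f acc xs in acc' `seq` go acc'
        step EOF        = Done acc

-- An enumerator owns the data source and feeds the passive iteratee.
-- This toy one reads from a list in chunks of (at least one, at most n) elements.
enumList :: Int -> [a] -> Iteratee a b -> Iteratee a b
enumList _ _  it@(Done _) = it
enumList _ [] it          = it
enumList n xs (Cont k)    = enumList n rest (k (Chunk chunk))
  where (chunk, rest) = splitAt (max 1 n) xs

-- Signal end of input and extract the result, if there is one.
run :: Iteratee a b -> Maybe b
run (Done b) = Just b
run (Cont k) = case k EOF of
  Done b -> Just b
  Cont _ -> Nothing
```

Because enumList returns an iteratee that is still waiting for input, a second enumerator can simply pick up where the first stopped: run (enumList 2 [4, 5] (enumList 2 [1, 2, 3] (foldI (+) 0))) folds over both sources as if they were one.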

What benefits does this offer? Among other things:

  • Because an iteratee is a passive object that performs single steps of a computation, they can be easily composed in different ways--for instance, interleaving two iteratees instead of running them sequentially.
  • The interface between iteratees and enumerators is pure, just a stream of values being processed, so a pure function can be freely spliced in between them.
  • Data sources and computations are oblivious to each other's internal workings, decoupling input and resource management from processing and output.
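The first two points can be sketched with the same kind of toy, pure iteratee. Everything below is a simplified model written for this example, not a real library's API: zipI and mapStream are hypothetical combinators, and leftover input is ignored.

```haskell
data Stream a = Chunk [a] | EOF

data Iteratee a b
  = Done b
  | Cont (Stream a -> Iteratee a b)

-- Pass one piece of input to an iteratee, if it still wants any.
feed :: Stream a -> Iteratee a b -> Iteratee a b
feed _ it@(Done _) = it
feed s (Cont k)    = k s

-- Interleave two iteratees: both see every chunk, one step at a time.
zipI :: Iteratee a b -> Iteratee a c -> Iteratee a (b, c)
zipI (Done b) (Done c) = Done (b, c)
zipI l r               = Cont (\s -> zipI (feed s l) (feed s r))

-- Splice a pure function between the data source and an iteratee.
mapStream :: (a -> b) -> Iteratee b c -> Iteratee a c
mapStream _ (Done c) = Done c
mapStream f (Cont k) = Cont step
  where
    step (Chunk xs) = mapStream f (k (Chunk (map f xs)))
    step EOF        = mapStream f (k EOF)

-- A left fold as an iteratee.
foldI :: (b -> a -> b) -> b -> Iteratee a b
foldI f = go
  where
    go acc = Cont step
      where
        step (Chunk xs) = let acc' = foldl f acc xs in acc' `seq` go acc'
        step EOF        = Done acc

-- Toy enumerator over a list, feeding chunks of up to n elements.
enumList :: Int -> [a] -> Iteratee a b -> Iteratee a b
enumList _ _  it@(Done _) = it
enumList _ [] it          = it
enumList n xs (Cont k)    = enumList n rest (k (Chunk chunk))
  where (chunk, rest) = splitAt (max 1 n) xs

-- Signal end of input and extract the result, if there is one.
run :: Iteratee a b -> Maybe b
run (Done b) = Just b
run (Cont k) = case k EOF of
  Done b -> Just b
  Cont _ -> Nothing
```

For instance, zipI (foldI (+) 0) (foldI (\n _ -> n + 1) 0) computes a sum and a count in one pass over the input, and neither fold knows the other exists, nor where the data comes from.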

The end result is that a program can have a high-level structure much closer to what a pure functional version would look like, with many of the same benefits to compositionality, while simultaneously having efficiency comparable to the more imperative, strict IO version.

As for being "worth the complexity"? Well, that's the thing--they're really not that complex, just a bit new and unfamiliar. The idea's been floating around for only, what, a couple of years? Give it some time for things to shake out as people use iteratee-based IO in larger projects (e.g., with things like Snap), and for more examples/tutorials to appear. It's likely that, in hindsight, the current implementations will seem very rough around the edges.


Somewhat related: You may want to read this discussion about functional-style IO. Iteratees aren't mentioned all that much, but the central issue is very similar. See in particular this solution, which is both very elegant and goes even further than iteratees in abstracting incremental IO.

"under what circumstances their benefits justify their complexity"

Every language has strict (classical) IO, where all resources are managed by the user. Haskell also provides ubiquitous lazy IO, where all resource management is delegated to the system.

However, that can create problems, as the scope of resources is dependent on runtime demand properties.

Iteratees strike a third way:

  • High level abstractions, like lazy IO.
  • Explicit, lexical scoping of resources, like strict IO.

Their use is justified when you have complex IO processing tasks, but very tight bounds on resource use. An example is a web server.

Indeed, Snap is built around iteratee IO on top of epoll.
