
What's so bad about Lazy I/O?

I've generally heard that production code should avoid using Lazy I/O. My question is, why? Is it ever OK to use Lazy I/O outside of just toying around? And what makes the alternatives (e.g. enumerators) better?

Lazy IO has the problem that releasing whatever resource you have acquired is somewhat unpredictable, as it depends on how your program consumes the data -- its "demand pattern". Once your program drops the last reference to the resource, the GC will eventually run and release that resource.
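As a small illustration of the demand-pattern point (firstLine is a hypothetical helper, nothing from any library): if the program only demands a prefix of a lazily-read file, the underlying handle is not closed here -- it stays open until the contents are fully consumed or the GC eventually finalizes it.

-- A minimal sketch. 'readFile' opens the handle eagerly but reads
-- lazily; since we only force the first line, the handle remains
-- open after this function returns, until the GC gets around to it.
firstLine :: FilePath -> IO String
firstLine path = do
    contents <- readFile path
    return (takeWhile (/= '\n') contents)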

Lazy streams are a very convenient style to program in. This is why shell pipes are so fun and popular.

However, if resources are constrained (as in high-performance scenarios, or production environments that expect to scale to the limits of the machine), relying on the GC to clean up can be an insufficient guarantee.

Sometimes you have to release resources eagerly, in order to improve scalability.

So what are the alternatives to lazy IO that don't mean giving up on incremental processing (which in turn would consume too many resources)? Well, we have foldl-based processing, aka iteratees or enumerators, introduced by Oleg Kiselyov in the late 2000s, and since popularized by a number of networking-based projects.

Instead of processing data as lazy streams, or in one huge batch, we instead abstract over chunk-based strict processing, with guaranteed finalization of the resource once the last chunk is read. That's the essence of iteratee-based programming, and one that offers very nice resource constraints.
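To make that concrete, here is a minimal sketch of the underlying idea -- a strict left fold over chunks, with the handle guaranteed to be closed once the fold finishes. This is not the API of any iteratee library; foldChunks is a hypothetical helper, and the 32768-byte chunk size is arbitrary:

{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import System.IO

-- Strictly fold a step function over a file's contents in fixed-size
-- chunks. 'withFile' guarantees the handle is closed as soon as the
-- fold completes (or an exception is thrown), independent of how the
-- caller later uses the result.
foldChunks :: (a -> B.ByteString -> a) -> a -> FilePath -> IO a
foldChunks step z path = withFile path ReadMode (go z)
  where
    go !acc h = do
        chunk <- B.hGetSome h 32768
        if B.null chunk
            then return acc              -- EOF: withFile closes the handle here
            else go (step acc chunk) h

Each chunk is processed strictly and old chunks can be garbage collected immediately, so nothing about the resource's lifetime depends on the consumer's demand pattern.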

The downside of iteratee-based IO is that it has a somewhat awkward programming model (roughly analogous to event-based programming, versus nice thread-based control). It is definitely an advanced technique, in any programming language. And for the vast majority of programming problems, lazy IO is entirely satisfactory. However, if you will be opening many files, or talking on many sockets, or otherwise using many simultaneous resources, an iteratee (or enumerator) approach might make sense.
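The "many files" case is where lazy I/O bites hardest. A sketch of the failure mode (leakyConcat is a hypothetical example, not taken from the answers above):

-- 'readFile' opens each handle immediately but reads nothing until
-- the contents are demanded, so 'mapM readFile' opens every file
-- before any of them is read and closed. With enough paths this
-- exhausts the process's file descriptors.
leakyConcat :: [FilePath] -> IO String
leakyConcat paths = do
    contents <- mapM readFile paths
    return (concat contents)

A chunk-based strict approach (like the foldChunks sketch above) instead processes one file at a time and closes each handle deterministically.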

Dons has provided a very good answer, but he's left out what is (for me) one of the most compelling features of iteratees: they make it easier to reason about space management because old data must be explicitly retained. Consider:

average :: [Float] -> Float
average xs = sum xs / length xs

This is a well-known space leak, because the entire list xs must be retained in memory to calculate both sum and length. It's possible to make an efficient consumer by creating a fold:

average2 :: [Float] -> Float
average2 xs = uncurry (/) $ foldl (\(sumT, n) x -> (sumT+x, n+1)) (0,0) xs
-- N.B. this will build up thunks as written; use a strict pair and foldl'

But it's somewhat inconvenient to have to do this for every stream processor. There are some generalizations (Conal Elliott - Beautiful Fold Zipping), but they don't seem to have caught on. However, iteratees can get you a similar level of expression.

aveIter = uncurry (/) <$> I.zip I.sum I.length
-- (assuming something like: import qualified Data.Iteratee as I)

This isn't as efficient as a fold because the list is still iterated over multiple times; however, it's collected in chunks, so old data can be efficiently garbage collected. In order to break that property, it's necessary to explicitly retain the entire input, such as with stream2list:

badAveIter = (\xs -> sum xs / length xs) <$> I.stream2list

The state of iteratees as a programming model is a work in progress; however, it's much better than even a year ago. We're learning which combinators are useful (e.g. zip, breakE, enumWith) and which are less so, with the result that built-in iteratees and combinators provide continually more expressivity.

That said, Dons is correct that they're an advanced technique; I certainly wouldn't use them for every I/O problem.

I use lazy I/O in production code all the time. It's only a problem in certain circumstances, like Don mentioned. But for just reading a few files it works fine.

Update: Recently on haskell-cafe, Oleg Kiselyov showed that unsafeInterleaveST (which is used for implementing lazy IO within the ST monad) is very unsafe -- it breaks equational reasoning. He shows that it allows one to construct bad_ctx :: ((Bool,Bool) -> Bool) -> Bool such that

> bad_ctx (\(x,y) -> x == y)
True
> bad_ctx (\(x,y) -> y == x)
False

even though == is commutative.
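A construction in the spirit of Kiselyov's post (a sketch of the idea, not necessarily his exact code) makes both components of the pair deferred effects on a shared STRef, so whichever one the consumer forces first determines what the other observes:

import Control.Monad.ST (runST)
import Control.Monad.ST.Unsafe (unsafeInterleaveST)
import Data.STRef (newSTRef, readSTRef, writeSTRef)

bad_ctx :: ((Bool, Bool) -> Bool) -> Bool
bad_ctx body = body $ runST (do
    r <- newSTRef False
    -- Both actions are suspended: forcing x performs the write,
    -- forcing y performs the read. Demand order decides y's value.
    x <- unsafeInterleaveST (writeSTRef r True >> return True)
    y <- unsafeInterleaveST (readSTRef r)
    return (x, y))

Since (==) on Bool evaluates its left argument first, \(x,y) -> x == y runs the write before the read (True == True), while \(x,y) -> y == x reads before writing (False == True), reproducing the session above.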


Another problem with lazy IO: the actual IO operation can be deferred until it's too late, for example after the file is closed. Quoting from Haskell Wiki - Problems with lazy IO:

For example, a common beginner mistake is to close a file before one has finished reading it:

wrong = do
    fileData <- withFile "test.txt" ReadMode hGetContents
    putStr fileData

The problem is that withFile closes the handle before fileData is forced. The correct way is to pass all the code to withFile:

right = withFile "test.txt" ReadMode $ \handle -> do
    fileData <- hGetContents handle
    putStr fileData

Here, the data is consumed before withFile finishes.

This is often unexpected and an easy-to-make error.
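Another way to sidestep this whole class of bugs is to read strictly, so nothing is left deferred by the time the handle closes. A sketch using the strict ByteString API:

import qualified Data.ByteString as BS

right2 :: IO ()
right2 = do
    fileData <- BS.readFile "test.txt"  -- reads the whole file eagerly; the handle is closed on return
    BS.putStr fileData

This gives up incremental processing, of course, which is perfectly fine for small files.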


See also: Three examples of problems with Lazy I/O.

Another problem with lazy IO that hasn't been mentioned so far is that it has surprising behaviour. In a normal Haskell program, it can sometimes be difficult to predict when each part of your program is evaluated, but fortunately, due to purity, it really doesn't matter unless you have performance problems. When lazy IO is introduced, the evaluation order of your code actually has an effect on its meaning, so changes that you're used to thinking of as harmless can cause you genuine problems.

As an example, here's a question about code that looks reasonable but is made more confusing by deferred IO: withFile vs. openFile

These problems aren't invariably fatal, but they're one more thing to think about, and a severe enough headache that I personally avoid lazy IO unless there's a real problem with doing all the work upfront.

What's so bad about lazy I/O is that you, the programmer, have to micro-manage certain resources instead of the implementation. For example, which of the following is "different"?

  • freeSTRef :: STRef s a -> ST s ()
  • closeIORef :: IORef a -> IO ()
  • endMVar :: MVar a -> IO ()
  • discardTVar :: TVar a -> STM ()
  • hClose :: Handle -> IO ()
  • finalizeForeignPtr :: ForeignPtr a -> IO ()

...out of all these dismissive definitions, only the last two -- hClose and finalizeForeignPtr -- actually exist. As for the rest, whatever service they could provide in the language is much more reliably performed by the implementation!

So if the disposing of resources like file handles and foreign references were also left to the implementation, lazy I/O would probably be no worse than lazy evaluation.
