How do I handle an infinite list of IO objects in Haskell?

Question

I'm writing a program that reads from a list of files. Each file either contains a link to the next file or marks that it's the end of the chain.

Being new to Haskell, it seemed like the idiomatic way to handle this is a lazy list of possible files. To this end, I have

getFirstFile :: String -> DataFile
getNextFile :: Maybe DataFile -> Maybe DataFile

loadFiles :: String -> [Maybe DataFile]
loadFiles = iterate getNextFile . Just . getFirstFile

getFiles :: String -> [DataFile]
getFiles = map fromJust . takeWhile isJust . loadFiles

So far, so good. The only problem is that, since getFirstFile and getNextFile both need to open files, I need their results to be in the IO monad. This gives the modified form of

getFirstFile :: String -> IO DataFile
getNextFile :: Maybe DataFile -> IO (Maybe DataFile)

loadFiles :: String -> [IO (Maybe DataFile)]
loadFiles = iterate (getNextFile =<<) . liftM Just . getFirstFile

getFiles :: String -> IO [DataFile]
getFiles = liftM (map fromJust . takeWhile isJust) . sequence . loadFiles

The problem with this is that, since iterate returns an infinite list, sequence becomes an infinite loop. I'm not sure how to proceed from here. Is there a lazier form of sequence that won't hit all of the list elements? Should I be rejiggering the map and takeWhile to operate inside the IO monad for each list element? Or do I need to drop the whole infinite list process and write a recursive function to terminate the list manually?

Answer 1

A step in the right direction

What puzzles me is getNextFile. Step into a simplified world with me, where we're not dealing with IO yet. The type is Maybe DataFile -> Maybe DataFile. In my opinion, this should simply be DataFile -> Maybe DataFile, and I will operate under the assumption that this adjustment is possible. And that looks like a good candidate for unfoldr. The first thing I am going to do is make my own simplified version of unfoldr, which is less general but simpler to use.

import Data.List

-- unfoldr :: (b -> Maybe (a,b)) -> b -> [a]
myUnfoldr :: (a -> Maybe a) -> a -> [a]
myUnfoldr f v = v : unfoldr (fmap tuplefy . f) v
  where tuplefy x = (x,x)

Now the type f :: a -> Maybe a matches getNextFile :: DataFile -> Maybe DataFile

getFiles :: String -> [DataFile]
getFiles = myUnfoldr getNextFile . getFirstFile

Beautiful, right? unfoldr is a lot like iterate, except once it hits Nothing, it terminates the list.
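To make the comparison concrete, here is a minimal sketch that builds the same list both ways. The step function is made up for illustration (it halves a number until it reaches 1) and simply plays the role of a pure getNextFile; it is not part of the question.

```haskell
import Data.List (unfoldr)
import Data.Maybe (fromJust, isJust)

-- Hypothetical step function standing in for a pure getNextFile.
step :: Int -> Maybe Int
step n
  | n <= 1    = Nothing
  | otherwise = Just (n `div` 2)

-- The question's approach: iterate forever, then cut at the first Nothing.
viaIterate :: Int -> [Int]
viaIterate = map fromJust . takeWhile isJust . iterate (>>= step) . Just

-- The simplified unfold from above: it stops by itself on Nothing.
myUnfoldr :: (a -> Maybe a) -> a -> [a]
myUnfoldr f v = v : unfoldr (fmap tuplefy . f) v
  where tuplefy x = (x, x)

viaUnfoldr :: Int -> [Int]
viaUnfoldr = myUnfoldr step

main :: IO ()
main = do
  print (viaIterate 20)  -- [20,10,5,2,1]
  print (viaUnfoldr 20)  -- [20,10,5,2,1]
```

Both produce the same list; the unfold just doesn't need the takeWhile isJust / fromJust cleanup afterwards.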

Now we have a problem: IO. How can we do the same thing with IO thrown in there? Don't even think about The Function Which Shall Not Be Named. We need a beefed-up unfoldr to handle this. Fortunately, the source for unfoldr is available to us.

unfoldr      :: (b -> Maybe (a, b)) -> b -> [a]
unfoldr f b  =
  case f b of
   Just (a,new_b) -> a : unfoldr f new_b
   Nothing        -> []

Now what do we need? A healthy dose of IO. liftM2 unfoldr almost gets us the right type, but won't quite cut it this time.

An actual solution

unfoldrM :: Monad m => (b -> m (Maybe (a, b))) -> b -> m [a]
unfoldrM f b = do
  res <- f b
  case res of
    Just (a, b') -> do
      bs <- unfoldrM f b'
      return $ a : bs
    Nothing -> return []

It is a rather straightforward transformation; I wonder if there is some combinator that could accomplish the same.

Fun fact: we can now define unfoldr f b = runIdentity $ unfoldrM (return . f) b
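A quick way to convince yourself of this is to compare the Identity-based definition against Data.List.unfoldr on a small input. This is only a sketch: unfoldr' and countdown are names made up for the check.

```haskell
import Data.Functor.Identity (runIdentity)
import qualified Data.List as L

unfoldrM :: Monad m => (b -> m (Maybe (a, b))) -> b -> m [a]
unfoldrM f b = do
  res <- f b
  case res of
    Just (a, b') -> do
      bs <- unfoldrM f b'
      return (a : bs)
    Nothing -> return []

-- unfoldr recovered from unfoldrM by running it in the Identity monad.
unfoldr' :: (b -> Maybe (a, b)) -> b -> [a]
unfoldr' f b = runIdentity $ unfoldrM (return . f) b

-- Hypothetical step function used only for this comparison.
countdown :: Int -> Maybe (Int, Int)
countdown 0 = Nothing
countdown n = Just (n, n - 1)

main :: IO ()
main = do
  print (unfoldr' countdown 5)                          -- [5,4,3,2,1]
  print (unfoldr' countdown 5 == L.unfoldr countdown 5) -- True
```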

Let's again define a simplified myUnfoldrM; we just have to sprinkle a liftM in there:

myUnfoldrM :: Monad m => (a -> m (Maybe a)) -> a -> m [a]
myUnfoldrM f v = (v:) `liftM` unfoldrM (liftM (fmap tuplefy) . f) v
  where tuplefy x = (x,x)

And now we're all set, just like before.

getFirstFile :: String -> IO DataFile
getNextFile :: DataFile -> IO (Maybe DataFile)

getFiles :: String -> IO [DataFile]
getFiles str = do
  firstFile <- getFirstFile str
  myUnfoldrM getNextFile firstFile

-- alternatively, to make it look like before
getFiles' :: String -> IO [DataFile]
getFiles' = myUnfoldrM getNextFile <=< getFirstFile

By the way, I typechecked all of these with data DataFile = NoClueWhatGoesHere, and the type signatures for getFirstFile and getNextFile, with their definitions set to undefined.
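If you want to actually run it rather than just typecheck it, here is a self-contained sketch. Everything about DataFile (the fileName and nextLink fields) and the in-memory disk table is an assumption made up for the demo, standing in for real files on disk; only unfoldrM, myUnfoldrM and getFiles follow the definitions above.

```haskell
import Control.Monad (liftM, (<=<))

-- Hypothetical DataFile: a name plus an optional link to the next file.
data DataFile = DataFile
  { fileName :: String
  , nextLink :: Maybe String
  } deriving (Show, Eq)

-- Hypothetical in-memory "disk" standing in for real files.
disk :: [(String, DataFile)]
disk =
  [ ("a", DataFile "a" (Just "b"))
  , ("b", DataFile "b" (Just "c"))
  , ("c", DataFile "c" Nothing)
  ]

getFirstFile :: String -> IO DataFile
getFirstFile name = do
  putStrLn ("opening " ++ name)
  case lookup name disk of
    Just df -> return df
    Nothing -> ioError (userError ("missing file: " ++ name))

-- Follow the link, if any; Nothing marks the end of the chain.
getNextFile :: DataFile -> IO (Maybe DataFile)
getNextFile = traverse getFirstFile . nextLink

unfoldrM :: Monad m => (b -> m (Maybe (a, b))) -> b -> m [a]
unfoldrM f b = do
  res <- f b
  case res of
    Just (a, b') -> liftM (a :) (unfoldrM f b')
    Nothing      -> return []

myUnfoldrM :: Monad m => (a -> m (Maybe a)) -> a -> m [a]
myUnfoldrM f v = (v :) `liftM` unfoldrM (liftM (fmap tuplefy) . f) v
  where tuplefy x = (x, x)

getFiles :: String -> IO [DataFile]
getFiles = myUnfoldrM getNextFile <=< getFirstFile

main :: IO ()
main = do
  files <- getFiles "a"
  print (map fileName files)  -- ["a","b","c"]
```

Note that the whole chain is read before getFiles returns; the resulting list is an ordinary finite list, not a lazy stream.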

[edit] changed myUnfoldr and myUnfoldrM to behave more like iterate, including the initial value in the list of results.

[edit] Additional insight on unfolds:

If you have a hard time wrapping your head around unfolds, the Collatz sequence is possibly one of the simplest examples.

collatz :: Integral a => a -> Maybe a
collatz 1 = Nothing -- the sequence ends when you hit 1
collatz n | even n    = Just $ n `div` 2
          | otherwise = Just $ 3 * n + 1

collatzSequence :: Integral a => a -> [a]
collatzSequence = myUnfoldr collatz

Remember, myUnfoldr is a simplified unfold for the cases where the "next seed" and the "current output value" are the same, as is the case for collatz. This behavior should be easy to see given myUnfoldr's simple definition in terms of unfoldr and tuplefy x = (x,x).

ghci> collatzSequence 9
[9,28,14,7,22,11,34,17,52,26,13,40,20,10,5,16,8,4,2,1]

More, mostly unrelated thoughts

The rest has absolutely nothing to do with the question, but I just couldn't resist musing. We can define myUnfoldr in terms of myUnfoldrM:

myUnfoldr f v = runIdentity $ myUnfoldrM (return . f) v

Look familiar? We can even abstract this pattern:

sinkM :: ((a -> Identity b) -> a -> Identity c) -> (a -> b) -> a -> c
sinkM hof f = runIdentity . hof (return . f)

unfoldr = sinkM unfoldrM
myUnfoldr = sinkM myUnfoldrM

sinkM should work to "sink" (the opposite of "lift") any function of the form

Monad m => (a -> m b) -> a -> m c

since the Monad m in those functions can be unified with the Identity monad constraint of sinkM. However, I don't see anything that sinkM would actually be useful for.

Answer 2

sequenceWhile :: Monad m => (a -> Bool) -> [m a] -> m [a]
sequenceWhile _ [] = return []
sequenceWhile p (m:ms) = do
  x <- m
  if p x
    then liftM (x:) $ sequenceWhile p ms
    else return []

Yields:

getFiles = liftM (map fromJust) . sequenceWhile isJust . loadFiles
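As a small check that sequenceWhile really stops on an infinite list of actions, here is a sketch using a counter in an IORef. The tick action and the demo function are invented for the illustration; sequenceWhile is copied from above.

```haskell
import Control.Monad (liftM)
import Data.IORef (modifyIORef', newIORef, readIORef)

sequenceWhile :: Monad m => (a -> Bool) -> [m a] -> m [a]
sequenceWhile _ [] = return []
sequenceWhile p (m:ms) = do
  x <- m
  if p x
    then liftM (x:) $ sequenceWhile p ms
    else return []

-- Run an infinite list of "tick" actions until one returns 4 or more.
-- Returns the collected results and how many actions actually ran.
demo :: IO ([Int], Int)
demo = do
  counter <- newIORef 0
  let tick = modifyIORef' counter (+ 1) >> readIORef counter
  xs  <- sequenceWhile (< 4) (repeat tick)
  ran <- readIORef counter
  return (xs, ran)

main :: IO ()
main = demo >>= print  -- ([1,2,3],4)
```

The counter ends at 4 rather than 3: the action whose result fails the predicate still has to run before sequenceWhile can decide to stop. In the file-chain case that is the final getNextFile call that returns Nothing.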

Answer 3

As you have noticed, IO results can't be lazy, so you can't (easily) build an infinite list using IO. There is a way out, however, in unsafeInterleaveIO; with this, you can do something like:

ioList startFile = do
    v <- processFile startFile
    continuation <- unsafeInterleaveIO (nextFile startFile >>= ioList)
    return (v:continuation)

It's important to be careful here, though: you've just deferred the results of ioList to some unpredictable time in the future. It may never be run at all, in fact. So be very careful when you're being Clever™ like this.
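To see the deferral in action, here is a minimal sketch of the same pattern with a counter instead of files. lazyTicks and demo are names made up for this illustration; the structure mirrors ioList, with each recursive call wrapped in unsafeInterleaveIO.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- An "infinite" list built in IO; the tail's effects are deferred
-- until the tail is demanded by pure code.
lazyTicks :: IORef Int -> IO [Int]
lazyTicks ref = do
  modifyIORef' ref (+ 1)
  n    <- readIORef ref
  rest <- unsafeInterleaveIO (lazyTicks ref)
  return (n : rest)

-- Take three elements, force them, then see how many effects ran.
demo :: IO ([Int], Int)
demo = do
  ref <- newIORef 0
  xs  <- lazyTicks ref
  let firstThree = take 3 xs
  _   <- evaluate (length firstThree)  -- forces the first three cells
  ran <- readIORef ref
  return (firstThree, ran)

main :: IO ()
main = demo >>= print  -- ([1,2,3],3)
```

Only three ticks run, because only three list cells were ever demanded. The flip side is exactly the warning above: the effects happen whenever pure code happens to force the list, not at a point you chose.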

Personally, I would just build a manual recursive function.
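A manual recursive version could look roughly like the sketch below. followChain and nextId are hypothetical names: nextId stands in for getNextFile, pretending that file n links to file n + 1 until file 3 ends the chain.

```haskell
-- Follow a chain of links by explicit recursion, collecting each element.
followChain :: Monad m => (a -> m (Maybe a)) -> a -> m [a]
followChain next = go
  where
    go x = do
      mNext <- next x
      rest  <- maybe (return []) go mNext
      return (x : rest)

-- Hypothetical stand-in for getNextFile.
nextId :: Int -> IO (Maybe Int)
nextId n = do
  putStrLn ("reading file " ++ show n)
  return (if n < 3 then Just (n + 1) else Nothing)

main :: IO ()
main = followChain nextId 1 >>= print  -- [1,2,3]
```

Every effect runs, in order, before the list is returned, so there is no laziness to reason about; the cost is that the whole chain is held in memory at once.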

Answer 4

Laziness and I/O are a tricky combination. Using unsafeInterleaveIO is one way to produce lazy lists in the IO monad (and this is the technique used by the standard getContents, readFile and friends). However, as convenient as this is, it exposes pure code to possible I/O errors and makes releasing resources (such as file handles) non-deterministic. This is why most "serious" Haskell applications (especially those concerned with efficiency) nowadays use things called Enumerators and Iteratees for streaming I/O. One library on Hackage that implements this concept is enumerator.

You are probably fine with using lazy I/O in your application, but I thought I'd still give this as an example of another way to approach this kind of problem. You can find more in-depth tutorials about iteratees here and here.

For example, your stream of DataFiles could be implemented as an Enumerator like this:

import Data.Enumerator
import Control.Monad.IO.Class (liftIO)

iterFiles :: String -> Enumerator DataFile IO b
iterFiles s = first where
    first (Continue k) = do
        file <- liftIO $ getFirstFile s
        k (Chunks [file]) >>== next file
    first step = returnI step

    next prev (Continue k) = do
        file <- liftIO $ getNextFile (Just prev)
        case file of
            Nothing -> k EOF
            Just df -> k (Chunks [df]) >>== next df
    next _ step = returnI step

4 answers

Answer 1
13 Accepted 2011-10-12 05:17:08

Answer 2
11 2011-10-11 23:55:45

Answer 3
8 2011-10-11 23:00:15

Answer 4
4 2011-10-12 12:24:41
