How do I handle an infinite list of IO objects in Haskell?
I'm writing a program that reads from a list of files. Each file either contains a link to the next file or marks that it's the end of the chain.
Being new to Haskell, it seemed like the idiomatic way to handle this is a lazy list of possible files. To this end, I have
getFirstFile :: String -> DataFile
getNextFile :: Maybe DataFile -> Maybe DataFile
loadFiles :: String -> [Maybe DataFile]
loadFiles = iterate getNextFile . Just . getFirstFile
getFiles :: String -> [DataFile]
getFiles = map fromJust . takeWhile isJust . loadFiles
So far, so good. The only problem is that, since getFirstFile and getNextFile both need to open files, I need their results to be in the IO monad. This gives the modified form of
getFirstFile :: String -> IO DataFile
getNextFile :: Maybe DataFile -> IO (Maybe DataFile)
loadFiles :: String -> [IO (Maybe DataFile)]
loadFiles = iterate (getNextFile =<<) . liftM Just . getFirstFile
getFiles :: String -> IO [DataFile]
getFiles = liftM (map fromJust . takeWhile isJust) . sequence . loadFiles
The problem with this is that, since iterate returns an infinite list, sequence becomes an infinite loop. I'm not sure how to proceed from here. Is there a lazier form of sequence that won't hit all of the list elements? Should I be rejiggering the map and takeWhile to be operating inside the IO monad for each list element? Or do I need to drop the whole infinite list process and write a recursive function to terminate the list manually?
A step in the right direction
What puzzles me is getNextFile. Step into a simplified world with me, where we're not dealing with IO yet. The type is Maybe DataFile -> Maybe DataFile. In my opinion, this should simply be DataFile -> Maybe DataFile, and I will operate under the assumption that this adjustment is possible. And that looks like a good candidate for unfoldr. The first thing I am going to do is make my own simplified version of unfoldr, which is less general but simpler to use.
import Data.List
-- unfoldr :: (b -> Maybe (a,b)) -> b -> [a]
myUnfoldr :: (a -> Maybe a) -> a -> [a]
myUnfoldr f v = v : unfoldr (fmap tuplefy . f) v
  where tuplefy x = (x,x)
Now the type f :: a -> Maybe a matches getNextFile :: DataFile -> Maybe DataFile
getFiles :: String -> [DataFile]
getFiles = myUnfoldr getNextFile . getFirstFile
Beautiful, right? unfoldr is a lot like iterate, except once it hits Nothing, it terminates the list.
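To make the difference between iterate and unfoldr concrete, here is a minimal sketch. The names countdownIterate and countdownUnfoldr are made up for illustration and are not part of the question's code.

```haskell
import Data.List (unfoldr)

-- With iterate the list is infinite, so the consumer has to cut it off.
countdownIterate :: Int -> [Int]
countdownIterate n = takeWhile (> 0) (iterate (subtract 1) n)

-- With unfoldr the step function itself signals the end by returning Nothing.
countdownUnfoldr :: Int -> [Int]
countdownUnfoldr = unfoldr step
  where
    step k
      | k <= 0    = Nothing          -- stop: no more elements
      | otherwise = Just (k, k - 1)  -- emit k, continue from k - 1
```

Both functions give [5,4,3,2,1] for an input of 5; the unfoldr version just keeps the stopping rule next to the stepping rule.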
Now, we have a problem. IO. How can we do the same thing with IO thrown in there? Don't even think about The Function Which Shall Not Be Named. We need a beefed up unfoldr to handle this. Fortunately, the source for unfoldr is available to us.
unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
unfoldr f b =
  case f b of
    Just (a,new_b) -> a : unfoldr f new_b
    Nothing        -> []
Now what do we need? A healthy dose of IO. liftM2 unfoldr almost gets us the right type, but won't quite cut it this time.
An actual solution
unfoldrM :: Monad m => (b -> m (Maybe (a, b))) -> b -> m [a]
unfoldrM f b = do
  res <- f b
  case res of
    Just (a, b') -> do
      bs <- unfoldrM f b'
      return $ a : bs
    Nothing -> return []
It is a rather straightforward transformation; I wonder if there is some combinator that could accomplish the same.
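As a quick sanity check that a monadic unfold like this runs its effects one seed at a time and stops at the first Nothing, here is a small sketch in IO. stepLogged and demo are hypothetical names, and the IORef log is only there to make the order of effects visible; unfoldrM is repeated so the block stands alone.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')

-- Monadic unfold, repeated here so the example is self-contained.
unfoldrM :: Monad m => (b -> m (Maybe (a, b))) -> b -> m [a]
unfoldrM f b = do
  res <- f b
  case res of
    Just (a, b') -> do
      rest <- unfoldrM f b'
      return (a : rest)
    Nothing -> return []

-- Hypothetical effectful step: records every seed it sees, emits the
-- square of the seed, and stops once the seed reaches zero.
stepLogged :: IORef [Int] -> Int -> IO (Maybe (Int, Int))
stepLogged logRef n = do
  modifyIORef' logRef (n :)
  return (if n <= 0 then Nothing else Just (n * n, n - 1))

-- Returns the unfolded values together with the seeds visited, in order.
demo :: IO ([Int], [Int])
demo = do
  logRef  <- newIORef []
  squares <- unfoldrM (stepLogged logRef) 3
  visited <- readIORef logRef
  return (squares, reverse visited)
```

Running demo should give ([9,4,1],[3,2,1,0]): the step is invoked exactly once per seed, including the final call that returns Nothing.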
Fun fact: we can now define unfoldr f b = runIdentity $ unfoldrM (return . f) b
Let's again define a simplified myUnfoldrM; we just have to sprinkle a liftM in there:
myUnfoldrM :: Monad m => (a -> m (Maybe a)) -> a -> m [a]
myUnfoldrM f v = (v:) `liftM` unfoldrM (liftM (fmap tuplefy) . f) v
  where tuplefy x = (x,x)
And now we're all set, just like before.
getFirstFile :: String -> IO DataFile
getNextFile :: DataFile -> IO (Maybe DataFile)
getFiles :: String -> IO [DataFile]
getFiles str = do
  firstFile <- getFirstFile str
  myUnfoldrM getNextFile firstFile
-- alternatively, to make it look like before
getFiles' :: String -> IO [DataFile]
getFiles' = myUnfoldrM getNextFile <=< getFirstFile
By the way, I typechecked all of these with data DataFile = NoClueWhatGoesHere, and the type signatures for getFirstFile and getNextFile, with their definitions set to undefined.
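For something closer to the real problem than undefined stubs, the following sketch runs the whole pipeline against a fake in-memory chain of files. The DataFile record, fakeDisk and loadFile are assumptions made purely for illustration, and myUnfoldrM is written out directly here (rather than via unfoldrM) so the block stands alone.

```haskell
import Control.Monad ((<=<))

-- Hypothetical stand-in for the real DataFile: a name plus an optional link.
data DataFile = DataFile { fileName :: String, nextLink :: Maybe String }
  deriving (Eq, Show)

-- Hypothetical in-memory "disk"; real code would read the file system here.
fakeDisk :: [(String, Maybe String)]
fakeDisk = [("a", Just "b"), ("b", Just "c"), ("c", Nothing)]

-- Looks a file up on the fake disk, failing in IO if it does not exist.
loadFile :: String -> IO DataFile
loadFile name = case lookup name fakeDisk of
  Just link -> return (DataFile name link)
  Nothing   -> ioError (userError ("no such file: " ++ name))

getFirstFile :: String -> IO DataFile
getFirstFile = loadFile

-- Follows the link if there is one; Nothing marks the end of the chain.
getNextFile :: DataFile -> IO (Maybe DataFile)
getNextFile = traverse loadFile . nextLink

-- iterate-like monadic unfold: keeps the seed, stops at the first Nothing.
myUnfoldrM :: Monad m => (a -> m (Maybe a)) -> a -> m [a]
myUnfoldrM f v = do
  next <- f v
  rest <- maybe (return []) (myUnfoldrM f) next
  return (v : rest)

getFiles :: String -> IO [DataFile]
getFiles = myUnfoldrM getNextFile <=< getFirstFile
```

With this fake disk, getFiles "a" should return the three files named "a", "b" and "c", in that order, and then stop.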
[edit] changed myUnfoldr and myUnfoldrM to behave more like iterate, including the initial value in the list of results.
[edit] Additional insight on unfolds:
If you have a hard time wrapping your head around unfolds, the Collatz sequence is possibly one of the simplest examples.
collatz :: Integral a => a -> Maybe a
collatz 1 = Nothing -- the sequence ends when you hit 1
collatz n | even n = Just $ n `div` 2
          | otherwise = Just $ 3 * n + 1
collatzSequence :: Integral a => a -> [a]
collatzSequence = myUnfoldr collatz
Remember, myUnfoldr is a simplified unfold for the cases where the "next seed" and the "current output value" are the same, as is the case for collatz. This behavior should be easy to see given myUnfoldr's simple definition in terms of unfoldr and tuplefy x = (x,x).
ghci> collatzSequence 9
[9,28,14,7,22,11,34,17,52,26,13,40,20,10,5,16,8,4,2,1]
More, mostly unrelated thoughts
The rest has absolutely nothing to do with the question, but I just couldn't resist musing. We can define myUnfoldr in terms of myUnfoldrM:
myUnfoldr f v = runIdentity $ myUnfoldrM (return . f) v
Look familiar? We can even abstract this pattern:
sinkM :: ((a -> Identity b) -> a -> Identity c) -> (a -> b) -> a -> c
sinkM hof f = runIdentity . hof (return . f)
unfoldr = sinkM unfoldrM
myUnfoldr = sinkM myUnfoldrM
sinkM should work to "sink" (opposite of "lift") any function of the form
Monad m => (a -> m b) -> a -> m c
since the Monad m in those functions can be unified with the Identity monad constraint of sinkM. However, I don't see anything that sinkM would actually be useful for.
sequenceWhile :: Monad m => (a -> Bool) -> [m a] -> m [a]
sequenceWhile _ [] = return []
sequenceWhile p (m:ms) = do
  x <- m
  if p x
    then liftM (x:) $ sequenceWhile p ms
    else return []
Yields:
getFiles = liftM (map fromJust) . sequenceWhile isJust . loadFiles
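To see that sequenceWhile really does stop on an infinite list of actions, here is a small self-contained check. tick and demo are hypothetical helpers; the counter is only there to show how many actions actually ran.

```haskell
import Data.IORef (IORef, newIORef, readIORef, atomicModifyIORef')

-- Same idea as above, repeated so the block compiles on its own.
sequenceWhile :: Monad m => (a -> Bool) -> [m a] -> m [a]
sequenceWhile _ [] = return []
sequenceWhile p (m:ms) = do
  x <- m
  if p x
    then fmap (x :) (sequenceWhile p ms)
    else return []

-- Hypothetical action: bumps a counter and returns its new value.
tick :: IORef Int -> IO Int
tick ref = atomicModifyIORef' ref (\n -> (n + 1, n + 1))

-- Feeds an infinite list of actions to sequenceWhile and reports the
-- collected values plus the number of actions that were executed.
demo :: IO ([Int], Int)
demo = do
  ref <- newIORef 0
  xs  <- sequenceWhile (< 4) (repeat (tick ref))
  ran <- readIORef ref
  return (xs, ran)
```

demo should give ([1,2,3],4). Note that the action whose result fails the predicate has still been executed, which matters if running it has a cost such as opening a file.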
As you have noticed, IO results can't be lazy, so you can't (easily) build an infinite list using IO. There is a way out, however, in unsafeInterleaveIO; with this, you can do something like:
ioList startFile = do
  v <- processFile startFile
  continuation <- unsafeInterleaveIO (nextFile startFile >>= ioList)
  return (v:continuation)
It's important to be careful here, though - you've just deferred the results of ioList to some unpredictable time in the future. It may never be run at all, in fact. So be very careful when you're being Clever™ like this.
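The deferral can be observed directly. The following is only a sketch with made-up names (lazyCount, demo): it builds an infinite list with unsafeInterleaveIO and uses an IORef to count how many of the underlying actions have run at two points in time.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
import System.IO.Unsafe (unsafeInterleaveIO)

-- Lazily counts upward forever. Each cell's effect (bumping the counter)
-- runs only when that cell of the list is demanded.
lazyCount :: IORef Int -> Int -> IO [Int]
lazyCount ran n = do
  modifyIORef' ran (+ 1)
  rest <- unsafeInterleaveIO (lazyCount ran (n + 1))
  return (n : rest)

-- Reports the first three elements, the number of effects that had run
-- right after building the list, and the number after forcing three cells.
demo :: IO ([Int], Int, Int)
demo = do
  ran    <- newIORef 0
  xs     <- lazyCount ran 10
  before <- readIORef ran
  let firstThree = take 3 xs
  _      <- evaluate (length firstThree)  -- forces the first three cells
  after  <- readIORef ran
  return (firstThree, before, after)
```

Under GHC, demo should give ([10,11,12],1,3): only one effect has happened when lazyCount returns, and the others fire whenever pure code happens to walk the list.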
Personally, I would just build a manual recursive function.
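A manual recursive version might look roughly like the sketch below. The DataFile type and readDataFile are hypothetical stand-ins (a real loader would open and parse the file), so treat this as one possible shape rather than the definitive one.

```haskell
-- Hypothetical stand-in for the real type: a name plus an optional link.
data DataFile = DataFile String (Maybe String)
  deriving (Eq, Show)

-- Hypothetical loader backed by a fixed table of links instead of real files.
readDataFile :: String -> IO DataFile
readDataFile name = return (DataFile name (lookup name links))
  where
    links = [("first", "second"), ("second", "third")]

-- Plain recursion: load one file, then follow its link if it has one.
getFiles :: String -> IO [DataFile]
getFiles name = do
  file@(DataFile _ link) <- readDataFile name
  rest <- case link of
    Nothing   -> return []
    Just next -> getFiles next
  return (file : rest)
```

This reads every file in the chain before returning, which is usually what you want when each step has to open a file anyway.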
Laziness and I/O are a tricky combination. Using unsafeInterleaveIO is one way to produce lazy lists in the IO monad (and this is the technique used by the standard getContents, readFile and friends). However, as convenient as this is, it exposes pure code to possible I/O errors and makes releasing resources (such as file handles) non-deterministic. This is why most "serious" Haskell applications (especially those concerned with efficiency) nowadays use things called Enumerators and Iteratees for streaming I/O. One library in Hackage that implements this concept is enumerator.
You are probably fine with using lazy I/O in your application, but I thought I'd still give this as an example of another way to approach these kinds of problems. You can find more in-depth tutorials about iteratees here and here.
For example, your stream of DataFiles could be implemented as an Enumerator like this:
import Data.Enumerator
import Control.Monad.IO.Class (liftIO)
iterFiles :: String -> Enumerator DataFile IO b
iterFiles s = first where
    first (Continue k) = do
        file <- liftIO $ getFirstFile s
        k (Chunks [file]) >>== next file
    first step = returnI step

    next prev (Continue k) = do
        file <- liftIO $ getNextFile (Just prev)
        case file of
            Nothing -> k EOF
            Just df -> k (Chunks [df]) >>== next df
    next _ step = returnI step