
hGetContents being too lazy

I have the following snippet of code, which I pass to withFile:

text <- hGetContents hand 
let code = parseCode text
return code

Here hand is a valid file handle, opened with ReadMode, and parseCode is my own function that reads the input and returns a Maybe. As it is, the function fails and returns Nothing. If I instead write:

text <- hGetContents hand 
putStrLn text
let code = parseCode text
return code

I get a Just, as I should.

If I do openFile and hClose myself, I have the same problem. Why is this happening? How can I cleanly solve it?

Thanks

hGetContents isn't too lazy; it just needs to be composed with other things appropriately to get the desired effect. Maybe the situation would be clearer if it were renamed exposeContentsToEvaluationAsNeededForTheRestOfTheAction or just listen.

withFile opens the file, does something (or nothing, as you please -- exactly what you require of it in any case), and closes the file.
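To make the open/act/close shape concrete, here is a minimal sketch of how a function like withFile can be written with bracket. The names withFileSketch and firstLine are made up for illustration, and the real library definition may differ in details such as error reporting.

```haskell
import Control.Exception (bracket)
import System.IO (Handle, IOMode (ReadMode), hClose, hGetLine, openFile)

-- Hypothetical re-implementation, for illustration only: open the file,
-- run the callback on the handle, and close the handle afterwards, even
-- if the callback throws.
withFileSketch :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
withFileSketch path mode = bracket (openFile path mode) hClose

-- hGetLine is strict, so the line is fully read before the handle closes.
firstLine :: FilePath -> IO String
firstLine path = withFileSketch path ReadMode hGetLine
```

Whatever the callback returns must not still depend on unread input, because the handle is already closed by the time the caller sees the result.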

It will hardly suffice to bring out all the mysteries of 'lazy IO', but consider now this difference in bracketing:

import Control.Monad ((>=>))
import System.IO

good file operation = withFile file ReadMode (hGetContents >=> operation >=> print)
bad file operation = (withFile file ReadMode hGetContents) >>= operation >>= print

-- *Main> good "lazyio.hs" (return . length)
-- 503
-- *Main> bad "lazyio.hs" (return . length)
-- 0

Crudely put, bad opens and closes the file before it does anything; good does everything in between opening and closing the file. Your first action was akin to bad. withFile should govern all of the action you want done that depends on the handle.
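Applied to the code in the question, this means the parse result has to be demanded while the handle is still open. In the original snippet `let code = parseCode text` only builds an unevaluated thunk, so nothing is read before withFile closes the file; depending on the version of base, the later read then sees an empty string (as in the transcript above) or fails with a "delayed read on closed handle" error. The following is a minimal sketch, assuming a stand-in parseCode built on readMaybe (the asker's real parser is not shown) and a made-up name readCodeWithin.

```haskell
import System.IO (IOMode (ReadMode), hGetContents, withFile)
import Text.Read (readMaybe)

-- Hypothetical stand-in for the asker's parser: succeeds only when the
-- whole input, ignoring surrounding whitespace, is an integer.
parseCode :: String -> Maybe Int
parseCode = readMaybe

-- Parse inside the withFile callback and force the result there, so all
-- reading happens before the handle is closed.
readCodeWithin :: FilePath -> IO (Maybe Int)
readCodeWithin path =
  withFile path ReadMode $ \hand -> do
    text <- hGetContents hand
    -- Matching on the result makes the parser consume the input now.
    case parseCode text of
      Nothing -> return Nothing
      Just n  -> n `seq` return (Just n)
```

The case expression runs as part of the callback, so the Just or Nothing is decided while the file is open rather than after withFile has returned.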

You don't need a strictness enforcer if you are working with String, small files, etc., just an idea of how the composition works. Again, in bad all I 'do' before closing the file is exposeContentsToEvaluationAsNeededForTheRestOfTheAction. In good I compose exposeContentsToEvaluationAsNeededForTheRestOfTheAction with the rest of the action I have in mind, then close the file.

The familiar length + seq trick mentioned by Patrick, or length + evaluate, is worth knowing; your second action with putStrLn text was a variant. But reorganization is better, unless lazy IO is wrong for your case.

$ time ./bad
bad: Prelude.last: empty list  
                        -- no, lots of Chars there
real    0m0.087s

$ time ./good
'\n'                -- right
()
real    0m15.977s

$ time ./seqing
Killed               -- hopeless, attempting to represent the file contents
                     -- in memory as a linked list, before finding out the last char
real    1m54.065s

It goes without saying that ByteString and Text are worth knowing about, but reorganization with evaluation in mind is better, since even with them the Lazy variants are often what you need, and they then involve grasping the same distinctions between forms of composition. If you are dealing with one of the (immense) class of cases where this sort of IO is inappropriate, take a look at enumerator, conduit and co., all wonderful.

hGetContents uses lazy IO; it only reads from the file as you force more of the string, and it only closes the file handle when you evaluate the entire string it returns. The problem is that you're enclosing it in withFile; instead, just use openFile and hGetContents directly (or, more simply, readFile). The file will still get closed once you fully evaluate the string. Something like this should do the trick, ensuring that the file is fully read and closed immediately by forcing the entire string beforehand:

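import Control.Exception (evaluate)

readCode :: FilePath -> IO Code
readCode fileName = do
    text <- readFile fileName
    -- length walks the whole string, so the file is read to the end
    -- (and the handle closed) before parsing starts.
    _ <- evaluate (length text)
    return (parseCode text)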

Unintuitive situations like this are one of the reasons people tend to avoid lazy IO these days, but unfortunately you can't change the definition of hGetContents. A strict IO version of hGetContents is available in the strict package, but it's probably not worth depending on the package just for that one function.

If you want to avoid the overhead that comes from traversing the string twice here, then you should probably look into using a more efficient type than String anyway; the Text type has strict IO equivalents for much of the String-based IO functionality, as does ByteString (if you're dealing with binary data, rather than Unicode text).
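As a rough illustration of the strict Text route, the sketch below reads the whole file with Data.Text.IO.readFile, which is strict, and only then parses. It assumes the text package is available (it ships with GHC), and both parseCode and readCodeStrict are hypothetical names standing in for the asker's code.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Text.Read (readMaybe)

-- Hypothetical stand-in for the asker's parser.
parseCode :: String -> Maybe Int
parseCode = readMaybe

-- The strict readFile from Data.Text.IO reads the entire file before
-- returning, so no lazy read is left pending when parsing happens.
readCodeStrict :: FilePath -> IO (Maybe Int)
readCodeStrict path = do
  contents <- TIO.readFile path
  return (parseCode (T.unpack contents))
```

Converting back to String with T.unpack is only there to fit a String-based parser; a parser written against Text directly would avoid that step.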

You can force the contents of text to be evaluated using

length text `seq` return code

as the last line.
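Put together with the snippet from the question, that could look like the sketch below. The parser is a hypothetical stand-in (readMaybe on an Int), since the real parseCode is not shown, and readCodeSeq is a made-up name.

```haskell
import System.IO (IOMode (ReadMode), hGetContents, withFile)
import Text.Read (readMaybe)

-- Hypothetical stand-in for the asker's parser.
parseCode :: String -> Maybe Int
parseCode = readMaybe

readCodeSeq :: FilePath -> IO (Maybe Int)
readCodeSeq path =
  withFile path ReadMode $ \hand -> do
    text <- hGetContents hand
    let code = parseCode text
    -- length walks the whole string, so the file is read to the end
    -- while the handle is still open; code can then be evaluated later.
    length text `seq` return code
```

This keeps the whole file in memory as a String, which is fine for small inputs but is the cost the earlier timing comparison warns about for large ones.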
