
F# seq behavior

I'm a little baffled by the inner workings of sequence expressions in F#.

Normally, if we write a sequential file reader with seq and no intentional caching of data:

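 seq {
     // `file` is assumed to be a reader defined outside the seq block
     let mutable current = file.Read()
     while current <> -1 do
         yield current
         // advance to the next value; without this the loop never ends
         current <- file.Read()
 }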

We will end up with some weird behavior if we try to re-iterate or backtrack. My idea was that, since Read() is a function that depends on mutable state, we can't expect the output to be correct if we re-iterate. But then why does the following behave nicely, even when reading across a buffer boundary?

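let Read path =
    seq {
        use fp = System.IO.File.OpenRead path
        let buf : byte[] = Array.zeroCreate 1024
        let mutable pos = 1      // number of bytes returned by the last fp.Read call
        let mutable current = 0  // index of the next byte to yield from buf
        while pos <> 0 do
            if current = 0 then
                pos <- fp.Read(buf, 0, buf.Length)
            if current < pos then
                yield buf.[current]
                current <- current + 1
            // refill on the next pass once the buffered bytes are used up,
            // including when the last read was shorter than the buffer
            if current >= pos then
                current <- 0
    }

let content = Read "some path"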

We clearly reuse the same buffer to improve performance. But suppose we read the 1025th byte, which triggers a refill of the buffer: if we then read any byte at a position below 1025, we still get the correct output. How can that be, and what is the difference between the two cases?

Your question is a bit unclear, so I'll try to guess.

When you create a seq { }, you're essentially creating a state machine which will run only as far as it needs to. When you request the very first element from it, it'll start at the top and run until your first yield instruction. Then, when you request another value, it'll run from that point until the next yield, and so on.
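To make the "runs only as far as it needs to" point concrete, here is a minimal sketch. The names trace and numbers are made up for illustration; the trace list simply records which parts of the sequence body have executed.

```fsharp
// Records each step the sequence body executes (illustrative only).
let trace = System.Collections.Generic.List<string>()

let numbers =
    seq {
        trace.Add "start"
        yield 1
        trace.Add "after first yield"
        yield 2
        trace.Add "after second yield"
    }

// Defining the sequence runs none of its body.
let stepsBeforeIteration = trace.Count      // 0

// Seq.head requests exactly one element, so the body stops at the first yield.
let first = Seq.head numbers                // 1
let stepsAfterHead = List.ofSeq trace       // ["start"]
```

Neither "after first yield" nor "after second yield" is recorded, because nothing ever asked for a second element.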

Keep in mind that a seq { } produces an IEnumerable<'T>, which is like a "plan of execution". Each time you start to iterate the sequence (for example by calling Seq.head), a call to GetEnumerator is made behind the scenes, which causes a new IEnumerator<'T> to be created. It is the IEnumerator that actually provides the values. You can think of it in more classical terms as having an array over which you can iterate (an iterable or enumerable) and many pointers into that array, each of which is at a different position (many iterators or enumerators).
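The "many pointers over one array" picture can be sketched by asking for two enumerators by hand. The names letters, e1 and e2 are hypothetical; in everyday code Seq functions and for loops make the GetEnumerator call for you.

```fsharp
let letters = seq { yield "a"; yield "b"; yield "c" }

// Two explicit enumerators over the same sequence.
let e1 = letters.GetEnumerator()
let e2 = letters.GetEnumerator()

e1.MoveNext() |> ignore   // e1 is now at "a"
e1.MoveNext() |> ignore   // e1 is now at "b"
e2.MoveNext() |> ignore   // e2 is at "a": advancing e1 did not move e2

let fromE1 = e1.Current   // "b"
let fromE2 = e2.Current   // "a"

e1.Dispose()
e2.Dispose()
```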

In your first snippet, file is most likely defined outside the seq block. This means that the file you are reading from is baked into the plan of execution; no matter how many times you start to iterate the sequence, every enumerator reads from the same file object and therefore shares its read position. A second iteration continues from wherever the previous one stopped instead of starting over, which is what makes the behaviour unpredictable.
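A small sketch of that situation, assuming a StringReader as a stand-in for the external file (the question does not say what file actually is, so this is only an illustration):

```fsharp
open System.IO

// Hypothetical stand-in for `file`: one reader shared by every enumeration.
let file = new StringReader("abc")

let chars =
    seq {
        let mutable current = file.Read()
        while current <> -1 do
            yield char current
            current <- file.Read()
    }

// Each call starts a new enumerator, but all of them read from the same reader.
let firstHead = Seq.head chars    // 'a'
let secondHead = Seq.head chars   // 'b', not 'a': the reader has already moved on
let rest = List.ofSeq chars       // ['c']
```

The sequence itself is restarted each time; it is the shared reader that remembers how far it has been consumed.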

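However, in your second snippet, the file is opened as part of the seq block's definition. This means that you'll get a new file handle each time you iterate the sequence or, essentially, a new file handle per enumerator. The same goes for buf, pos and current: they are created afresh for each enumerator. The reason this code works is that you can't reverse an enumerator or iterate over it multiple times, not with a single thread at least. "Reading an earlier byte again" really means starting a new enumeration from the beginning of the file, not looking back into a buffer that has since been overwritten.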
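As a rough illustration of the per-enumerator behaviour, the sketch below creates the reader inside the seq block. readChars is a made-up helper, and a StringReader is assumed in place of a real file so the example is self-contained.

```fsharp
open System.IO

let readChars (text: string) =
    seq {
        // Created when an enumeration starts, disposed when that enumeration ends.
        use reader = new StringReader(text)
        let mutable current = reader.Read()
        while current <> -1 do
            yield char current
            current <- reader.Read()
    }

let content = readChars "abc"

// Every iteration gets its own reader, so each one starts from the beginning.
let firstHead = Seq.head content    // 'a'
let secondHead = Seq.head content   // 'a' again
let all = List.ofSeq content        // ['a'; 'b'; 'c']
```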

(Now, if you were to manually get an enumerator and advance it over multiple threads, you'd probably run into problems very quickly. But that is a different topic.)
