
How to speed up creation of a FileStream

My application needs to open a lot of small files — say 1440 files, each containing one minute of data — to read all the data for a given day. Each file is only a couple of kB in size. This is a GUI application, so I want the user (== me!) not to have to wait too long.

It turns out that opening the files is rather slow. After profiling, I found that most of the time is spent creating a FileStream (OpenStream = new FileStream) for each file. Example code:

// create the stream and reader
FileStream OpenStream;
BinaryReader bReader;

foreach (string file in files)
{
    // does the file exist? then read it and store the data
    if (System.IO.File.Exists(file))
    {
        long Start = sw.ElapsedMilliseconds;

        // open the file read-only, otherwise the application can crash
        OpenStream = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);

        Tijden.Add(sw.ElapsedMilliseconds - Start);

        bReader = new BinaryReader(OpenStream);

        // read everything in one go; this works well and fast
        // - track whether appending is still possible; stop appending if necessary
        blAppend &= Bestanden.Add(file, bReader.ReadBytes((int)OpenStream.Length), blAppend);

        // close the file
        bReader.Close();
    }
}

Using a stopwatch timer, I see that most (> 80%) of the time is spent creating the FileStream for each file. Creating the BinaryReader and actually reading the file (Bestanden.Add) take almost no time.

I'm baffled by this and cannot find a way to speed it up. What can I do to speed up the creation of the FileStream?

Update to the question:

  • this happens on both Windows 7 and Windows 10
  • the files are local (on an SSD)
  • there are only the 1440 files in the directory
  • strangely, when reading the (same) files again later, creating the FileStreams suddenly costs almost no time at all. Somewhere the OS is remembering the file handles.
  • even if I close the application and restart it, opening the files "again" also costs almost no time. This makes it pretty hard to pin down the performance issue; I had to make a lot of copies of the directory to reproduce the problem over and over.
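For completeness, the timing harness I use boils down to something like the sketch below (trimmed: the Bestanden bookkeeping is left out, and OpenTimer is just an illustrative name). Running it once on a freshly copied directory (cold cache) and once more (warm cache) shows the difference:

```csharp
using System;
using System.Diagnostics;
using System.IO;

class OpenTimer
{
    static void Main(string[] args)
    {
        string dir = args[0];
        var sw = Stopwatch.StartNew();
        long openMs = 0, readMs = 0;

        foreach (string file in Directory.GetFiles(dir))
        {
            long start = sw.ElapsedMilliseconds;
            using (var fs = new FileStream(file, FileMode.Open,
                FileAccess.Read, FileShare.ReadWrite))
            {
                // time spent only on opening the stream
                openMs += sw.ElapsedMilliseconds - start;

                start = sw.ElapsedMilliseconds;
                var buffer = new byte[fs.Length];
                fs.Read(buffer, 0, buffer.Length);
                // time spent on actually reading the data
                readMs += sw.ElapsedMilliseconds - start;
            }
        }
        Console.WriteLine($"open: {openMs} ms, read: {readMs} ms");
    }
}
```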

As you have mentioned in the comments on the question, FileStream reads the first 4 KB into its internal buffer when the object is created. You can change the size of this buffer to better match your data (decrease it if your files are smaller than the default buffer, for example). If you read a file sequentially, you can give the OS a hint about this through FileOptions. In addition, you can avoid BinaryReader, because you read the files in their entirety anyway.

    // create the stream
    FileStream OpenStream;

    foreach (string file in files)
    {
        // does the file exist? then read it and store the data
        if (System.IO.File.Exists(file))
        {
            long Start = sw.ElapsedMilliseconds;

            // open the file read-only, otherwise the application can crash
            OpenStream = new FileStream(
                file,
                FileMode.Open,
                FileAccess.Read,
                FileShare.ReadWrite,
                bufferSize: 2048, // 2 KB, for example
                options: FileOptions.SequentialScan);

            Tijden.Add(sw.ElapsedMilliseconds - Start);

            var bufferLength = (int)OpenStream.Length;
            var buffer = new byte[bufferLength];
            OpenStream.Read(buffer, 0, bufferLength);

            // read everything in one go; this works well and fast
            // - track whether appending is still possible; stop appending if necessary
            blAppend &= Bestanden.Add(file, buffer, blAppend);

            // close the file
            OpenStream.Dispose();
        }
    }
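Since each file is read entirely in one call anyway, the stream handling can also be collapsed into File.ReadAllBytes, which opens, reads, and closes the file for you. A sketch, reusing the names from the question:

```csharp
foreach (string file in files)
{
    if (System.IO.File.Exists(file))
    {
        long Start = sw.ElapsedMilliseconds;

        // opens the file, reads it fully, and closes it again in one call
        byte[] buffer = System.IO.File.ReadAllBytes(file);

        Tijden.Add(sw.ElapsedMilliseconds - Start);

        blAppend &= Bestanden.Add(file, buffer, blAppend);
    }
}
```

Note one caveat: File.ReadAllBytes opens the file with FileShare.Read rather than FileShare.ReadWrite, so it is only an option if no other process writes to the file at the same time.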

I do not know the type of the Bestanden object, but if it has methods for reading from an array, you can also reuse the buffer across files.

    // the buffer should be bigger than the biggest file to read
    var bufferLength = 8192;
    var buffer = new byte[bufferLength];

    foreach (string file in files)
    {
        // skip: open OpenStream as above
        ...
        var fileLength = (int)OpenStream.Length;
        OpenStream.Read(buffer, 0, fileLength);

        blAppend &= Bestanden.Add(file, /* read bytes from buffer */, blAppend);
    }

I hope this helps.

Disclaimer: this answer is just (founded) speculation that this is a Windows bug rather than something you can fix with different code.

So this behaviour might be related to the Windows bug described in "24-core CPU and I can't move my mouse", which found that:

These processes were all releasing the lock from within NtGdiCloseProcess.

So if FileStream uses and holds a similar critical lock in the OS, it would wait a few microseconds for every file, which would add up over thousands of files. It may be a different lock, but the bug mentioned above at least suggests the possibility of a similar problem.

To prove or disprove this hypothesis some deep knowledge about the inner workings of the kernel would be necessary.
