How to speed up creation of a FileStream

My application needs to open a lot of small files: 1440 files, each containing one minute of data, to read all the data for a given day. Each file is only a couple of kB big. This is for a GUI application, so I want the user (== me!) not to have to wait too long.

It turns out that opening the files is rather slow. After profiling, most of the time is spent creating a FileStream (OpenStream = new FileStream) for each file. Example code:

// create the stream and reader
FileStream OpenStream;
BinaryReader bReader;

foreach (string file in files)
{
    // does the file exist? then read and store it
    if (System.IO.File.Exists(file))
    {
        long Start = sw.ElapsedMilliseconds;

        // open the file read-only, otherwise the application can crash
        OpenStream = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);

        Tijden.Add(sw.ElapsedMilliseconds - Start);

        bReader = new BinaryReader(OpenStream);

        // read everything in one go, works well and fast
        // - track whether appending is still possible; stop appending if necessary
        blAppend &= Bestanden.Add(file, bReader.ReadBytes((int)OpenStream.Length), blAppend);

        // close the file
        bReader.Close();
    }
}

Using the Stopwatch timer, I see that most (> 80%) of the time is spent creating the FileStream for each file. Creating the BinaryReader and actually reading the file (Bestanden.Add) take almost no time.

I'm baffled by this and cannot find a way to speed it up. What can I do to speed up the creation of the FileStream?

Update to the question:

  • this happens both on Windows 7 and Windows 10
  • the files are local (on an SSD)
  • the directory contains only the 1440 files
  • strangely, when reading the (same) files again later, creating the FileStreams suddenly costs almost no time at all. Somewhere the OS is remembering the file handles.
  • even if I close the application and restart it, opening the files "again" also costs almost no time. This makes it pretty hard to track down the performance issue. I had to make a lot of copies of the directory to reproduce the problem over and over.

As you have mentioned in the comments on the question, FileStream reads the first 4 kB into its buffer when the object is created. You can change the size of this buffer to better match your data. (Decrease it if your files are smaller than the buffer, for example.) If you read the file sequentially, you can give the OS a hint about this through FileOptions. In addition, you can avoid BinaryReader, because you read the files entirely.

    foreach (string file in files)
    {
        // does the file exist? then read and store it
        if (System.IO.File.Exists(file))
        {
            long Start = sw.ElapsedMilliseconds;

            // open the file read-only, otherwise the application can crash
            using (var OpenStream = new FileStream(
                file,
                FileMode.Open,
                FileAccess.Read,
                FileShare.ReadWrite,
                bufferSize: 2048, // 2 kB, for example
                options: FileOptions.SequentialScan))
            {
                Tijden.Add(sw.ElapsedMilliseconds - Start);

                var bufferLength = (int)OpenStream.Length;
                var buffer = new byte[bufferLength];

                // Read may return fewer bytes than requested, so loop until done
                var offset = 0;
                while (offset < bufferLength)
                    offset += OpenStream.Read(buffer, offset, bufferLength - offset);

                // read everything in one go, works well and fast
                // - track whether appending is still possible; stop appending if necessary
                blAppend &= Bestanden.Add(file, buffer, blAppend);
            }
        }
    }
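Since the files are read in their entirety anyway, another option is to collapse the whole open/read/close sequence into a single File.ReadAllBytes call. The sketch below is self-contained (it creates its own sample files in a temp directory; names like fs-demo are made up for the example), and whether it beats the tuned FileStream above would need measuring on the same machine:

```csharp
using System;
using System.Diagnostics;
using System.IO;

class ReadAllBytesSketch
{
    static void Main()
    {
        // Create a few small sample files (stand-ins for the 1440 one-minute files).
        string dir = Path.Combine(Path.GetTempPath(), "fs-demo");
        Directory.CreateDirectory(dir);
        for (int i = 0; i < 10; i++)
            File.WriteAllBytes(Path.Combine(dir, $"minute-{i:D4}.bin"), new byte[2048]);

        var sw = Stopwatch.StartNew();
        long totalBytes = 0;
        foreach (string file in Directory.GetFiles(dir, "*.bin"))
        {
            // File.ReadAllBytes opens, reads and closes the file in one call,
            // replacing the FileStream + BinaryReader pair.
            byte[] data = File.ReadAllBytes(file);
            totalBytes += data.Length;
        }
        sw.Stop();
        Console.WriteLine($"{totalBytes} bytes in {sw.ElapsedMilliseconds} ms");
    }
}
```

Note that File.ReadAllBytes still creates a FileStream internally, so if the cost really is in the open itself, this mainly simplifies the code rather than removing the bottleneck.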

I do not know the type of the Bestanden object, but if it has methods for reading from an array, you can also reuse one buffer across files.

    // the buffer should be bigger than the biggest file to read
    var bufferLength = 8192;
    var buffer = new byte[bufferLength];

    foreach (string file in files)
    {
        // skip the existence check and stream creation shown above
        ...
        var fileLength = (int)OpenStream.Length;
        OpenStream.Read(buffer, 0, fileLength);

        blAppend &= Bestanden.Add(file, /* read bytes from buffer */, blAppend);
    }
I hope it helps.

Disclaimer: this answer is just (founded) speculation that this is a Windows bug rather than something you can fix with different code.

So this behaviour might be related to the Windows bug described here: "24-core CPU and I can't move my mouse".

These processes were all releasing the lock from within NtGdiCloseProcess.

So if FileStream takes and holds such a critical lock in the OS, it would wait a few microseconds for every file, which adds up over thousands of files. It may be a different lock, but the bug mentioned above at least suggests the possibility of a similar problem.

To prove or disprove this hypothesis, some deep knowledge of the inner workings of the kernel would be necessary.
