Fastest way to create files in C#

I'm running a program to benchmark how quickly I can find and iterate over all the files in a folder that contains a large number of files. The slowest part of the process is creating the 1 million plus files. I'm using a pretty naive method to create the files at the moment:

Console.Write("Creating {0:N0} file(s) of size {1:N0} bytes... ", 
    options.FileCount, options.FileSize);
var createTimer = Stopwatch.StartNew();
var fileNames = new List<string>();
for (long i = 0; i < options.FileCount; i++)
{
    var filename = Path.Combine(options.Directory.FullName, 
                        CreateFilename(i, options.FileCount));
    using (var file = new FileStream(filename, FileMode.CreateNew, 
                        FileAccess.Write, FileShare.None, 4096, 
                        FileOptions.WriteThrough))
    {
        // I have an option to write some data to files, but it's not being used. 
        // That's why there's a using here.
    }
    fileNames.Add(filename);
}
createTimer.Stop();
Console.WriteLine("Done.");

// Other code appears here.....

Console.WriteLine("Time to  CreateFiles: {0:N3}sec ({1:N2} files/sec, 1 in {2:N4}ms)"
       , createTimer.Elapsed.TotalSeconds
       , (double)total / createTimer.Elapsed.TotalSeconds
       , createTimer.Elapsed.TotalMilliseconds / (double)options.FileCount);

Output:

Creating 1,000,000 file(s) of size 0 bytes... Done.
Time to  CreateFiles: 9,182.283sec (1,089.05 files/sec, 1 in 9.1823ms)
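
As a sanity check on the figures above, here is a minimal sketch that recomputes both rates from the elapsed time and the file count. The class and method names are made up for illustration. With 1,000,000 files in 9,182.283 seconds it gives roughly 109 files/sec and about 9.18 ms per file, so the per-file time matches the output while the printed files/sec figure is ten times larger. That figure is computed from `total`, which is not defined in the code shown, so it presumably holds a different value from `options.FileCount`.

```csharp
using System;

public static class ThroughputCheck
{
    // Files created per second for a run of `fileCount` files.
    public static double FilesPerSecond(long fileCount, double elapsedSeconds)
    {
        return fileCount / elapsedSeconds;
    }

    // Average wall-clock milliseconds spent per file.
    public static double MillisecondsPerFile(long fileCount, double elapsedSeconds)
    {
        return elapsedSeconds * 1000.0 / fileCount;
    }

    public static void Main()
    {
        // Figures taken from the output above.
        long fileCount = 1000000;
        double elapsedSeconds = 9182.283;

        Console.WriteLine("{0:N2} files/sec, 1 in {1:N4}ms",
            FilesPerSecond(fileCount, elapsedSeconds),
            MillisecondsPerFile(fileCount, elapsedSeconds));
    }
}
```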

Is there anything obviously better than this? I'm looking to test several orders of magnitude more than 1 million files, and it takes a day to create the files!

I haven't tried any sort of parallelism, optimising any file system options, or changing the order of file creation.

For completeness, here's the content of CreateFilename():

public static string CreateFilename(long i, long totalFiles)
{
    if (totalFiles < 0)
        throw new ArgumentOutOfRangeException("totalFiles", 
            totalFiles, "totalFiles must be positive");

    // This tries to keep filenames to the 8.3 format as much as possible.
    if (totalFiles < 100000000)
        // No extension.
        return String.Format("{0:00000000}", i);
    else if (totalFiles >= 100000000 && totalFiles < 9999999999)
    {
        // Extend numbers into extension.
        long rem = 0;
        long div = Math.DivRem(i, 1000, out rem);
        return String.Format("{0:00000000}", div) + "." + 
            String.Format("{0:000}", rem);
    }
    else
        // Doesn't fit in 8.3, so just tostring the long.
        return i.ToString();
}
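
To make the naming scheme concrete, here is a small sketch that applies the same two format strings in isolation. The helper names PadToEight and SplitIntoExtension are hypothetical and exist only for this illustration; they mirror the first two branches of CreateFilename() and are not a replacement for it.

```csharp
using System;

public static class FilenameFormatDemo
{
    // First branch: zero-pad the index to eight digits, no extension.
    public static string PadToEight(long i)
    {
        return String.Format("{0:00000000}", i);
    }

    // Second branch: the last three digits of the index become the extension.
    public static string SplitIntoExtension(long i)
    {
        long rem;
        long div = Math.DivRem(i, 1000, out rem);
        return String.Format("{0:00000000}", div) + "." + String.Format("{0:000}", rem);
    }

    public static void Main()
    {
        Console.WriteLine(PadToEight(42));               // prints 00000042
        Console.WriteLine(SplitIntoExtension(1234567));  // prints 00001234.567
    }
}
```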

UPDATE

Tried to parallelise as per StriplingWarrior's suggestion using Parallel.For(). Results: about 30 threads thrashing my disk and a net slowdown!

        var fileNames = new ConcurrentBag<string>();
        var opts = new ParallelOptions();
        opts.MaxDegreeOfParallelism = 1;       // 1 thread turns out to be fastest.
        Parallel.For(0L, options.FileCount, opts,
            () => new { Files = new List<string>() },   
            (i, parState, state) =>
            {
                var filename = Path.Combine(options.Directory.FullName, 
                                   CreateFilename(i, options.FileCount));
                using (var file = new FileStream(filename, FileMode.CreateNew
                                  , FileAccess.Write, FileShare.None
                                  , 4096, FileOptions.WriteThrough))
                {
                }
                fileNames.Add(filename);
                return state;
            },
            state => 
            {
                foreach (var f in state.Files)
                {
                    fileNames.Add(f);
                }
            });
        createTimer.Stop();
        Console.WriteLine("Done.");

Found that changing the FileOptions passed to the FileStream from WriteThrough to None improved perf by ~50%. It seems WriteThrough was bypassing the write cache.

new FileStream(filename, FileMode.CreateNew, 
                 FileAccess.Write, FileShare.None, 
                 4096, FileOptions.None)

Results:

Creating 10,000 file(s) of size 0 bytes... Done.
Time to  CreateFiles: 12.390sec (8,071.05 files/sec, 1 in 1.2390ms)
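
For anyone who wants to compare the two settings side by side, here is a rough sketch, assuming a throwaway directory under the system temp path and a small count of 1,000 files per run. The class name, method name, and directory layout are illustrative only, and the timings will depend heavily on the disk and file system.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class WriteThroughComparison
{
    // Creates `count` empty files in `directory` with the given FileOptions
    // and returns how long the loop took.
    public static TimeSpan TimeCreate(string directory, int count, FileOptions fileOptions)
    {
        Directory.CreateDirectory(directory);
        var timer = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            string filename = Path.Combine(directory, i.ToString("00000000"));
            using (new FileStream(filename, FileMode.CreateNew, FileAccess.Write,
                                  FileShare.None, 4096, fileOptions))
            {
                // Empty body: only file creation is being measured.
            }
        }
        timer.Stop();
        return timer.Elapsed;
    }

    public static void Main()
    {
        // Assumption: a unique scratch directory that is safe to delete afterwards.
        string root = Path.Combine(Path.GetTempPath(), "create-bench-" + Guid.NewGuid().ToString("N"));
        try
        {
            foreach (var fileOptions in new[] { FileOptions.WriteThrough, FileOptions.None })
            {
                TimeSpan elapsed = TimeCreate(Path.Combine(root, fileOptions.ToString()), 1000, fileOptions);
                Console.WriteLine("{0,-12} {1:N3}sec", fileOptions, elapsed.TotalSeconds);
            }
        }
        finally
        {
            if (Directory.Exists(root)) Directory.Delete(root, true);
        }
    }
}
```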

Other ideas still welcome.

Your biggest bottleneck here is undoubtedly your hard drive. In some quick testing, I was able to see some significant performance improvements (but not orders of magnitude) by taking advantage of parallelism:

// Dispose the FileStream that File.Create() returns so handles are not leaked.
Parallel.For(1, 10000,
    i => File.Create(Path.Combine(path, i.ToString())).Dispose());

Interestingly enough, on my machine at least, an SSD does not seem to make a big difference for this operation.

  • On my HDD, the above code creates 100,000 files in roughly 31 seconds.
  • On my SSD, the above code creates 100,000 files in roughly 33 seconds.
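
Since the effect of parallelism seems to vary by machine, here is a minimal sketch for timing a sequential loop against Parallel.For() on your own hardware. It assumes a scratch directory under the system temp path, a small run of 1,000 files, and a cap of 4 threads; all of those values, and the class and method names, are placeholders rather than recommendations.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

public static class ParallelCreateComparison
{
    // Creates `count` empty files one after another.
    public static TimeSpan CreateSequential(string directory, int count)
    {
        Directory.CreateDirectory(directory);
        var timer = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            File.Create(Path.Combine(directory, i.ToString())).Dispose();
        }
        timer.Stop();
        return timer.Elapsed;
    }

    // Creates `count` empty files using at most `maxThreads` concurrent workers.
    public static TimeSpan CreateParallel(string directory, int count, int maxThreads)
    {
        Directory.CreateDirectory(directory);
        var opts = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };
        var timer = Stopwatch.StartNew();
        Parallel.For(0, count, opts,
            i => File.Create(Path.Combine(directory, i.ToString())).Dispose());
        timer.Stop();
        return timer.Elapsed;
    }

    public static void Main()
    {
        // Assumption: a unique scratch directory that is safe to delete afterwards.
        string root = Path.Combine(Path.GetTempPath(), "parallel-bench-" + Guid.NewGuid().ToString("N"));
        try
        {
            TimeSpan sequential = CreateSequential(Path.Combine(root, "seq"), 1000);
            TimeSpan parallel = CreateParallel(Path.Combine(root, "par"), 1000, 4);
            Console.WriteLine("Sequential: {0:N3}sec", sequential.TotalSeconds);
            Console.WriteLine("Parallel:   {0:N3}sec", parallel.TotalSeconds);
        }
        finally
        {
            if (Directory.Exists(root)) Directory.Delete(root, true);
        }
    }
}
```

Each run uses its own subdirectory so that the second measurement is not affected by name collisions with the first.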

The fastest way I found was a simple loop around File.Create():

IEnumerable<string> filenames = GetFilenames();
foreach (var filename in filenames)
{
    // File.Create() returns an open FileStream; dispose it to release the handle.
    File.Create(filename).Dispose();
}

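This is roughly equivalent to the following, which is what I'm actually using in code (File.Create() opens with FileMode.Create, so unlike CreateNew it overwrites an existing file instead of throwing):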

IEnumerable<string> filenames = GetFilenames();
foreach (var filename in filenames)
{
    // Dispose immediately so the file handle is released.
    new FileStream(filename, FileMode.CreateNew, 
             FileAccess.Write, FileShare.None, 
             4096, FileOptions.None).Dispose();
}

And if you actually want to write something to the file:

IEnumerable<string> filenames = GetFilenames();
foreach (var filename in filenames)
{
    using (var fs = new FileStream(filename, FileMode.CreateNew, 
             FileAccess.Write, FileShare.None, 
             4096, FileOptions.None))
    {
        // Write something to your file.
    }
}
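
As one possible way to fill in the body above, here is a short sketch that writes a fixed-size payload into each new file. The CreateFile helper, the zero-filled 1 KiB buffer, and the temp directory are assumptions made for the example; the FileStream arguments are the same ones used above.

```csharp
using System;
using System.IO;

public static class CreateWithPayload
{
    // Creates a new file and writes `payload` to it; throws if the file already exists.
    public static void CreateFile(string filename, byte[] payload)
    {
        using (var fs = new FileStream(filename, FileMode.CreateNew,
                                       FileAccess.Write, FileShare.None,
                                       4096, FileOptions.None))
        {
            fs.Write(payload, 0, payload.Length);
        }
    }

    public static void Main()
    {
        // Assumption: a unique scratch directory that is safe to delete afterwards.
        string dir = Path.Combine(Path.GetTempPath(), "payload-demo-" + Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(dir);
        try
        {
            var payload = new byte[1024];  // zero-filled 1 KiB buffer, reused for every file
            string filename = Path.Combine(dir, "00000000");
            CreateFile(filename, payload);
            Console.WriteLine(new FileInfo(filename).Length);  // prints 1024
        }
        finally
        {
            Directory.Delete(dir, true);
        }
    }
}
```

Allocating the buffer once outside the loop keeps the per-file cost down to the create and write calls themselves.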

Things that don't seem to help:

  • Parallelism in the form of Parallel.ForEach() or Parallel.For(). This produces a net slowdown which gets worse as the number of threads increases.
  • According to StriplingWarrior, an SSD. I haven't tested this myself (yet), but I speculate this may be because there are so many small writes.
