
Why is this C++ program slower on Windows than Linux?

Consider the following program:

#define _FILE_OFFSET_BITS 64   // Allow large files.
#define REVISION "POSIX Revision #9"

#include <iostream>
#include <cstdio>
#include <ctime>

const int block_size = 1024 * 1024;
const char block[block_size] = {};

int main()
{
    std::cout << REVISION << std::endl;  

    std::time_t t0 = time(NULL);

    std::cout << "Open: 'BigFile.bin'" << std::endl;
    FILE * file;
    file = fopen("BigFile.bin", "wb");
    if (file != NULL)
    {
        std::cout << "Opened. Writing..." << std::endl;
        for (int n=0; n<4096; n++)
        {
            size_t written = fwrite(block, 1, block_size, file);
            if (written != block_size)
            {
                std::cout << "Write error." << std::endl;
                return 1;
            }
        }
        fclose(file);
        std::cout << "Success." << std::endl;

        time_t t1 = time(NULL);
        if (t0 == ((time_t)-1) || t1 == ((time_t)-1))
        {
            std::cout << "Clock error." << std::endl;
            return 2;
        }

        double ticks = (double)(t1 - t0);
        std::cout << "Seconds: " << ticks << std::endl;

        file = fopen("BigFile.log", "w");
        fprintf(file, REVISION);
        fprintf(file, "   Seconds: %f\n", ticks);
        fclose(file);

        return 0;
    }

    std::cout << "Something went wrong." << std::endl;
    return 1;
}

It simply writes 4GB of zeros to a file on disk and times how long it took.

Under Linux, this takes 148 seconds on average. Under Windows, on the same PC, it takes 247 seconds on average.

What the hell am I doing wrong?!

The code is compiled under GCC for Linux, and Visual Studio for Windows, but I cannot imagine a universe in which the compiler used should make any measurable difference to a pure I/O benchmark. The filesystem used in all cases is NTFS.

I just don't understand why such a vast performance difference exists. I don't know why Windows is running so slow. How do I force Windows to run at the full speed that the disk is clearly capable of?

(The numbers above are for OpenSUSE 13.1 32-bit and Windows XP 32-bit on an old Dell laptop. But I've observed similar speed differences on several PCs around the office, running various versions of Windows.)

Edit: The executable and the file it writes both reside on an external USB hard disk which is formatted as NTFS and is nearly completely empty. Fragmentation is almost certainly not a problem. It could be some kind of driver issue, but I've seen the same performance difference on several other systems running different versions of Windows. There is no antivirus installed.

Just for giggles, I tried changing it to use the Win32 API directly. (Obviously this only works for Windows.) The timing becomes a little more erratic, but still stays within a few percent of what it was before, unless I specify FILE_FLAG_WRITE_THROUGH; then it goes significantly slower. A few other flags make it slower too, but I can't find the one that makes it go faster...
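For reference, a sketch of that Win32 variant. This is a reconstruction under assumptions (the file name and loop structure are carried over from the stdio version above), not the exact code from the test:

#include <windows.h>

// Open with CreateFileA; uncommenting FILE_FLAG_WRITE_THROUGH in the
// flags argument forces every write through to the disk, which is what
// made the test dramatically slower.
HANDLE h = CreateFileA("BigFile.bin", GENERIC_WRITE, 0, NULL,
                       CREATE_ALWAYS,
                       FILE_ATTRIBUTE_NORMAL /* | FILE_FLAG_WRITE_THROUGH */,
                       NULL);
for (int n = 0; n < 4096; n++)
{
    DWORD written = 0;
    if (!WriteFile(h, block, block_size, &written, NULL) || written != block_size)
        break;  // write error
}
CloseHandle(h);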

You need to sync file contents to disk, otherwise you are just measuring the level of caching being performed by the operating system.

Call fsync before you close the file.

If you don't do this, the majority of the execution time is most likely spent waiting for the cache to be flushed so that new data can be stored in it, but a portion of the data you write will certainly not have been written out to disk by the time you close the file. The difference in execution times, then, is probably due to Linux caching more of the writes before it runs out of available cache space. By contrast, if you call fsync before closing the file, all the written data should be flushed to disk before your time measurement takes place.

I suspect that if you add an fsync call, the execution time on the two systems won't differ by so much.
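A minimal sketch of what that looks like, assuming the MSVC CRT's _commit as the Windows counterpart of the POSIX-only fsync; the helper name sync_to_disk is made up for illustration:

#include <cstdio>
#ifdef _WIN32
#include <io.h>       // _commit, _fileno (MSVC CRT)
#else
#include <unistd.h>   // fsync
#endif

// Flush the stdio buffer, then ask the OS to push its cached data to disk.
static void sync_to_disk(FILE * file)
{
    std::fflush(file);            // stdio buffer -> OS page cache
#ifdef _WIN32
    _commit(_fileno(file));       // OS page cache -> disk
#else
    fsync(fileno(file));          // OS page cache -> disk
#endif
}

Calling sync_to_disk(file) just before fclose(file), and taking t1 only after it returns, makes the benchmark measure disk throughput rather than cache behaviour.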

Your test is not a very good way to measure performance, as there are places where different optimizations in different OSes and libraries can make a huge difference (the compiler itself doesn't have to make a big difference).

First, consider that fwrite (or anything that operates on a FILE*) is a library layer above the OS layer. There can be different buffering strategies that make a difference. For example, one smart way of implementing fwrite would be to flush the buffers and then send the data block straight to the OS instead of going through the buffer layer. This can result in a huge advantage at the next step.
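One way to probe whether the stdio buffering layer is what differs is to take it out of the picture with setvbuf, which is standard C and available on both platforms. A diagnostic sketch, not a fix:

// Right after fopen, before any I/O: disable stdio buffering so every
// fwrite goes straight to the OS, bypassing the library's own buffer.
setvbuf(file, NULL, _IONBF, 0);

If the Windows/Linux gap shrinks, the difference was in the C runtime's buffering; if it persists, it lies in the OS or driver layer below.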

Second, we have the OS/kernel, which can handle the write differently. One smart optimization would be to copy pages just by aliasing them, and then use copy-on-write if one of the aliases is changed. Linux already does (almost) this when allocating memory to the process (including the BSS section where the array is): it just marks the pages as being zero, keeps a single shared zero page for all of them, and creates a new page whenever somebody writes to a zero page. Doing this trick again would mean the kernel could simply alias such a page in the disk buffer. This means that the kernel would not run low on disk cache when writing such blocks of zeroes, since they would only take up 4KiB of actual memory (except for page tables). This strategy is also possible if there's actual data in the data block.

This means that the writes could complete very quickly without any data actually needing to be transferred to the disk (before fwrite completes), and without the data even having to be copied from one place to another in memory.

So you are using different libraries on different OSes, and it's not surprising that they end up doing different work and taking different amounts of time.

There are special optimizations for pages which are all zeros. You should fill the page with random data before writing it out.
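In the benchmark above, that means making the block non-const and filling it once before the timing starts. A minimal sketch:

#include <cstdlib>

char block[block_size];  // file scope, but no longer const / all-zero

// At the top of main, before t0 is taken: fill the block once, so
// zero-page tricks can't apply and the cost of generating the data
// isn't part of the measurement.
for (int i = 0; i < block_size; i++)
    block[i] = (char)(std::rand() & 0xFF);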
