Performance of mmap() vs. Java MappedByteBuffer?

Question

I have been developing a C++ project based on existing Java code. The following C++ and Java code read from the same test file, which consists of millions of integers.

C++:

    int * arr = new int[len]; //len is larger than the largest int from the data
    fill_n(arr, len, -1);  //fill with -1
    long loadFromIndex = 0;
    struct stat sizeResults;
    long size;
    if (stat(fileSrc, &sizeResults) == 0) {
        size = sizeResults.st_size; //here size would be ~551950000 for 552M test file
    }
    mmapFile = (char *)mmap(NULL, size, PROT_READ, MAP_SHARED, fd, pageNum*pageSize);
    long offset = loadFromIndex % pageSize;
    while (offset < size) {
        int i = htonl(*((int *)(mmapFile + offset)));
        offset += sizeof(int);
        int j = htonl(*((int *)(mmapFile + offset)));
        offset += sizeof(int);
        swapElem(i, j, arr);
    }
    return arr;

Java:

    IntBuffer bb = srcFile.getChannel()
                    .map(MapMode.READ_ONLY, loadFromIndex, size)
                    .asIntBuffer().asReadOnlyBuffer();
    while (bb.hasRemaining()) {
        int i = bb.get();
        int j = bb.get();
        swapElem(i, j, arr); //arr is an int[] of the same size as the arr in C++ version, filled with -1
    }
    return arr;

void swapElem(arr) in C++ and Java are identical. It compares and modifies values in the array, but the original code is too long to post here. For testing purposes, I replaced it with the following function so the loop won't be dead code:

    void swapElem(int i, int j, int * arr){   // int[] in Java
        arr[i] = j;
    }

I assumed the C++ version would outperform the Java version, but the test gives the opposite result: the Java code is almost twice as fast as the C++ code. Is there any way to improve the C++ code?

I suspect that mmapFile+offset in C++ is evaluated too many times, costing O(n) additions for that plus O(n) additions for offset+=sizeof(int), where n is the number of integers to read. Java's IntBuffer.get() reads directly from a buffer index, so no addition is needed other than the O(n) increments of the buffer index by 1. Counting the index increments, C++ therefore performs about 2n additions while Java performs about n. With millions of values, this might cause a significant performance difference.

Following this idea, I modified the C++ code as follows:

    mmapBin = (char *)mmap(NULL, size, PROT_READ, MAP_SHARED, fd, pageNum*pageSize);
    int len = size - loadFromIndex % pageSize;
    char * offset = loadFromIndex % pageSize + mmapBin;
    int index = 0;
    while (index < len) {
        int i = htonl(*((int *)(offset)));
        offset += sizeof(int);
        int j = htonl(*((int *)(offset)));
        offset += sizeof(int);
        index+=2*sizeof(int);
    }

I expected a slight performance gain, but there was none.

Can anyone explain why the C++ code runs slower than the Java code? Thanks.

Update:

I have to apologize: when I said -O2 does not work, the problem was on my end. I had messed up the Makefile, so the C++ code was not recompiled with -O2. I've updated the benchmark, and the C++ version built with -O2 now outperforms the Java version. This could settle the question, but if anyone would like to share how to improve the C++ code, I will follow along. Generally I would expect it to be 2 times faster than the Java code, but currently it is not. Thank you all for your input.

Compiler: g++

Flags: -Wall -c -O2

Java Version: 1.8.0_05

Size of File: 552MB, all 4-byte integers

Processor: 2.53 GHz Intel Core 2 Duo

Memory: 4GB 1067 MHz DDR3

Updated Benchmark:

    Version                        Time (ms)
    C++                            ~1100
    Java                           ~1400
    C++ (without the while loop)   ~35
    Java (without the while loop)  ~40

Some code that runs before these snippets accounts for the ~35 ms (mostly filling the array with -1), but that is not important here.

Answer 1

I have some doubts about whether the benchmark method is correct. Both loops are "dead" code. You don't actually use i and j anywhere, so the gcc compiler or the Java JIT might decide to remove the loop entirely, seeing that it has no effect on the rest of the program.
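One common way to keep such a loop from being optimized away is to fold every value read into a result that the caller actually uses. The following is only a minimal sketch of that idea, not the original benchmark: the names sumPairs, TimedResult, and timeSumPairs are hypothetical, and an in-memory byte buffer stands in for the mapped file so the example stays self-contained.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: walks a byte buffer as (i, j) pairs of 32-bit ints and
// folds every value into a checksum, so the loop has an observable result.
std::uint64_t sumPairs(const unsigned char* data, std::size_t size) {
    const std::size_t pairBytes = 2 * sizeof(std::int32_t);
    assert(size % pairBytes == 0);  // the buffer must hold whole pairs
    std::uint64_t checksum = 0;
    for (std::size_t off = 0; off < size; off += pairBytes) {
        std::int32_t i, j;
        std::memcpy(&i, data + off, sizeof i);             // first int of the pair
        std::memcpy(&j, data + off + sizeof i, sizeof j);  // second int of the pair
        checksum += static_cast<std::uint32_t>(i);
        checksum += static_cast<std::uint32_t>(j);
    }
    return checksum;
}

// Hypothetical result type: the checksum travels with the timing so the caller
// can print or compare it, which keeps the measured work "live".
struct TimedResult {
    std::uint64_t checksum;
    double millis;
};

TimedResult timeSumPairs(const std::vector<unsigned char>& buf) {
    auto start = std::chrono::steady_clock::now();
    std::uint64_t checksum = sumPairs(buf.data(), buf.size());
    auto stop = std::chrono::steady_clock::now();
    double millis = std::chrono::duration<double, std::milli>(stop - start).count();
    return {checksum, millis};
}
```

Printing or otherwise using the returned checksum in both the C++ and the Java version also gives a quick sanity check that the two loops read the same data.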

Anyway, I would change the C++ code to:

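    mmapFile = (char *)mmap(NULL, size, PROT_READ, MAP_SHARED, fd, pageNum*pageSize);
    long offset = loadFromIndex % pageSize;
    int i, j;
    const long szInc = 2 * sizeof(int);
    // Stop once fewer than two ints remain, so the loop never reads past the mapping.
    while (offset + szInc <= size) {
        // The file is binary, so copy the ints straight out of the mapping
        // (memcpy needs <cstring>); no htonl conversion is applied here.
        memcpy(&i, mmapFile + offset, sizeof(int));
        memcpy(&j, mmapFile + offset + sizeof(int), sizeof(int));
        offset += szInc; // offset += 8;
    }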

This follows the structure of the Java loop. In addition, I would keep using -O2 as a compilation flag. Keep in mind that htonl is an extra conversion that the Java code does not perform explicitly. Note, though, that a Java ByteBuffer is big-endian by default, so on a little-endian machine the C++ code still needs the byte swap if i and j must match the values Java reads.
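To make the byte-order point concrete, here is a small sketch of decoding a big-endian 32-bit integer from raw bytes in C++, which is what an IntBuffer view with Java's default byte order yields. It assumes a POSIX system (for ntohl from <arpa/inet.h>); readBigEndianInt is a hypothetical helper name, not part of the original code.

```cpp
#include <arpa/inet.h>  // ntohl (POSIX)
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: decodes the big-endian 32-bit integer stored in
// data[offset .. offset + 3], independent of the host's byte order.
std::int32_t readBigEndianInt(const unsigned char* data, std::size_t size,
                              std::size_t offset) {
    assert(offset + sizeof(std::uint32_t) <= size);  // stay inside the buffer
    std::uint32_t raw;
    std::memcpy(&raw, data + offset, sizeof raw);    // safe for unaligned offsets
    return static_cast<std::int32_t>(ntohl(raw));    // network order -> host order
}
```

On a big-endian host ntohl is a no-op, while on x86 it typically compiles down to a single byte-swap instruction, so the conversion by itself is unlikely to account for a large timing difference.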

Solution 1
0 2014-10-28 13:26:14
