
Java ByteBuffer performance issue

While processing multi-gigabyte files I noticed something odd: it seems that reading from a file through a FileChannel into a re-used ByteBuffer allocated with allocateDirect is much slower than reading from a MappedByteBuffer; in fact, it is even slower than reading into byte arrays using regular read calls!

I was expecting it to be (almost) as fast as reading from MappedByteBuffers, since my ByteBuffer is allocated with allocateDirect: the read should end up directly in my ByteBuffer without any intermediate copies.

My question now is: what am I doing wrong? Or is ByteBuffer + FileChannel really slower than regular I/O or mmap?

In the example code below I also added some code that converts what is read into long values, as that is what my real code constantly does. I would expect the ByteBuffer getLong() method to be much faster than my own byte shuffler.

Test results: mmap: 3.828 bytebuffer: 55.097 regular i/o: 38.175

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.MappedByteBuffer;

class testbb {
    static final int size = 536870904, n = size / 24;

    static public long byteArrayToLong(byte [] in, int offset) {
        return ((((((((long)(in[offset + 0] & 0xff) << 8) | (long)(in[offset + 1] & 0xff)) << 8 | (long)(in[offset + 2] & 0xff)) << 8 | (long)(in[offset + 3] & 0xff)) << 8 | (long)(in[offset + 4] & 0xff)) << 8 | (long)(in[offset + 5] & 0xff)) << 8 | (long)(in[offset + 6] & 0xff)) << 8 | (long)(in[offset + 7] & 0xff);
    }

    public static void main(String [] args) throws IOException {
        long start;
        RandomAccessFile fileHandle;
        FileChannel fileChannel;

        // create file
        fileHandle = new RandomAccessFile("file.dat", "rw");
        byte [] buffer = new byte[24];
        for(int index=0; index<n; index++)
            fileHandle.write(buffer);
        fileChannel = fileHandle.getChannel();

        // mmap()
        MappedByteBuffer mbb = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        byte [] buffer1 = new byte[24];
        start = System.currentTimeMillis();
        for(int index=0; index<n; index++) {
            mbb.position(index * 24);
            mbb.get(buffer1, 0, 24);
            long dummy1 = byteArrayToLong(buffer1, 0);
            long dummy2 = byteArrayToLong(buffer1, 8);
            long dummy3 = byteArrayToLong(buffer1, 16);
        }
        System.out.println("mmap: " + (System.currentTimeMillis() - start) / 1000.0);

        // bytebuffer
        ByteBuffer buffer2 = ByteBuffer.allocateDirect(24);
        start = System.currentTimeMillis();
        for(int index=0; index<n; index++) {
            buffer2.rewind();
            fileChannel.read(buffer2, index * 24);
            buffer2.rewind();   // need to rewind it to be able to use it
            long dummy1 = buffer2.getLong();
            long dummy2 = buffer2.getLong();
            long dummy3 = buffer2.getLong();
        }
        System.out.println("bytebuffer: " + (System.currentTimeMillis() - start) / 1000.0);

        // regular i/o
        byte [] buffer3 = new byte[24];
        start = System.currentTimeMillis();
        for(int index=0; index<n; index++) {
            fileHandle.seek(index * 24);
            fileHandle.read(buffer3);
            long dummy1 = byteArrayToLong(buffer3, 0);
            long dummy2 = byteArrayToLong(buffer3, 8);
            long dummy3 = byteArrayToLong(buffer3, 16);
        }
        System.out.println("regular i/o: " + (System.currentTimeMillis() - start) / 1000.0);
    }
}

As loading large sections and then processing them is not an option (I'll be reading data all over the place), I think I should stick to a MappedByteBuffer. Thank you all for your suggestions.

I believe you are just doing micro-optimization, which might just not matter (www.codinghorror.com).

Below is a version with a larger buffer and the redundant seek / setPosition calls removed.

  • When I enable "native byte ordering" (which is actually unsafe if the machine uses a different endian convention):
 mmap: 1.358 bytebuffer: 0.922 regular i/o: 1.387 
  • When I comment out the order statement and use the default big-endian ordering:
 mmap: 1.336 bytebuffer: 1.62 regular i/o: 1.467 
  • Your original code:
 mmap: 3.262 bytebuffer: 106.676 regular i/o: 90.903 

Here's the code:

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.MappedByteBuffer;

class Testbb2 {
    /** Buffer a whole lot of long values at the same time. */
    static final int BUFFSIZE = 0x800 * 8; // 16384 bytes (2048 longs)
    static final int DATASIZE = 0x8000 * BUFFSIZE;

    static public long byteArrayToLong(byte [] in, int offset) {
        return ((((((((long)(in[offset + 0] & 0xff) << 8) | (long)(in[offset + 1] & 0xff)) << 8 | (long)(in[offset + 2] & 0xff)) << 8 | (long)(in[offset + 3] & 0xff)) << 8 | (long)(in[offset + 4] & 0xff)) << 8 | (long)(in[offset + 5] & 0xff)) << 8 | (long)(in[offset + 6] & 0xff)) << 8 | (long)(in[offset + 7] & 0xff);
    }

    public static void main(String [] args) throws IOException {
        long start;
        RandomAccessFile fileHandle;
        FileChannel fileChannel;

        // Sanity check - this way the convert-to-long loops don't need extra bookkeeping like BUFFSIZE / 8.
        if ((DATASIZE % BUFFSIZE) > 0 || (DATASIZE % 8) > 0) {
            throw new IllegalStateException("DATASIZE should be a multiple of 8 and BUFFSIZE!");
        }

        int pos;
        int nDone;

        // create file - open the handle only after the exists/size check, so a
        // stale, too-small file can be deleted before it is re-created
        File testFile = new File("file.dat");

        if (testFile.exists() && testFile.length() >= DATASIZE) {
            System.out.println("File exists");
            fileHandle = new RandomAccessFile(testFile, "rw");
        } else {
            testFile.delete();
            fileHandle = new RandomAccessFile(testFile, "rw");
            System.out.println("Preparing file");
            byte [] buffer = new byte[BUFFSIZE];
            pos = 0;
            while (pos < DATASIZE) {
                fileHandle.write(buffer);
                pos += buffer.length;
            }

            System.out.println("File prepared");
        }
        fileChannel = fileHandle.getChannel();

        // mmap()
        MappedByteBuffer mbb = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, DATASIZE);
        byte [] buffer1 = new byte[BUFFSIZE];
        mbb.position(0);
        start = System.currentTimeMillis();
        pos = 0;
        while (pos < DATASIZE) {
            mbb.get(buffer1, 0, BUFFSIZE);
            // This assumes BUFFSIZE is a multiple of 8.
            for (int i = 0; i < BUFFSIZE; i += 8) {
                long dummy = byteArrayToLong(buffer1, i);
            }
            pos += BUFFSIZE;
        }
        System.out.println("mmap: " + (System.currentTimeMillis() - start) / 1000.0);

        // bytebuffer
        ByteBuffer buffer2 = ByteBuffer.allocateDirect(BUFFSIZE);
//        buffer2.order(ByteOrder.nativeOrder()); // uncomment for the "native byte ordering" run
        fileChannel.position(0);
        start = System.currentTimeMillis();
        pos = 0;
        nDone = 0;
        while (pos < DATASIZE) {
            buffer2.rewind();
            fileChannel.read(buffer2);
            buffer2.rewind();   // need to rewind it to be able to use it
            // This assumes BUFFSIZE is a multiple of 8.
            for (int i = 0; i < BUFFSIZE; i += 8) {
                long dummy = buffer2.getLong();
            }
            pos += BUFFSIZE;
        }
        System.out.println("bytebuffer: " + (System.currentTimeMillis() - start) / 1000.0);

        // regular i/o
        fileHandle.seek(0);
        byte [] buffer3 = new byte[BUFFSIZE];
        start = System.currentTimeMillis();
        pos = 0;
        while (pos < DATASIZE) {
            // read() may return fewer bytes than requested, so accumulate until the buffer is full
            nDone = 0;
            while (nDone < BUFFSIZE) {
                int nRead = fileHandle.read(buffer3, nDone, BUFFSIZE - nDone);
                if (nRead == -1) {
                    throw new IOException("unexpected EOF at " + (pos + nDone));
                }
                nDone += nRead;
            }
            // This assumes BUFFSIZE is a multiple of 8.
            for (int i = 0; i < BUFFSIZE; i += 8) {
                long dummy = byteArrayToLong(buffer3, i);
            }
            pos += nDone;
        }
        System.out.println("regular i/o: " + (System.currentTimeMillis() - start) / 1000.0);
    }
}

Reading into the direct byte buffer is faster, but getting the data out of it into the JVM is slower. A direct byte buffer is intended for cases where you're just copying the data without actually looking at it in the Java code. Then it doesn't have to cross the native->JVM boundary at all, so it's quicker than using e.g. a byte[] array or a normal ByteBuffer, where the data would have to cross that boundary twice in the copy process.
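To illustrate, here is a minimal sketch of such a pure copy, using FileChannel.transferTo so the bytes can stay on the native side (the file names are placeholders):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;

class ChannelCopy {
    public static void main(String[] args) throws IOException {
        try (FileChannel in  = new RandomAccessFile("file.dat", "r").getChannel();
             FileChannel out = new RandomAccessFile("copy.dat", "rw").getChannel()) {
            long pos = 0, size = in.size();
            // transferTo can move the bytes kernel-side; no byte[] is ever
            // filled, so nothing crosses the native->JVM boundary.
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
        }
    }
}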

When you have a loop which iterates more than 10,000 times, it can trigger the whole method to be compiled to native code. However, your later loops have not been run by then and cannot be optimised to the same degree. To avoid this issue, place each loop in a different method and run again.
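A minimal sketch of that restructuring (a hypothetical harness, not the benchmark above): each timed loop lives in its own method, and a warm-up pass runs first so the measured pass uses JIT-compiled code.

import java.nio.MappedByteBuffer;

class LoopPerMethod {
    // Each timed loop in its own method, so the JIT compiles it independently.
    static double benchMapped(MappedByteBuffer mbb) {
        mbb.rewind();
        long start = System.currentTimeMillis();
        long sum = 0;
        while (mbb.remaining() >= 8) {
            sum += mbb.getLong();
        }
        if (sum == 42) System.out.println(sum); // consume the result so the loop isn't dead code
        return (System.currentTimeMillis() - start) / 1000.0;
    }

    static void run(MappedByteBuffer mbb) {
        benchMapped(mbb);                                // warm-up pass
        System.out.println("mmap: " + benchMapped(mbb)); // measured pass, now compiled
    }
}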

Additionally, you may want to set the byte order of the ByteBuffer to order(ByteOrder.nativeOrder()) to avoid all the byte swapping when you do a getLong(), and to read more than 24 bytes at a time, as reading very small portions generates many more system calls. Try reading 32*1024 bytes at a time.
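A minimal sketch combining both suggestions, assuming fileChannel is an open FileChannel positioned at the start of the file:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;

class NativeOrderRead {
    static long sumLongs(FileChannel fileChannel) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(32 * 1024);
        buf.order(ByteOrder.nativeOrder()); // matches the CPU, so getLong() skips the byte swap
        long sum = 0;
        while (fileChannel.read(buf) != -1) {
            buf.flip();                     // switch from filling to draining
            while (buf.remaining() >= 8) {
                sum += buf.getLong();
            }
            buf.compact();                  // keep any trailing partial long for the next read
        }
        return sum;
    }
}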

I would also try getLong() on the MappedByteBuffer with native byte order. This is likely to be the fastest.
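A sketch of that variant (a read-only mapping; native order is safe here because the file is written and read on the same machine):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class MappedNativeOrder {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = new RandomAccessFile("file.dat", "r").getChannel()) {
            MappedByteBuffer mbb = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            mbb.order(ByteOrder.nativeOrder()); // getLong() then reads straight from the mapping
            long sum = 0;
            while (mbb.remaining() >= 8) {
                sum += mbb.getLong();
            }
            System.out.println(sum);
        }
    }
}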

A MappedByteBuffer will always be the fastest, because the operating system associates the OS-level disk buffer with your process memory space. Reading into an allocated direct buffer, by comparison, first loads the block into the OS buffer, then copies the contents of the OS buffer into the allocated in-process buffer.

Your test code also does lots of very small (24-byte) reads. If your actual application does the same, then you'll get an even bigger performance boost from mapping the file, because each of those reads is a separate kernel call. You should see several times the performance just from mapping.

As for the direct buffer being slower than the java.io reads: you don't give any numbers, but I'd expect a slight degradation, because the getLong() calls need to cross the JNI boundary.
