Fastest way to read a huge number of ints from a binary file

Question

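I am using Java 1.5 on an embedded Linux device and want to read a binary file containing 2 MB of int values. (Currently 4-byte big-endian, but I am free to choose the format.)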

Using a DataInputStream on top of a BufferedInputStream and calling dis.readInt(), these 500,000 calls take 17 seconds, whereas reading the file into one big byte buffer takes 5 seconds.
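For reference, the two variants being compared might look roughly like the sketch below. The class name `ReadIntBaseline` and both method names are made up for illustration, and the timings quoted above are not reproduced here; the code only shows the shape of each approach.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

class ReadIntBaseline {

    // One readInt() call per value: the per-call variant described above.
    static int[] readWithDataInputStream(String path, int numInts) throws IOException {
        DataInputStream dis = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path)));
        try {
            int[] result = new int[numInts];
            for (int i = 0; i < numInts; i++) {
                result[i] = dis.readInt(); // decodes 4 bytes, big-endian
            }
            return result;
        } finally {
            dis.close();
        }
    }

    // Reads the raw bytes only, without decoding them into ints.
    static byte[] readRawBytes(String path, int numBytes) throws IOException {
        DataInputStream dis = new DataInputStream(new FileInputStream(path));
        try {
            byte[] raw = new byte[numBytes];
            dis.readFully(raw); // throws EOFException if the file is too short
            return raw;
        } finally {
            dis.close();
        }
    }
}
```

The second method still leaves the bytes undecoded, so the gap between the two timings is roughly what the per-int decoding calls cost.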

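How can I read this file into one huge int[] faster?

The reading process should not use more than 512 KB of additional memory.

The NIO code below is no faster than the readInt() approach from java.io.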

    // assume I already know that there are 500 000 ints to read:
    int numInts = 500000;
    // the array I want the result in
    int[] result = new int[numInts];
    int cnt = 0;

    RandomAccessFile aFile = new RandomAccessFile("filename", "r");
    FileChannel inChannel = aFile.getChannel();

    ByteBuffer buf = ByteBuffer.allocate(512 * 1024);

    int bytesRead = inChannel.read(buf); //read into buffer.

    while (bytesRead != -1) {

      buf.flip();  //make buffer ready for get()

      while(buf.hasRemaining() && cnt < numInts){
       // probably slow here since called 500 000 times
          result[cnt] = buf.getInt();
          cnt++;
      }

      buf.clear(); //make buffer ready for writing
      bytesRead = inChannel.read(buf);
    }


    aFile.close();
    inChannel.close();

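Update: evaluation of the answers:

On a PC, memory-mapping the file and using an IntBuffer was the fastest approach in my setup.
On the embedded device, which has no JIT, java.io's DataInputStream.readInt() was slightly faster (17 s, versus 20 s for the memory map with IntBuffer).

Final conclusion: a significant speedup is easier to achieve through an algorithmic change (a smaller initialization file).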

Answer 1

I don't know whether this will be any faster than what Alexander provided, but you could try mapping the file.

    try (FileInputStream stream = new FileInputStream(filename)) {
        FileChannel inChannel = stream.getChannel();

        ByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
        int[] result = new int[500000];

        buffer.order( ByteOrder.BIG_ENDIAN );
        IntBuffer intBuffer = buffer.asIntBuffer( );
        intBuffer.get(result);
    }
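The snippet above uses try-with-resources, which needs Java 7 or later. Since the question mentions Java 1.5, a try/finally variant of the same idea might look like the sketch below. `MappedIntReader` and `readInts` are hypothetical names, and the code assumes the file holds at least `numInts` big-endian 4-byte values.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class MappedIntReader {

    static int[] readInts(String filename, int numInts) throws IOException {
        FileInputStream stream = new FileInputStream(filename);
        try {
            FileChannel channel = stream.getChannel();
            // Map the whole file read-only; the mapping lives outside the Java heap.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            buffer.order(ByteOrder.BIG_ENDIAN);
            IntBuffer ints = buffer.asIntBuffer();
            int[] result = new int[numInts];
            // Bulk copy; throws BufferUnderflowException if the file has fewer ints.
            ints.get(result);
            return result;
        } finally {
            stream.close(); // also closes the channel
        }
    }
}
```

Whether the mapped region counts against a memory budget like the 512 KB mentioned in the question depends on the device and JVM, so that part would need to be measured.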

Answer 2

You can use IntBuffer from the nio package: http://docs.oracle.com/javase/6/docs/api/java/nio/IntBuffer.html

    int[] intArray = new int[ 500000 ];

    IntBuffer intBuffer = IntBuffer.wrap( intArray );

    ...

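Fill the buffer by calling inChannel.read(intBuffer).

Once the buffer is full, intArray will contain the 500000 integers.

EDIT

After realizing that channels only support ByteBuffer, here is a revised version: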

    // assume I already know that there are 500 000 ints to read:
    int numInts = 500000;
    // the array I want the result in
    int[] result = new int[numInts];

    // 4 bytes per int, direct buffer
    ByteBuffer buf = ByteBuffer.allocateDirect( numInts * 4 );

    // BIG_ENDIAN byte order
    buf.order( ByteOrder.BIG_ENDIAN );

    // Fill in the buffer
    while ( buf.hasRemaining( ) )
    {
       // Per EJP's suggestion check EOF condition
       if( inChannel.read( buf ) == -1 )
       {
           // Hit EOF
           throw new EOFException( );
       }
    }

    buf.flip( );

    // Create IntBuffer view
    IntBuffer intBuffer = buf.asIntBuffer( );

    // result will now contain all ints read from file
    intBuffer.get( result );
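The code above allocates one buffer for the whole file (about 2 MB for 500000 ints), which is more than the 512 KB the question allows. A chunked variant that reuses a smaller buffer and still does bulk `IntBuffer.get` copies could be sketched as follows. `ChunkedIntReader`, `readInts`, and the `chunkBytes` parameter are illustrative names, not part of the original answer.

```java
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;

class ChunkedIntReader {

    // chunkBytes must be a positive multiple of 4, e.g. 512 * 1024.
    static int[] readInts(String filename, int numInts, int chunkBytes) throws IOException {
        if (chunkBytes <= 0 || chunkBytes % 4 != 0) {
            throw new IllegalArgumentException("chunkBytes must be a positive multiple of 4");
        }
        int[] result = new int[numInts];
        FileInputStream stream = new FileInputStream(filename);
        try {
            FileChannel channel = stream.getChannel();
            ByteBuffer buf = ByteBuffer.allocate(chunkBytes);
            buf.order(ByteOrder.BIG_ENDIAN);
            int offset = 0; // number of ints copied so far
            while (offset < numInts) {
                // Never ask for more bytes than the remaining ints need.
                int wantBytes = (int) Math.min((long) chunkBytes, (long) (numInts - offset) * 4L);
                buf.clear();
                buf.limit(wantBytes);
                while (buf.hasRemaining()) {
                    if (channel.read(buf) == -1) {
                        throw new EOFException();
                    }
                }
                buf.flip();
                int n = buf.remaining() / 4;
                // One bulk copy per chunk instead of one getInt() per value.
                buf.asIntBuffer().get(result, offset, n);
                offset += n;
            }
            return result;
        } finally {
            stream.close(); // also closes the channel
        }
    }
}
```

Called as `ChunkedIntReader.readInts("filename", 500000, 512 * 1024)`, this keeps the temporary buffer at 512 KB. Whether it beats `readInt()` on a JVM without a JIT would have to be measured on the device.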

Answer 3

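I ran a fairly careful experiment comparing serialization/deserialization via DataInputStream and ObjectInputStream, both on top of a ByteArrayInputStream to avoid I/O effects. For one million ints, readObject took about 20 ms and readInt about 116 ms. The serialization overhead for a million-int array was 27 bytes. This was on a 2013 MacBook Pro.

Having said that, object serialization is somewhat evil, and the data must have been written out by a Java program.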
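The readObject approach from this answer can be sketched as a simple round trip. `IntArraySerialization`, `serialize`, and `deserialize` are hypothetical names, and an in-memory byte array stands in for the file, as in the experiment described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class IntArraySerialization {

    // Writes the whole array as a single serialized object.
    static byte[] serialize(int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(values);
        out.close();
        return bytes.toByteArray();
    }

    // Reads the array back with one readObject() call.
    static int[] deserialize(byte[] data) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
        try {
            return (int[]) in.readObject();
        } finally {
            in.close();
        }
    }
}
```

For a real file, the byte array streams would be replaced by FileOutputStream and FileInputStream. The result is in Java's serialization format rather than a plain sequence of 4-byte values.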

Answer 1: 4, accepted, 2013-04-15 20:58:53

Answer 2: 3, 2013-04-15 18:20:11

Answer 3: 2, 2014-12-20 06:01:18
