How to efficiently store small byte arrays in Java?

Question

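By small byte arrays I mean byte arrays with lengths from 10 to 30.

By store I mean storing them in RAM, not serializing them and persisting them to the file system.

System: macOS 10.12.6, Oracle jdk1.8.0_141 64-bit, JVM args -Xmx1g

Example: for new byte[200 * 1024 * 1024], the expected behavior is ≈200 MB of heap space.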

public static final int TARGET_SIZE = 200 * 1024 * 1024;
public static void main(String[] args) throws InterruptedException {
    byte[] arr = new byte[TARGET_SIZE];
    System.gc();
    System.out.println("Array size: " + arr.length);
    System.out.println("HeapSize: " + Runtime.getRuntime().totalMemory());
    Thread.sleep(60000);
}
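A note on the measurement itself: Runtime.totalMemory() reports the heap the JVM has committed, not the heap that is actually in use. A minimal sketch, assuming a single-threaded program where nothing else allocates much in between, estimates the used heap as totalMemory() - freeMemory() before and after the allocation. HeapUsage and usedHeapBytes are hypothetical names, and System.gc() is only a hint, so the result is rough; JOL (used in the answers below) gives exact object sizes.

```java
public class HeapUsage {
    static final int TARGET_SIZE = 200 * 1024 * 1024;

    // Heap currently in use = committed heap minus its free part.
    // Runtime.totalMemory() on its own is the committed heap, which can be much larger.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.gc(); // only a hint; the JVM is free to ignore it
        long before = usedHeapBytes();
        byte[] arr = new byte[TARGET_SIZE];
        System.gc();
        long after = usedHeapBytes();
        // arr is still used below, so it stays reachable across the measurement
        System.out.println("Array size: " + arr.length);
        System.out.println("Approx. heap used by the array: " + (after - before) + " bytes");
    }
}
```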

But for smaller arrays, the math is not so simple:

public static final int TARGET_SIZE = 200 * 1024 * 1024;
public static void main(String[] args) throws InterruptedException {
    final int oneArraySize = 20;
    final int numberOfArrays = TARGET_SIZE / oneArraySize;
    byte[][] arrays = new byte[numberOfArrays][];
    for (int i = 0; i < numberOfArrays; i++) {
        arrays[i] = new byte[oneArraySize];
    }
    System.gc();
    System.out.println("Arrays size: " + arrays.length);
    System.out.println("HeapSize: " + Runtime.getRuntime().totalMemory());
    Thread.sleep(60000);
}
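For comparison, a rough estimate of what this second program should need, assuming a 64-bit HotSpot JVM with compressed oops (the default for a 1 GB heap), a 16-byte array header, 4-byte references and 8-byte object alignment. These are assumptions about the JVM configuration, not measurements; ⌈x⌉₈ denotes x rounded up to a multiple of 8:

```latex
\begin{aligned}
N &= \frac{200 \cdot 1024 \cdot 1024}{20} = 10\,485\,760 \text{ arrays} \\
\text{size}(\texttt{byte[20]}) &= \lceil 16 + 20 \rceil_8 = 40 \text{ bytes} \\
\text{size}(\texttt{byte[N][]}) &= \lceil 16 + 4N \rceil_8 = 41\,943\,056 \text{ bytes} \\
\text{total} &= 40N + 41\,943\,056 = 461\,373\,456 \text{ bytes} \approx 440 \text{ MiB}
\end{aligned}
```

Under these assumptions, more than half of that heap goes to headers, padding and references rather than to the 200 MiB of actual data.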

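It gets even worse:

The question is:

~~Where does this overhead come from?~~ How to efficiently store and work with small byte arrays (chunks of data)?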

Update 1

For new byte[200*1024*1024][1] it eats:

Basic math says that new byte[1] weighs 24 bytes.

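Update 2

According to What is the memory consumption of an object in Java?, the minimum size of an object in Java is 16 bytes. From my previous "measurements": 24 bytes - 16 bytes (minimum object size) - 4 bytes for the int length - 1 actual byte of my data = 3 bytes of some ~~other garbage~~ padding.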

Answer 1

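OK, so if I understood correctly (please ask if not, and I will try to answer), there are a few things here. First, you need a proper measurement tool, and JOL is the only one I trust.

Let's start: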

byte[] one = new byte[1];
System.out.println(GraphLayout.parseInstance(one).toFootprint());

This will show 24 bytes: 12 for the mark and class words (the object header), plus 4 bytes for the array length, 1 byte for the actual value, and 7 bytes of padding (memory is 8-byte aligned).
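The arithmetic above can be written down as a small estimator. This is a sketch under stated assumptions (64-bit HotSpot with compressed class pointers, a 16-byte array header, 8-byte alignment); ArrayFootprint and byteArrayFootprint are hypothetical names, and JOL remains the way to measure on a given JVM.

```java
public class ArrayFootprint {
    // ASSUMPTIONS (typical 64-bit HotSpot, compressed class pointers):
    // 16-byte array header (8 mark word + 4 class pointer + 4 length), 8-byte alignment.
    static final int ARRAY_HEADER_BYTES = 16;
    static final int ALIGNMENT = 8;

    // Rounds size up to the next multiple of ALIGNMENT.
    static long align(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Estimated size of a byte[] with the given length (header + data + padding).
    static long byteArrayFootprint(int length) {
        return align(ARRAY_HEADER_BYTES + (long) length);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 8, 9, 20}) {
            System.out.println("byte[" + n + "] ~ " + byteArrayFootprint(n) + " bytes");
        }
    }
}
```

Under these assumptions byte[1] and byte[8] both come out at 24 bytes and byte[9] at 32, matching the JOL figures quoted in this answer.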

With that in mind, this output should be predictable:

byte[] eight = new byte[8];
System.out.println(GraphLayout.parseInstance(eight).toFootprint()); // 24 bytes

byte[] nine = new byte[9];
System.out.println(GraphLayout.parseInstance(nine).toFootprint()); // 32 bytes

Now let's move on to two-dimensional arrays:

byte[][] ninenine = new byte[9][9];    
System.out.println(GraphLayout.parseInstance(ninenine).toFootprint()); // 344 bytes

System.out.println(ClassLayout.parseInstance(ninenine).toPrintable());

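That is because Java has no true two-dimensional arrays: every nested array is itself an Object (byte[]) with its own header and content. So a single byte[9] takes 32 bytes: 16 for the header (12 bytes plus 4 bytes for the length) and 16 for the content (9 bytes of actual data + 7 bytes of padding).

The ninenine object itself has 56 bytes in total: 16 for the header + 36 for holding the references to the 9 nested arrays + 4 bytes of padding.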
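The same accounting extends to byte[rows][cols]: one outer array of references plus rows separate inner arrays, each paying its own header and padding. A sketch under the same assumptions (compressed oops, so 4-byte references, 16-byte array header, 8-byte alignment); Array2DFootprint and footprint are hypothetical names:

```java
public class Array2DFootprint {
    // ASSUMPTIONS: 64-bit HotSpot with compressed oops; 16-byte array header,
    // 4-byte references, 8-byte alignment. An estimate, not a JOL measurement.
    static long align8(long size) {
        return (size + 7) / 8 * 8;
    }

    // Estimated total size of new byte[rows][cols]:
    // one outer array of references plus 'rows' separate byte[cols] objects.
    static long footprint(int rows, int cols) {
        long outer = align8(16 + 4L * rows);
        long inner = align8(16 + (long) cols);
        return outer + rows * inner;
    }

    public static void main(String[] args) {
        System.out.println(footprint(9, 9));
        System.out.println(footprint(10000, 10));
        System.out.println(footprint(10, 10000));
    }
}
```

Under these assumptions footprint(9, 9) gives 344 bytes, and footprint(10000, 10) and footprint(10, 10000) give 360016 and 100216 bytes, the same numbers JOL reports in this answer; the per-row header cost is what makes many short rows expensive.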

Now look at this example:

byte[][] left = new byte[10000][10];
System.out.println(GraphLayout.parseInstance(left).toFootprint()); // 360016 bytes

byte[][] right = new byte[10][10000];
System.out.println(GraphLayout.parseInstance(right).toFootprint()); // 100216 bytes

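That is roughly a 260% increase, so simply changing the layout (a few long rows instead of many short ones) can save a lot of space.

But the deeper problem is that every single object in Java has these headers; there are no headerless objects yet. They might arrive in the future under the name value types. Perhaps once those are implemented, arrays of primitives at least will not have this overhead.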

Answer 2

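Eugene's answer explains why you observe such an increase in memory consumption for such a large number of arrays. The question in the title, "How to efficiently store small byte arrays in Java?", may then be answered with: not at all. ¹

However, there may be ways to achieve your goal. As usual, the "best" solution here will depend on how the data is going to be used. A very pragmatic approach is to define an interface for your data structure.

In the simplest case, this interface could look like this: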

interface ByteArray2D 
{
    int getNumRows();
    int getNumColumns();
    byte get(int r, int c);
    void set(int r, int c, byte b);
}

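It offers a basic abstraction of a "2D byte array". Depending on the use case, it may be beneficial to offer additional methods here. The patterns that could be employed are frequently found in matrix libraries, which handle "2D matrices" (usually of float values) and often offer methods like these: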

interface Matrix {
    Vector getRow(int row);
    Vector getColumn(int column);
    ...
}

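But when the main purpose here is to handle a set of byte[] arrays, a method for accessing each array (that is, each row of the 2D array) could be sufficient: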

ByteBuffer getRow(int row);

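Given this interface, it is simple to create different implementations. For example, you could create a simple implementation that just stores a 2D byte[][] array internally: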

class SimpleByteArray2D implements ByteArray2D 
{
    private final byte array[][];
    ...
}

Alternatively, you could create an implementation that internally stores a 1D byte[] array, or similarly, a ByteBuffer:

class CompactByteArray2D implements ByteArray2D
{
    private final ByteBuffer buffer;
    ...
}

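This implementation then just has to compute the (1D) index whenever one of the methods for accessing a certain row/column of the 2D array is called.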
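As a sketch of that index computation with a plain byte[] instead of a ByteBuffer: element (r, c) of a rows x cols array lives at index r * cols + c in row-major order. FlatByteArray2D and copyRow are hypothetical names for this illustration, not part of the MCVE; the explicit bounds check matters because an out-of-range column would otherwise silently land in the next row.

```java
import java.util.Arrays;

// Hypothetical example class (not part of the MCVE): stores a rows x cols
// "2D" byte array in a single byte[] in row-major order.
public class FlatByteArray2D {
    private final int rows;
    private final int cols;
    private final byte[] data;

    public FlatByteArray2D(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new byte[Math.multiplyExact(rows, cols)]; // fails fast on int overflow
    }

    // Row-major mapping from (r, c) to the 1D index, with bounds checking.
    private int index(int r, int c) {
        if (r < 0 || r >= rows || c < 0 || c >= cols) {
            throw new IndexOutOfBoundsException("(" + r + ", " + c + ")");
        }
        return r * cols + c;
    }

    public byte get(int r, int c) { return data[index(r, c)]; }

    public void set(int r, int c, byte b) { data[index(r, c)] = b; }

    // Returns a copy of one row (allocates a new small array).
    public byte[] copyRow(int r) {
        int from = index(r, 0);
        return Arrays.copyOfRange(data, from, from + cols);
    }

    public static void main(String[] args) {
        FlatByteArray2D a = new FlatByteArray2D(1000, 10);
        a.set(2, 4, (byte) 123);
        System.out.println(a.get(2, 4));
        System.out.println(Arrays.toString(a.copyRow(2)));
    }
}
```

Design note: copyRow returns a copy and therefore allocates a new small array, which is exactly the per-array overhead this approach tries to avoid; the ByteBuffer-based getRow in the MCVE returns a view instead.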

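Below is an MCVE that shows this interface and the two implementations, the basic usage of the interface, and a memory footprint analysis using JOL.

The output of the footprint analysis is: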

For 10 rows and 1000 columns:
Total size for SimpleByteArray2D : 10240
Total size for CompactByteArray2D: 10088

For 100 rows and 100 columns:
Total size for SimpleByteArray2D : 12440
Total size for CompactByteArray2D: 10088

For 1000 rows and 10 columns:
Total size for SimpleByteArray2D : 36040
Total size for CompactByteArray2D: 10088

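This shows that

the SimpleByteArray2D implementation, which is based on a simple 2D byte[][] array, requires more memory as the number of rows increases (even though the total size of the array stays the same)
the memory consumption of the CompactByteArray2D is independent of the structure of the array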
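The SimpleByteArray2D numbers can be reproduced by hand, assuming compressed oops: the SimpleByteArray2D object itself takes 24 bytes (12-byte header, two ints, one 4-byte reference), the outer array takes ⌈16 + 4r⌉₈ bytes, and each of the r rows takes ⌈16 + c⌉₈ bytes, where ⌈x⌉₈ rounds x up to a multiple of 8:

```latex
\begin{aligned}
S(r, c) &= 24 + \lceil 16 + 4r \rceil_8 + r \cdot \lceil 16 + c \rceil_8 \\
S(10, 1000) &= 24 + 56 + 10 \cdot 1016 = 10\,240 \\
S(100, 100) &= 24 + 416 + 100 \cdot 120 = 12\,440 \\
S(1000, 10) &= 24 + 4016 + 1000 \cdot 32 = 36\,040
\end{aligned}
```

The CompactByteArray2D always holds a single byte[10000] of ⌈16 + 10000⌉₈ = 10016 bytes, so the remaining 72 bytes of the measured 10088 are the fixed cost of the CompactByteArray2D and ByteBuffer objects.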

The whole program:

package stackoverflow;

import java.nio.ByteBuffer;

import org.openjdk.jol.info.GraphLayout;

public class EfficientByteArrayStorage
{
    public static void main(String[] args)
    {
        showExampleUsage();
        anaylyzeMemoryFootprint();
    }

    private static void anaylyzeMemoryFootprint()
    {
        testMemoryFootprint(10, 1000);
        testMemoryFootprint(100, 100);
        testMemoryFootprint(1000, 10);
    }

    private static void testMemoryFootprint(int rows, int cols)
    {
        System.out.println("For " + rows + " rows and " + cols + " columns:");

        ByteArray2D b0 = new SimpleByteArray2D(rows, cols);
        GraphLayout g0 = GraphLayout.parseInstance(b0);
        System.out.println("Total size for SimpleByteArray2D : " + g0.totalSize());
        //System.out.println(g0.toFootprint());

        ByteArray2D b1 = new CompactByteArray2D(rows, cols);
        GraphLayout g1 = GraphLayout.parseInstance(b1);
        System.out.println("Total size for CompactByteArray2D: " + g1.totalSize());
        //System.out.println(g1.toFootprint());
    }

    // Shows an example of how to use the different implementations
    private static void showExampleUsage()
    {
        System.out.println("Using a SimpleByteArray2D");
        ByteArray2D b0 = new SimpleByteArray2D(10, 10);
        exampleUsage(b0);

        System.out.println("Using a CompactByteArray2D");
        ByteArray2D b1 = new CompactByteArray2D(10, 10);
        exampleUsage(b1);
    }

    private static void exampleUsage(ByteArray2D byteArray2D)
    {
        // Reading elements of the array
        System.out.println(byteArray2D.get(2, 4));

        // Writing elements of the array
        byteArray2D.set(2, 4, (byte)123);
        System.out.println(byteArray2D.get(2, 4));

        // Bulk access to rows
        ByteBuffer row = byteArray2D.getRow(2);
        for (int c = 0; c < row.capacity(); c++)
        {
            System.out.println(row.get(c));
        }

        // (Commented out for this MCVE: Writing one row to a file)
        /*/
        try (FileChannel fileChannel = 
            new FileOutputStream(new File("example.dat")).getChannel())
        {
            fileChannel.write(byteArray2D.getRow(2));
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
        //*/
    }

}


interface ByteArray2D 
{
    int getNumRows();
    int getNumColumns();
    byte get(int r, int c);
    void set(int r, int c, byte b);

    // Bulk access to rows, for convenience and efficiency
    ByteBuffer getRow(int row);
}

class SimpleByteArray2D implements ByteArray2D 
{
    private final int rows;
    private final int cols;
    private final byte array[][];

    public SimpleByteArray2D(int rows, int cols)
    {
        this.rows = rows;
        this.cols = cols;
        this.array = new byte[rows][cols];
    }

    @Override
    public int getNumRows()
    {
        return rows;
    }

    @Override
    public int getNumColumns()
    {
        return cols;
    }

    @Override
    public byte get(int r, int c)
    {
        return array[r][c];
    }

    @Override
    public void set(int r, int c, byte b)
    {
        array[r][c] = b;
    }

    @Override
    public ByteBuffer getRow(int row)
    {
        return ByteBuffer.wrap(array[row]);
    }
}

class CompactByteArray2D implements ByteArray2D
{
    private final int rows;
    private final int cols;
    private final ByteBuffer buffer;

    public CompactByteArray2D(int rows, int cols)
    {
        this.rows = rows;
        this.cols = cols;
        this.buffer = ByteBuffer.allocate(rows * cols);
    }

    @Override
    public int getNumRows()
    {
        return rows;
    }

    @Override
    public int getNumColumns()
    {
        return cols;
    }

    @Override
    public byte get(int r, int c)
    {
        return buffer.get(r * cols + c);
    }

    @Override
    public void set(int r, int c, byte b)
    {
        buffer.put(r * cols + c, b);
    }

    @Override
    public ByteBuffer getRow(int row)
    {
        ByteBuffer r = buffer.slice();
        r.position(row * cols);
        r.limit(row * cols + cols);
        return r.slice();
    }
}

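Again, this is mainly intended as a sketch to show one possible approach. The details of the interface will depend on the intended usage patterns.

¹ Side note:

The memory overhead problem is similar in other languages. For example, in C/C++, the structure that most closely resembles a "2D Java array" would be a manually allocated array of pointers: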

char** array;
array = new char*[numRows];
array[0] = new char[numCols];
...

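In this case, you also have an overhead that is proportional to the number of rows, namely one pointer per row (4 bytes on 32-bit platforms, 8 bytes on typical 64-bit platforms).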

2 answers

Answer 1
score 9, 2017-08-23 12:46:02

Answer 2
score 3, accepted, 2017-08-23 19:00:35
