Efficient implementation of multi-dimensional arrays in Java?

As far as I understand (from answers such as this), Java has no native multi-dimensional contiguous-memory arrays (unlike C#, for example).

While the jagged array syntax (arrays of arrays) might be good for most applications, I would still like to know what the best practice is if you do want the raw efficiency of a contiguous-memory array (avoiding unneeded memory reads).

I could of course use a single-dimensional array that maps to a 2D one, but I prefer something more structured.

It's not difficult to do it manually:

int[] matrix = new int[ROWS * COLS];

int x_i_j = matrix[ i*COLS + j ];

Now, is it really faster than Java's multi-dimensional array?

int x_i_j = matrix[i][j];

For random access, maybe. For sequential access, probably not: matrix[i] is almost certainly in L1 cache, if not in a register. In the best case, matrix[i][j] requires one addition and one memory read, while matrix[i*COLS + j] may cost two additions, one multiplication, and one memory read. But who's counting?
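The operation counts above can be made concrete with a small sketch. The class and method names (`RowTraversal`, `sumJagged`, `sumFlat`) are hypothetical and not part of the original answer. The jagged version fetches each row reference once, and the flat version keeps a running offset so that the inner loop needs no multiplication. A JIT compiler may well perform both transformations on its own, so treat this as an illustration of the arithmetic rather than a tuning recipe.

```java
final class RowTraversal {
    // Jagged layout: fetch the row reference once, then index within that row.
    static long sumJagged(int[][] matrix) {
        long sum = 0;
        for (int[] row : matrix) {
            for (int value : row) {
                sum += value;
            }
        }
        return sum;
    }

    // Flat layout: carry a running row offset so the inner loop only adds.
    static long sumFlat(int[] matrix, int rows, int cols) {
        long sum = 0;
        for (int i = 0, offset = 0; i < rows; i++, offset += cols) {
            for (int j = 0; j < cols; j++) {
                sum += matrix[offset + j];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int rows = 3, cols = 4;
        int[][] jagged = new int[rows][cols];
        int[] flat = new int[rows * cols];
        // Fill both layouts with the same values.
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                jagged[i][j] = i * 10 + j;
                flat[i * cols + j] = i * 10 + j;
            }
        }
        // Both traversals visit the same values, so the sums must match.
        System.out.println(sumJagged(jagged) + " " + sumFlat(flat, rows, cols));
    }
}
```

Both methods walk memory in row order, which is the case where the jagged layout is expected to hold its own.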

It depends on your access pattern. Using this simple program, which compares an int[][] with a 2D matrix mapped onto a 1D int[] array, a native Java 2D matrix is:

  1. 25% faster when the row is in the cache, i.e. when accessing by rows
  2. 100% slower when the row is not in the cache, i.e. when accessing by columns

That is:

// Case #1
for (y = 0; y < h; y++)
    for (x = 0; x < w; x++)
        // Access item[y][x]

// Case #2
for (x = 0; x < w; x++)
    for (y = 0; y < h; y++)
        // Access item[y][x]

The 1D matrix is indexed as:

public int get(int x, int y) {
    return this.m[y * width + x];
}

If you really want more structure with a contiguous-memory array, wrap it in an object.

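public class My2dArray<T> {

  int sizeX;

  private T[] items;

  @SuppressWarnings("unchecked")
  public My2dArray(int x, int y) {
    sizeX = x;
    // Java forbids generic array creation (new T[...]), so cast an Object[]
    items = (T[]) new Object[x * y];
  }

  public T elementAt(int x, int y) {
    return items[x + y * sizeX];
  }

}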

Not a perfect solution, and you probably already know it. So consider this a confirmation of what you suspected to be true.

Java only provides certain constructs for organizing code, so eventually you'll have to reach for a class or interface. Since this also requires specific operations, you need a class.

The performance impact includes creating a JVM stack frame for each array access, and it would be ideal to avoid such a thing; however, a JVM stack frame is how the JVM implements its scoping. Code organization requires appropriate scoping, so there's not really a way around that performance hit that I can imagine (without violating the spirit of "everything is an object").

The most efficient way to implement multi-dimensional arrays is to use one-dimensional arrays as multi-dimensional arrays. See this answer about mapping a 2D array into a 1D array.

// 2D data structure as 1D array
int[] array = new int[width * height];
// access the array 
array[x + y * width] = /*value*/;

I could of course use a single-dimensional array that maps to a 2D one, but I prefer something more structured.

If you want to access the array in a more structured manner, create a class for it:

public class ArrayInt {

    private final int[] array;
    private final int width, height;

    public ArrayInt(int width, int height) {
        array = new int[width * height];
        this.width = width;
        this.height = height;
    }

    public int getWidth() {
        return width;
    }

    public int getHeight() {
        return height;
    }

    public int get(int x, int y) {
        return array[x + y * width];
    }

    public void set(int x, int y, int value) {
        array[x + y * width] = value;
    }

}

If you wanted arrays of objects, you could use generics and define a class Array<T>, where T is the type of the objects stored in the array.
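A minimal sketch of such a generic class follows. It is an assumption about what the answer has in mind, not code taken from it. Because Java does not allow `new T[...]`, the sketch stores elements in an `Object[]` and casts on read. It also adds explicit bounds checks, which the flat mapping needs: without them, an `x` that is too large silently lands in the next row instead of failing.

```java
final class Array<T> {
    private final Object[] items; // backing store; Java cannot create a T[] directly
    private final int width, height;

    Array(int width, int height) {
        if (width < 0 || height < 0) {
            throw new IllegalArgumentException("negative dimension");
        }
        this.items = new Object[width * height];
        this.width = width;
        this.height = height;
    }

    @SuppressWarnings("unchecked")
    T get(int x, int y) {
        // Safe cast: only set() writes to items, and it only accepts T.
        return (T) items[index(x, y)];
    }

    void set(int x, int y, T value) {
        items[index(x, y)] = value;
    }

    // Row-major mapping with a per-axis bounds check.
    private int index(int x, int y) {
        if (x < 0 || x >= width || y < 0 || y >= height) {
            throw new IndexOutOfBoundsException("(" + x + ", " + y + ")");
        }
        return x + y * width;
    }

    public static void main(String[] args) {
        Array<String> grid = new Array<>(3, 2);
        grid.set(2, 1, "corner");
        System.out.println(grid.get(2, 1)); // prints "corner"
    }
}
```

Note that an array of object references is contiguous only in the references themselves; the objects they point to can live anywhere on the heap.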

Performance-wise, this will, in most cases, be faster than a multi-dimensional array in Java. The reasons can be found in the answers to this question.

Let's say you have a 2D array int[][] a = new int[height][width], so by convention you have the indices a[y][x]. Depending on how you represent the data and how you access it, the performance varies by a factor of 20:

[Figure: comparison of 2D array accesses]

The code:

public class ObjectArrayPerformance {
    public int width;
    public int height;
    public int m[];

    public ObjectArrayPerformance(int w, int h) {
            this.width = w;
            this.height = h;
            this.m = new int[w * h];
    }

    public int get(int x, int y) {
            return this.m[y * width + x];
    }

    public void set(int x, int y, int value) {
            this.m[y * width + x] = value;
    }

    public static void main (String[] args) {
            int w = 1000, h = 2000, passes = 400;

            int matrix[][] = new int[h][];

            for (int i = 0; i < h; ++i) {
                    matrix[i] = new int[w];
            }

            long start;
            long duration;

            System.out.println("duration[ms]\tmethod");

            start = System.currentTimeMillis();
            for (int z = 0; z < passes; z++) {
                    for (int y = 0; y < h; y++) {
                        for (int x = 0; x < w; x++) {
                                    matrix[y][x] = matrix[y][x] + 1;
                            }
                    }
            }
            duration = System.currentTimeMillis() - start;
            System.out.println(duration+"\t2D array, loop on x then y");

            start = System.currentTimeMillis();
            for (int z = 0; z < passes; z++) {
                    for (int x = 0; x < w; x++) {
                            for (int y = 0; y < h; y++) {
                                    matrix[y][x] = matrix[y][x] + 1;
                            }
                    }
            }
            duration = System.currentTimeMillis() - start;
            System.out.println(duration+"\t2D array, loop on y then x");

            //

            ObjectArrayPerformance mt = new ObjectArrayPerformance(w, h);
            start = System.currentTimeMillis();
            for (int z = 0; z < passes; z++) {
                    for (int x = 0; x < w; x++) {
                            for (int y = 0; y < h; y++) {
                                    mt.set(x, y, mt.get(x, y) + 1);
                            }
                    }
            }
            duration = System.currentTimeMillis() - start;
            System.out.println(duration+"\tmapped 1D array, access through getter/setter");

            //

            ObjectArrayPerformance mt2 = new ObjectArrayPerformance(w, h);
            start = System.currentTimeMillis();
            for (int z = 0; z < passes; z++) {
                    for (int x = 0; x < w; x++) {
                            for (int y = 0; y < h; y++) {
                                    mt2.m[y * w + x] = mt2.m[y * w + x] + 1;
                            }
                    }
            }
            duration = System.currentTimeMillis() - start;
            System.out.println(duration+"\tmapped 1D array, access through computed indexes, loop y then x");

            ObjectArrayPerformance mt3 = new ObjectArrayPerformance(w, h);
            start = System.currentTimeMillis();
            for (int z = 0; z < passes; z++) {
                    for (int y = 0; y < h; y++) {
                        for (int x = 0; x < w; x++) {
                                    mt3.m[y * w + x] = mt3.m[y * w + x] + 1;
                            }
                    }
            }
            duration = System.currentTimeMillis() - start;
            System.out.println(duration+"\tmapped 1D array, access through computed indexes, loop x then y");

            ObjectArrayPerformance mt4 = new ObjectArrayPerformance(w, h);
            start = System.currentTimeMillis();
            for (int z = 0; z < passes; z++) {
                    for (int y = 0; y < h; y++) {
                        int yIndex = y * w;
                        for (int x = 0; x < w; x++) {
                                    mt4.m[yIndex + x] = mt4.m[yIndex + x] + 1;
                            }
                    }
            }
            duration = System.currentTimeMillis() - start;
            System.out.println(duration+"\tmapped 1D array, access through computed indexes, loop x then y, yIndex optimized");
    }
}

We can conclude that linear access performance depends more on the order in which you process the array (rows then columns, or the reverse: a performance gain of about 10x, largely due to CPU caches) than on the structure of the array itself (1D vs 2D: a performance gain of about 2x).

For random access, the performance differences should be much lower, because the CPU caches have less effect.
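The random-access case is not measured by the program above. A rough way to try it is sketched below, with hypothetical names (`RandomAccessSketch`, `sumJagged`, `sumFlat`) and sizes borrowed from the benchmark. It reads the same pseudo-random coordinates from both layouts and times each pass with `System.nanoTime()`. There is no warm-up and only one run, so the timings are indicative at best; a proper harness such as JMH would be needed for numbers worth quoting.

```java
import java.util.Random;

final class RandomAccessSketch {
    // Sum the elements of a jagged array at the given (y, x) coordinates.
    static long sumJagged(int[][] matrix, int[] ys, int[] xs) {
        long sum = 0;
        for (int k = 0; k < ys.length; k++) {
            sum += matrix[ys[k]][xs[k]];
        }
        return sum;
    }

    // Same coordinates, but read from a row-major flat array.
    static long sumFlat(int[] matrix, int width, int[] ys, int[] xs) {
        long sum = 0;
        for (int k = 0; k < ys.length; k++) {
            sum += matrix[ys[k] * width + xs[k]];
        }
        return sum;
    }

    public static void main(String[] args) {
        int w = 1000, h = 2000, samples = 5_000_000;
        int[][] jagged = new int[h][w];
        int[] flat = new int[w * h];
        Random rng = new Random(42); // fixed seed so runs are repeatable

        // Fill both layouts with identical values.
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int v = rng.nextInt(100);
                jagged[y][x] = v;
                flat[y * w + x] = v;
            }
        }

        // Pre-generate the coordinates so the RNG cost is not timed.
        int[] ys = new int[samples];
        int[] xs = new int[samples];
        for (int k = 0; k < samples; k++) {
            ys[k] = rng.nextInt(h);
            xs[k] = rng.nextInt(w);
        }

        long t0 = System.nanoTime();
        long a = sumJagged(jagged, ys, xs);
        long t1 = System.nanoTime();
        long b = sumFlat(flat, w, ys, xs);
        long t2 = System.nanoTime();

        System.out.println("jagged: " + (t1 - t0) / 1_000_000 + " ms, sum " + a);
        System.out.println("flat:   " + (t2 - t1) / 1_000_000 + " ms, sum " + b);
    }
}
```

The two sums should be identical; if they differ, the index mapping is wrong and the timings mean nothing.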

Sample implementation, written without a compiler. This is basically what C/C++ do behind the scenes when you access multidimensional arrays. You'll have to further define accessor behaviour when fewer indices than the actual number of dimensions are specified, and so on. Overhead will be minimal and could be optimized further, but that's micro-optimizing, IMHO. Also, you never actually know what goes on under the hood after the JIT kicks in.

import java.util.Arrays;

class MultiDimensionalArray<T> {
    // disclaimer: written within SO editor, might contain errors
    private final T[] data;
    private final int[] dimensions; // holds each dimension's size

    @SuppressWarnings("unchecked")
    public MultiDimensionalArray(int... dims) {
        dimensions = Arrays.copyOf(dims, dims.length);
        int size = 1;
        for (int dim : dims)
            size *= dim;
        // Java forbids generic array creation (new T[size]), so cast an Object[]
        data = (T[]) new Object[size];
    }

    public T access(int... indices) {
        // row-major mapping: idx = (..(i0 * d1 + i1) * d2 + i2 ..)
        int idx = 0;
        for (int i = 0; i < indices.length; i++)
            idx = idx * dimensions[i] + indices[i];
        return data[idx];
    }
}
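The index arithmetic that C-style multidimensional arrays rely on can also be written with precomputed strides, which makes the mapping easy to check by hand. The sketch below uses a hypothetical `RowMajorIndex` class and assumes row-major order (the last index varies fastest). It is one possible formulation, not the answer's own code.

```java
final class RowMajorIndex {
    private final int[] strides; // strides[i] = product of all dimension sizes after i

    RowMajorIndex(int... dims) {
        strides = new int[dims.length];
        int stride = 1;
        // Walk from the last dimension backwards, accumulating the product.
        for (int i = dims.length - 1; i >= 0; i--) {
            strides[i] = stride;
            stride *= dims[i];
        }
    }

    // Map an index tuple to its position in the flat backing array.
    int flatten(int... indices) {
        int idx = 0;
        for (int i = 0; i < indices.length; i++) {
            idx += indices[i] * strides[i];
        }
        return idx;
    }

    public static void main(String[] args) {
        // A 2 x 3 x 4 array has strides {12, 4, 1}.
        RowMajorIndex index = new RowMajorIndex(2, 3, 4);
        // (1, 2, 3) -> 1*12 + 2*4 + 3*1 = 23, the last of the 24 slots.
        System.out.println(index.flatten(1, 2, 3)); // prints 23
    }
}
```

With strides stored up front, passing fewer indices than dimensions gives the offset of the start of the corresponding sub-array, which is one reasonable answer to the "fewer indices" question raised above.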

If you cannot live without C constructs, there's always JNI.

Or you could develop your own Java-derived language (and VM and optimizing JIT compiler) that has a syntax for multidimensional contiguous-memory arrays.
