Fast access to matrix as jagged array in C#

I've created a lower triangular distance matrix (because of size issues) as a jagged array. Note: distances between objects are symmetric.

var dm = new double[size][];
for (var i = 0; i < size; i++)
{
   // Row i holds only columns 0..i (the lower triangle, diagonal included).
   dm[i] = new double[i+1];
   for (var j = 0; j < i+1; j++)
   {
      dm[i][j] = distance(data[i], data[j]);
   }
}

I need to access this matrix very often, so I wrote the following method for it:

private double GetValueOfDM(int row, int column, double[][] dm)
{
    // Only the lower triangle is stored, so swap the indices when column > row.
    return column <= row ? dm[row][column] : dm[column][row];
}

With the Visual Studio performance analysis, one sees that the major speed issue lies in the single line of the GetValueOfDM method.

Does anyone have a good idea how to speed this up?

I'm guessing you're using this in a tight loop? Arrays in .NET aren't /that/ fast because of automatic bounds-checking. If you need fast array performance, use a pointer with a buffer:

using System;
using System.Runtime.InteropServices;

sealed unsafe class DistanceData : IDisposable {
    private Double* buffer;
    private IntPtr  bufferLength; // Size in bytes. .NET uses IntPtr as a size_t equivalent.
    private Int32   dim0Length;

    // Distance(...) is your own distance function and must be accessible from this class.
    public DistanceData(Int32 size, Double[] data) {
        // AllocHGlobal takes a byte count, so multiply the element count by sizeof(Double).
        this.bufferLength = new IntPtr( (Int64)size * size * sizeof(Double) );
        this.buffer       = (Double*)Marshal.AllocHGlobal( this.bufferLength );
        this.dim0Length   = size;

        for(int y = 0; y < size; y++) {
            for(int x = 0; x < y + 1; x++) {
                // Write both halves of the symmetric matrix so GetValueOfDM needs no row/column flip.
                Double d = Distance( data[y], data[x] );
                this.buffer[ y * this.dim0Length + x ] = d;
                this.buffer[ x * this.dim0Length + y ] = d;
            }
        }
    }

    public void Dispose() {
        Marshal.FreeHGlobal( (IntPtr)this.buffer );
    }

    public Double GetValueOfDM(Int32 row, Int32 column) {
        // WARNING: Without validation or your own bounds-checking, invalid values of `row` and `column` will cause access-violation errors and crash your program. Ensure that code that calls `GetValueOfDM` is correct and will never submit invalid values.
        return this.buffer[ row * this.dim0Length  + column];
    }
}

You could remove the conditional in the method and trade higher memory usage for faster access, like so:

var dm = new double[size][];
for (var i = 0; i < size; i++)
{
   dm[i] = new double[size];
   for (var j = 0; j < i+1; j++)
   {
      dm[i][j] = distance(data[i], data[j]);
      dm[j][i] = dm[i][j];
   }
 }

private double GetValueOfDM(int row, int column, double[][] dm)
{
    return dm[row][column];
}

Now that there is no conditional, the lookup no longer contains a branch for the CPU to mispredict. Also, you should run tests with your actual use cases to confirm that this is actually a problem. Profiling would probably show the branching conditional as the slowest part of your code, but that doesn't necessarily mean it slows anything down noticeably. In addition, you could try running it in Release mode (with compiler optimizations) to see how that affects performance.
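One way to check this on your own machine is a rough Stopwatch comparison of the two layouts. The sketch below is only illustrative: `LookupBenchmark`, `Run`, and the placeholder distance `|i - j|` are made-up names and data, and a sequential sweep over all pairs is much friendlier to the branch predictor and cache than a real access pattern may be, so treat the numbers as a starting point rather than a verdict.

```csharp
using System;
using System.Diagnostics;

static class LookupBenchmark
{
    // Triangular lookup with the row/column flip, as in the question.
    static double GetTriangular(double[][] dm, int row, int column) =>
        column <= row ? dm[row][column] : dm[column][row];

    // Builds both layouts with placeholder distances |i - j| (hypothetical data),
    // then times `passes` full sweeps over every (row, column) pair per layout.
    public static (double triangularMs, double fullMs, double checksumDiff) Run(int size, int passes)
    {
        var tri = new double[size][];
        var full = new double[size][];
        for (var i = 0; i < size; i++) full[i] = new double[size];
        for (var i = 0; i < size; i++)
        {
            tri[i] = new double[i + 1];
            for (var j = 0; j <= i; j++)
            {
                double d = Math.Abs(i - j);
                tri[i][j] = d;
                full[i][j] = d;
                full[j][i] = d;
            }
        }

        // The running sums keep the JIT from discarding the loops as dead code.
        double sumTri = 0, sumFull = 0;

        var sw = Stopwatch.StartNew();
        for (var p = 0; p < passes; p++)
            for (var r = 0; r < size; r++)
                for (var c = 0; c < size; c++)
                    sumTri += GetTriangular(tri, r, c);
        sw.Stop();
        var triangularMs = sw.Elapsed.TotalMilliseconds;

        sw.Restart();
        for (var p = 0; p < passes; p++)
            for (var r = 0; r < size; r++)
                for (var c = 0; c < size; c++)
                    sumFull += full[r][c];
        sw.Stop();
        var fullMs = sw.Elapsed.TotalMilliseconds;

        // Both layouts must return the same values, so the difference should be zero.
        return (triangularMs, fullMs, Math.Abs(sumTri - sumFull));
    }
}
```

Call something like `LookupBenchmark.Run(2000, 10)` from a Release build started without the debugger, and run it a few times so that JIT warm-up does not dominate the first measurement.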

If you are on a system where you don't have the memory available to double the size of the array, then the code you have is probably close to optimal for accessing a jagged array.

You could use a one-dimensional array and calculate the index like this:

i = (r * r + r) / 2 + c;

But you still have to check whether c > r and swap the two if so (r = row, c = column).
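As a minimal sketch of that idea, the class below packs the lower triangle into a single array using the index formula above. The names `PackedTriangularMatrix` and `Index`, and the `Func<int, int, double>` distance callback, are assumptions made for illustration and do not come from the question.

```csharp
using System;

sealed class PackedTriangularMatrix
{
    private readonly double[] values;

    public int Size { get; }

    // `distance` is a hypothetical callback returning the distance between items r and c.
    public PackedTriangularMatrix(int size, Func<int, int, double> distance)
    {
        Size = size;
        // Rows 0..size-1 hold 1, 2, ..., size entries: size * (size + 1) / 2 in total.
        values = new double[(long)size * (size + 1) / 2];
        for (var r = 0; r < size; r++)
            for (var c = 0; c <= r; c++)
                values[Index(r, c)] = distance(r, c);
    }

    // Row r starts at offset r * (r + 1) / 2; valid only when c <= r.
    private static long Index(int r, int c) => ((long)r * r + r) / 2 + c;

    // The flip is still needed: swap the indices when column > row.
    public double this[int row, int column] =>
        column <= row ? values[Index(row, column)] : values[Index(column, row)];
}
```

For example, `new PackedTriangularMatrix(4, (i, j) => i * 10 + j)` stores 10 values, and both `m[3, 1]` and `m[1, 3]` read the entry computed for row 3, column 1. This replaces the second array dereference of the jagged layout with a multiplication and a shift, but the conditional remains.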

But will this really be faster?
