Find the difference between two 1D arrays to form a 2D array of those differences in the shortest time

I am trying to find the absolute difference between every element of one array and every element of another, to form a matrix.

I have implemented this with nested for loops, but it is slow and I need it to be faster. In R, for example, I can do it much faster with the dist function, but I am struggling to get it fast in C#.

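using System;

double[] array1 = new double[] { 1.1, 2.0, 3.0, 4.0, 5.0 };
double[] array2 = new double[] { 6.1, 7.0, 8.0 };
// One row per element of array1, one column per element of array2
double[,] final_array = new double[array1.Length, array2.Length];
for (int i = 0; i < array1.Length; i++)
{
    for (int j = 0; j < array2.Length; j++)
    {
        final_array[i, j] = Math.Abs(array1[i] - array2[j]);
    }
}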

Expected result of final_array (rows correspond to array1, columns to array2):
5.0  5.9  6.9
4.1  5.0  6.0
3.1  4.0  5.0
2.1  3.0  4.0
1.1  2.0  3.0

Although this gives the correct result, I need it to be faster, because I will have to run this calculation on arrays of up to 15,000 elements.

You can use vectors from the System.Numerics namespace. The caveat is that Vector4 only works with float, not with double (the generic Vector<T> type does support double). For plain subtraction, float is often good enough, as long as its roughly 7 significant decimal digits are sufficient for your data:

using System;
using System.Numerics;
using System.Threading.Tasks;

float[] array1 = new float[] { 1.1F, 2.0F, 3.0F, 4.0F, 5.0F };
float[] array2 = new float[] { 6.1F, 7.0F, 8.0F };
float[,] final_array = new float[array1.Length, array2.Length];

// Pack array2 into groups of four floats; leftover elements are handled by the scalar loop below
int vectorCount = array2.Length / 4;
Vector4[] array2Vectors = new Vector4[vectorCount];
Parallel.For(0, vectorCount, i =>
{
    int offset = i * 4;
    array2Vectors[i] = new Vector4(array2[offset], array2[offset + 1],
        array2[offset + 2], array2[offset + 3]);
});

// Each row i is independent, so rows can be computed in parallel
Parallel.For(0, array1.Length, i =>
{
    // Broadcast array1[i] into all four lanes
    Vector4 v1 = new Vector4(array1[i]);
    for (int j = 0; j < array2Vectors.Length; j++)
    {
        // |array1[i] - array2[4j .. 4j+3]| in a single vector operation
        Vector4 result = Vector4.Abs(Vector4.Subtract(v1, array2Vectors[j]));
        int offset = j * 4;
        final_array[i, offset] = result.X;
        final_array[i, offset + 1] = result.Y;
        final_array[i, offset + 2] = result.Z;
        final_array[i, offset + 3] = result.W;
    }

    // Scalar fallback for the remaining array2.Length % 4 elements
    for (int j = vectorCount * 4; j < array2.Length; j++)
    {
        final_array[i, j] = Math.Abs(array1[i] - array2[j]);
    }
});

Because the work is now expressed with vectors, the JIT can map it to the CPU's SIMD instructions (when the hardware supports them), which should speed up the computation.
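If double precision matters, the generic System.Numerics.Vector<T> type also works with double and uses whatever vector width the hardware offers (Vector<double>.Count lanes). The sketch below is one possible way to fill a row with it; AbsDiffRow is a hypothetical helper name, not a library API, and whether SIMD instructions are actually used depends on the machine, which Vector.IsHardwareAccelerated reports:

```csharp
using System;
using System.Numerics;

// Reports whether Vector<T> operations map to SIMD instructions on this machine
Console.WriteLine($"SIMD accelerated: {Vector.IsHardwareAccelerated}, double lanes: {Vector<double>.Count}");

double[] array1 = { 1.1, 2.0, 3.0, 4.0, 5.0 };
double[] array2 = { 6.1, 7.0, 8.0 };
double[][] result = new double[array1.Length][];
for (int i = 0; i < array1.Length; i++)
{
    result[i] = new double[array2.Length];
    AbsDiffRow(array1[i], array2, result[i]);
}

// Hypothetical helper: row[j] = |a - b[j]|, processed Vector<double>.Count elements at a time
static void AbsDiffRow(double a, double[] b, double[] row)
{
    int width = Vector<double>.Count;      // hardware-dependent lane count
    var va = new Vector<double>(a);        // broadcast a into every lane
    int j = 0;
    for (; j <= b.Length - width; j += width)
    {
        var vb = new Vector<double>(b, j); // load b[j .. j + width - 1]
        Vector.Abs(va - vb).CopyTo(row, j);
    }
    for (; j < b.Length; j++)              // scalar tail for the leftover elements
    {
        row[j] = Math.Abs(a - b[j]);
    }
}
```

Each lane performs an ordinary IEEE double subtraction, so the vectorized values should match the scalar Math.Abs(a - b[j]) loop exactly.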

Additional performance gains come from parallel execution with Parallel.For, which can spread the work across all available CPU cores.
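Parallel.For does not depend on the vector types; the same one-row-per-iteration idea also applies to the plain double version with jagged rows. A minimal sketch, where AbsDiffMatrixParallel is a hypothetical helper name:

```csharp
using System;
using System.Threading.Tasks;

double[] array1 = { 1.1, 2.0, 3.0, 4.0, 5.0 };
double[] array2 = { 6.1, 7.0, 8.0 };
double[][] matrix = AbsDiffMatrixParallel(array1, array2);
Console.WriteLine($"{matrix.Length} x {matrix[0].Length}"); // prints "5 x 3"

// Hypothetical helper: result[i][j] = |a[i] - b[j]|, one Parallel.For iteration per row
static double[][] AbsDiffMatrixParallel(double[] a, double[] b)
{
    var result = new double[a.Length][];
    Parallel.For(0, a.Length, i =>
    {
        // Each iteration writes only its own row, so no locking is needed
        var row = new double[b.Length];
        double ai = a[i];
        for (int j = 0; j < b.Length; j++)
        {
            row[j] = Math.Abs(ai - b[j]);
        }
        result[i] = row;
    });
    return result;
}
```

Parallelizing the outer loop gives each iteration a whole row of work. For very small inputs such as the 5 × 3 example, the scheduling overhead of Parallel.For is likely to outweigh any gain.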


There is no way to do this faster in terms of algorithmic complexity. It requires O(n * m) operations to calculate this result, if only because the result array itself has n * m elements, each of which has to be written.
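To make the n * m lower bound concrete for the sizes mentioned in the question, assume both arrays have 15,000 elements:

```latex
\begin{aligned}
n \cdot m &= 15\,000 \times 15\,000 = 225\,000\,000 \text{ elements} \\
\text{double (8 bytes)}: &\quad 225\,000\,000 \times 8 = 1.8 \times 10^{9} \text{ bytes} \approx 1.8\ \text{GB} \\
\text{float (4 bytes)}: &\quad 225\,000\,000 \times 4 = 9 \times 10^{8} \text{ bytes} \approx 0.9\ \text{GB}
\end{aligned}
```

Merely writing the result touches that much memory, so any speed-up has to come from constant factors (memory layout, SIMD, multiple cores) rather than from a better algorithm.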

There are some ways to slightly improve the performance of the code itself.
The easiest one is to switch to jagged arrays, as already suggested in the comments:

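using System;

double[] array1 = new double[] { 1.1, 2.0, 3.0, 4.0, 5.0 };
double[] array2 = new double[] { 6.1, 7.0, 8.0 };
double[][] final_array = new double[array1.Length][];

for (int i = 0; i < array1.Length; i++)
{
    // Each row is a separate one-dimensional array, which is typically cheaper to index than double[,]
    final_array[i] = new double[array2.Length];
    for (int j = 0; j < array2.Length; j++)
    {
        final_array[i][j] = Math.Abs(array1[i] - array2[j]);
    }
}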

You can read more about multidimensional array vs jagged arrays and their performance here:
What are the differences between a multidimensional array and an array of arrays in C#?
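To check whether the jagged layout actually helps on your machine, a rough comparison with System.Diagnostics.Stopwatch can be sketched as follows. Rectangular and Jagged are hypothetical helper names and the 2,000-element inputs are an arbitrary choice; a single timed run also includes JIT compilation, so treat the numbers as indicative only (a benchmarking library such as BenchmarkDotNet gives more reliable measurements):

```csharp
using System;
using System.Diagnostics;

// Arbitrary illustrative input size; random values from a fixed seed
var rng = new Random(42);
double[] x = new double[2000];
double[] y = new double[2000];
for (int k = 0; k < x.Length; k++)
{
    x[k] = rng.NextDouble();
    y[k] = rng.NextDouble();
}

var sw = Stopwatch.StartNew();
double[,] rect = Rectangular(x, y);
Console.WriteLine($"double[,]:  {sw.ElapsedMilliseconds} ms");

sw.Restart();
double[][] jag = Jagged(x, y);
Console.WriteLine($"double[][]: {sw.ElapsedMilliseconds} ms");

// Hypothetical helper: fills a rectangular double[,] with |a[i] - b[j]|
static double[,] Rectangular(double[] a, double[] b)
{
    var m = new double[a.Length, b.Length];
    for (int i = 0; i < a.Length; i++)
        for (int j = 0; j < b.Length; j++)
            m[i, j] = Math.Abs(a[i] - b[j]);
    return m;
}

// Hypothetical helper: same values, stored as one double[] per row
static double[][] Jagged(double[] a, double[] b)
{
    var m = new double[a.Length][];
    for (int i = 0; i < a.Length; i++)
    {
        var row = new double[b.Length];
        for (int j = 0; j < b.Length; j++)
            row[j] = Math.Abs(a[i] - b[j]);
        m[i] = row;
    }
    return m;
}
```

Both helpers compute identical values, so only the storage layout differs between the two timings.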

You could go further and gain more performance by using unsafe pointers to access the multidimensional array, or by using advanced processor instructions (hardware intrinsics). But the real question is: is this something you actually need to worry about? Is it the only bottleneck in an extremely high-load system? If not, leave your code as it is, in a clear and readable form. As for performance, O(n * m) asymptotic complexity is perfectly fine for arrays of size 15,000.

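Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM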