

Performance question: Inverting an array of pointers in-place vs array of values

Posted 2011-01-11 10:29:03. Tags: c++, performance, pointers, matrix, inversion

The background for asking this question is that I am solving a linearized equation system (Ax=b), where A is a matrix (typically of dimension less than 100x100) and x and b are vectors. I am using a direct method, meaning that I first invert A and then find the solution by x=A^(-1)b. This step is repeated in an iterative process until convergence.
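For reference, the direct method described above can be sketched as explicit inversion by Gauss-Jordan elimination with partial pivoting, followed by x = A^(-1) b. This is a minimal sketch under assumptions: it uses a plain row-major `std::vector<double>` instead of MTL4's matrix types, and `invert` and `multiply` are illustrative names, not library functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Row-major dense n x n matrix (an assumption for this sketch; MTL4 has its own types).
using Matrix = std::vector<double>;

// Inverts A by Gauss-Jordan elimination with partial pivoting.
// A is taken by value because the elimination destroys its working copy.
Matrix invert(Matrix A, std::size_t n) {
    assert(A.size() == n * n);
    Matrix inv(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) inv[i * n + i] = 1.0;  // start from the identity

    for (std::size_t col = 0; col < n; ++col) {
        // Partial pivoting: pick the row with the largest |entry| in this column.
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < n; ++r)
            if (std::fabs(A[r * n + col]) > std::fabs(A[pivot * n + col])) pivot = r;
        // Exact-zero test for brevity; a real solver would use a tolerance.
        if (A[pivot * n + col] == 0.0) throw std::runtime_error("matrix is singular");
        if (pivot != col)
            for (std::size_t c = 0; c < n; ++c) {
                std::swap(A[col * n + c], A[pivot * n + c]);
                std::swap(inv[col * n + c], inv[pivot * n + c]);
            }

        // Scale the pivot row so the pivot becomes 1.
        const double p = A[col * n + col];
        for (std::size_t c = 0; c < n; ++c) {
            A[col * n + c] /= p;
            inv[col * n + c] /= p;
        }

        // Eliminate this column from every other row.
        for (std::size_t r = 0; r < n; ++r) {
            if (r == col) continue;
            const double f = A[r * n + col];
            if (f == 0.0) continue;
            for (std::size_t c = 0; c < n; ++c) {
                A[r * n + c] -= f * A[col * n + c];
                inv[r * n + c] -= f * inv[col * n + c];
            }
        }
    }
    return inv;
}

// x = M b as a plain matrix-vector product (here M = A^(-1)).
std::vector<double> multiply(const Matrix& M, const std::vector<double>& b, std::size_t n) {
    assert(M.size() == n * n && b.size() == n);
    std::vector<double> x(n, 0.0);
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = 0; c < n; ++c) x[r] += M[r * n + c] * b[c];
    return x;
}
```

For example, `multiply(invert({4, 7, 2, 6}, 2), {1, 2}, 2)` gives approximately (-0.8, 0.6). Note that elimination needs a writable working matrix, which matters for the in-place question below.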

The way I'm doing it now, using a matrix library (MTL4):
For every iteration I copy all coefficients of A (values) into the matrix object, then invert. This is the easiest and safest option.
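A minimal sketch of this value-copy approach, assuming the coefficients live in a hypothetical `ModelState` struct (a 3x3 example; the field names and the zero pattern are made up for illustration). Each iteration gathers them into a contiguous row-major buffer: N*N stores per iteration, in exchange for an inversion that runs on contiguous memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model state: coefficients of A stored in separate variables,
// some arrays, some scalars (names are illustrative only).
struct ModelState {
    std::array<double, 3> diag;  // A[0][0], A[1][1], A[2][2]
    double coupling;             // A[0][1] and A[1][0]
    double tail;                 // A[1][2] and A[2][1]
};

// Value approach: copy every coefficient into a contiguous row-major 3x3 buffer
// at the start of each iteration. The buffer does not follow later changes to s.
void fill_matrix(const ModelState& s, std::vector<double>& A) {
    assert(A.size() == 9);
    A = {s.diag[0], s.coupling, 0.0,
         s.coupling, s.diag[1], s.tail,
         0.0,        s.tail,    s.diag[2]};
}
```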

Using an array of pointers instead:
For my particular case, the coefficients of A happen to be updated between iterations. These coefficients are stored in different variables (some are arrays, some are not). Would there be a potential performance gain if I set up A as an array of pointers to these coefficient variables and then inverted A in place?

The nice thing about the last option is that once I have set up the pointers in A before the first iteration, I would not need to copy any values between successive iterations. The values pointed to by A would automatically be updated between iterations.
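The pointer alternative can be sketched in the same setting: addresses are bound once, and every read goes through an indirection, so updates to the source variables are seen without copying. `ModelState`, `PointerMatrix` and `bind_pointers` are hypothetical names for illustration, and the pointers are `const` because this sketch only reads through them. Structurally zero entries still need something to point to (a shared `zero` here), and symmetric entries alias one variable; both matter as soon as anything writes through the pointers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model state holding the coefficients of A in separate variables.
struct ModelState {
    std::array<double, 3> diag;
    double coupling;
    double tail;
};

// A as a row-major array of pointers to the coefficient variables.
struct PointerMatrix {
    std::size_t n;
    std::vector<const double*> p;  // p[r * n + c] points at coefficient (r, c)

    // Each element access loads the pointer first, then the value it points to.
    double operator()(std::size_t r, std::size_t c) const {
        assert(r < n && c < n);
        return *p[r * n + c];
    }
};

// Binds the pointers once, before the first iteration. `s` and `zero` must
// outlive the returned PointerMatrix.
PointerMatrix bind_pointers(const ModelState& s, const double& zero) {
    return {3, {&s.diag[0], &s.coupling, &zero,
                &s.coupling, &s.diag[1], &s.tail,
                &zero,       &s.tail,    &s.diag[2]}};
}
```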

So the performance question boils down to this, as I see it:
- The matrix inversion takes roughly the same amount of time either way, assuming dereferencing pointers is inexpensive.
- The array of pointers does not need the extra memory for a matrix A containing values.
- The array of pointers does not have to copy all NxN values of A between iterations.
- The values that the array of pointers points to are generally NOT stored contiguously in memory. Hopefully they all lie relatively close together, but *A[0][1] is generally not next to *A[0][0], etc.
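A back-of-the-envelope count helps weigh the third point against the first and fourth. It assumes a dense inversion whose cost grows on the order of N^3 (the exact constant depends on the algorithm) and takes N = 100:

```latex
\begin{aligned}
\text{copy per iteration} &= N^2 = 10^4 \\
\text{inversion per iteration} &\sim N^3 = 10^6 \\
\frac{\text{copy}}{\text{inversion}} &\sim \frac{N^2}{N^3} = \frac{1}{N} = 10^{-2}
\end{aligned}
```

So skipping the copy saves on the order of 1% of the per-iteration work at this size, whereas any extra cost of scattered, indirect access applies to each of the order-N^3 element accesses if the inversion reads and writes through the pointers. These are operation counts, not timings; constants and cache behaviour can shift the balance.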

Any comments on this? Will the last point (scattered storage) hurt performance enough to outweigh the positive effects?

3 answers

Test, test, test.

Especially in the field of numerical linear algebra. Many effects are in play, which is why there are a number of optimized libraries that take that burden off your hands.

Some effects to consider:

Memory locality and cache effects
Multithreading effects (some algorithms that are optimal when running on a single core cause memory contention/cache eviction when more than one core is used).

There is no substitute for testing.
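As a starting point for such a test, here is a rough micro-benchmark sketch of the memory-locality effect alone: it sums the same N*N values from a contiguous buffer and through pointers into a shuffled pool. The 16x pool size and random layout are assumptions standing in for coefficients scattered across a program; real coefficients may sit much closer together. For N around 100 everything may fit in cache, so the gap can turn out small. Compile with optimizations, and for trustworthy numbers prefer a dedicated harness such as Google Benchmark.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

struct LocalityResult {
    double contiguous_s;    // seconds for the contiguous sweeps
    double pointer_s;       // seconds for the sweeps through pointers
    double contiguous_sum;  // checksums, kept so the loops have observable results
    double pointer_sum;
};

// Elapsed wall-clock seconds for `reps` calls of `f`.
template <class F>
double time_it(F f, int reps) {
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Sums the same n*n values `reps` times: once from a contiguous buffer, once
// through pointers into a shuffled pool 16 times larger (assumed layout).
LocalityResult compare_access(std::size_t n, int reps, unsigned seed = 42) {
    assert(n > 0 && reps > 0);
    const std::size_t count = n * n;
    std::vector<double> values(count);
    std::iota(values.begin(), values.end(), 1.0);  // 1, 2, ..., n*n

    std::vector<std::size_t> slots(16 * count);
    std::iota(slots.begin(), slots.end(), std::size_t{0});
    std::shuffle(slots.begin(), slots.end(), std::mt19937(seed));

    std::vector<double> pool(16 * count, 0.0);
    std::vector<const double*> ptrs(count);
    for (std::size_t i = 0; i < count; ++i) {
        pool[slots[i]] = values[i];  // slots are distinct, so nothing is overwritten
        ptrs[i] = &pool[slots[i]];
    }

    LocalityResult r{0.0, 0.0, 0.0, 0.0};
    r.contiguous_s = time_it([&] { for (double v : values) r.contiguous_sum += v; }, reps);
    r.pointer_s = time_it([&] { for (const double* p : ptrs) r.pointer_sum += *p; }, reps);
    return r;
}
```

Calling `compare_access(100, 1000)` and comparing `pointer_s` with `contiguous_s` on the target machine gives a first indication; the real test is still the actual inversion code.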

Here are some comments:

Is the function you use for the inversion capable of handling a matrix of pointers instead of values? If it does not realise it has to do an indirection, all kinds of strange effects could happen.
When doing an in-place matrix inversion (meaning the inverted matrix overwrites the input matrix), all input coefficients will be overwritten with new values, because matrix inversion cannot be done by merely reordering the elements of the matrix.
During the inversion process, none of the input coefficients may be changed by an outside process. All such updates have to be performed between iterations.
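To make the in-place point concrete, here is a minimal sketch using the closed-form 2x2 inverse (chosen only to keep it short; `invert2x2_in_place` is a hypothetical helper, not an MTL4 function). After the call, the variables that held the coefficients of A hold the coefficients of A^(-1), so they must be recomputed before the next iteration, and nothing else may touch them while the call runs.

```cpp
#include <array>
#include <cassert>
#include <stdexcept>

// In-place inversion of a 2x2 matrix whose entries are reached through pointers.
// The inverse is written back through the same pointers, so the variables they
// point to (the input coefficients) are overwritten. If two entries pointed to
// the same variable, that variable would be written twice.
void invert2x2_in_place(const std::array<double*, 4>& A) {
    for (double* p : A) assert(p != nullptr);
    // Read all inputs before writing anything; the writes destroy them.
    const double a = *A[0], b = *A[1], c = *A[2], d = *A[3];
    const double det = a * d - b * c;
    if (det == 0.0) throw std::runtime_error("matrix is singular");
    *A[0] = d / det;
    *A[1] = -b / det;
    *A[2] = -c / det;
    *A[3] = a / det;
}
```

For example, with `k11 = 4, k12 = 7, k21 = 2, k22 = 6`, calling `invert2x2_in_place({&k11, &k12, &k21, &k22})` leaves 0.6, -0.7, -0.2 and 0.4 in those four variables.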

So you get the following trade-offs when you choose the pointer solution:

The coefficients making up matrix A can no longer be calculated asynchronously with the matrix inversion.
Either all coefficients must be recalculated for each iteration (when you use in-place inversion, meaning the inverted matrix uses the same memory as the input matrix), or you still need a matrix of N x N values to hold the result of the inversion.
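The out-of-place variant amounts to reading through the pointers into a separate value buffer before inverting. A minimal sketch (`snapshot` is a hypothetical helper): once the snapshot is taken, the source variables are free to change again, but the snapshot is itself an N x N copy of values, so this variant gives back the copy the pointer approach was meant to save.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reads every coefficient through its pointer into a contiguous row-major buffer.
// The inversion can then run on the buffer, leaving the source variables
// untouched and free to be updated again.
std::vector<double> snapshot(const std::vector<const double*>& ptrs) {
    std::vector<double> values;
    values.reserve(ptrs.size());
    for (const double* p : ptrs) {
        assert(p != nullptr);
        values.push_back(*p);  // one pass over N*N entries: the copy that was to be avoided
    }
    return values;
}
```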

You're getting good answers here. The only thing I would add is some general experience with performance.

You are thinking about performance a priori. That's reasonable, but the real payoff is a posteriori. In other words, you don't know for certain where the real optimization opportunities are until the running code tells you.

You don't know whether the bulk of the time will be spent in matrix inversion, multiplication, copying the matrix, dereferencing, or something else. People can guess. If I had to guess, it would be matrix inversion, because it's 100x100. However, something else I can't guess might be even bigger. Guessing has a very poor track record, especially when you can just find out.
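One way to let the running code tell you is to accumulate time per phase across iterations. A minimal sketch; `PhaseTimer` and the phase names are hypothetical, and a sampling profiler gives the same kind of answer without modifying the code.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Accumulates wall-clock time and call counts per named phase.
class PhaseTimer {
public:
    template <class F>
    void run(const std::string& phase, F&& f) {
        assert(!phase.empty());
        const auto t0 = std::chrono::steady_clock::now();
        std::forward<F>(f)();
        totals_[phase] += std::chrono::steady_clock::now() - t0;
        ++calls_[phase];
    }

    // Total seconds spent in `phase` (0 if it never ran).
    double seconds(const std::string& phase) const {
        const auto it = totals_.find(phase);
        return it == totals_.end() ? 0.0 : std::chrono::duration<double>(it->second).count();
    }

    // Number of times `phase` ran.
    long calls(const std::string& phase) const {
        const auto it = calls_.find(phase);
        return it == calls_.end() ? 0 : it->second;
    }

private:
    std::map<std::string, std::chrono::steady_clock::duration> totals_;
    std::map<std::string, long> calls_;
};
```

Wrap each step of an iteration, e.g. `timer.run("copy", [&] { /* fill A */ });` and `timer.run("invert", [&] { /* invert A */ });`, then compare `timer.seconds("copy")` with `timer.seconds("invert")` after a representative run.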