C - Matrices as pass by value?

I'm designing matrix handling functions for a C project. I am considering passing matrices either by value or by reference. I created a benchmark that passes matrices both ways, and both appear to perform the same with the optimization flags -O0 and -O2 in gcc. Given that my benchmark may be giving incorrect results, I would like to know the most efficient way to pass matrices into and out of function calls using only C.

#include <stdio.h>
#include <time.h>

// Compiled on OSX 10.6.8 using: cc -o matrix matrix.c -std=c99 -O2

typedef struct {
    float m0;
    float m1;
    float m2;
    float m3;
    float m4;
    float m5;
    float m6;
    float m7;
    float m8;
    float m9;
    float m10;
    float m11;
    float m12;
    float m13;
    float m14;
    float m15;
} Matrix;

// ================================================
//                 Pass By Value
// ------------------------------------------------

Matrix PassByValue (Matrix a, Matrix b) {
    Matrix matrix;

    matrix.m0  = a.m0 * b.m0  + a.m4 * b.m1  + a.m8  * b.m2  + a.m12 * b.m3;
    matrix.m1  = a.m1 * b.m0  + a.m5 * b.m1  + a.m9  * b.m2  + a.m13 * b.m3;
    matrix.m2  = a.m2 * b.m0  + a.m6 * b.m1  + a.m10 * b.m2  + a.m14 * b.m3;
    matrix.m3  = a.m3 * b.m0  + a.m7 * b.m1  + a.m11 * b.m2  + a.m15 * b.m3;

    matrix.m4  = a.m0 * b.m4  + a.m4 * b.m5  + a.m8  * b.m6  + a.m12 * b.m7;
    matrix.m5  = a.m1 * b.m4  + a.m5 * b.m5  + a.m9  * b.m6  + a.m13 * b.m7;
    matrix.m6  = a.m2 * b.m4  + a.m6 * b.m5  + a.m10 * b.m6  + a.m14 * b.m7;
    matrix.m7  = a.m3 * b.m4  + a.m7 * b.m5  + a.m11 * b.m6  + a.m15 * b.m7;

    matrix.m8  = a.m0 * b.m8  + a.m4 * b.m9  + a.m8  * b.m10 + a.m12 * b.m11;
    matrix.m9  = a.m1 * b.m8  + a.m5 * b.m9  + a.m9  * b.m10 + a.m13 * b.m11;
    matrix.m10 = a.m2 * b.m8  + a.m6 * b.m9  + a.m10 * b.m10 + a.m14 * b.m11;
    matrix.m11 = a.m3 * b.m8  + a.m7 * b.m9  + a.m11 * b.m10 + a.m15 * b.m11;

    matrix.m12 = a.m0 * b.m12 + a.m4 * b.m13 + a.m8  * b.m14 + a.m12 * b.m15;
    matrix.m13 = a.m1 * b.m12 + a.m5 * b.m13 + a.m9  * b.m14 + a.m13 * b.m15;
    matrix.m14 = a.m2 * b.m12 + a.m6 * b.m13 + a.m10 * b.m14 + a.m14 * b.m15;
    matrix.m15 = a.m3 * b.m12 + a.m7 * b.m13 + a.m11 * b.m14 + a.m15 * b.m15;

    return matrix;
}


// ================================================
//               Pass By Reference
// ------------------------------------------------

void PassByReference (Matrix* matrix, Matrix* a, Matrix* b) {
    if (!matrix) return;
    if (!a) return;
    if (!b) return;

    matrix->m0  = a->m0 * b->m0  + a->m4 * b->m1  + a->m8  * b->m2  + a->m12 * b->m3;
    matrix->m1  = a->m1 * b->m0  + a->m5 * b->m1  + a->m9  * b->m2  + a->m13 * b->m3;
    matrix->m2  = a->m2 * b->m0  + a->m6 * b->m1  + a->m10 * b->m2  + a->m14 * b->m3;
    matrix->m3  = a->m3 * b->m0  + a->m7 * b->m1  + a->m11 * b->m2  + a->m15 * b->m3;

    matrix->m4  = a->m0 * b->m4  + a->m4 * b->m5  + a->m8  * b->m6  + a->m12 * b->m7;
    matrix->m5  = a->m1 * b->m4  + a->m5 * b->m5  + a->m9  * b->m6  + a->m13 * b->m7;
    matrix->m6  = a->m2 * b->m4  + a->m6 * b->m5  + a->m10 * b->m6  + a->m14 * b->m7;
    matrix->m7  = a->m3 * b->m4  + a->m7 * b->m5  + a->m11 * b->m6  + a->m15 * b->m7;

    matrix->m8  = a->m0 * b->m8  + a->m4 * b->m9  + a->m8  * b->m10 + a->m12 * b->m11;
    matrix->m9  = a->m1 * b->m8  + a->m5 * b->m9  + a->m9  * b->m10 + a->m13 * b->m11;
    matrix->m10 = a->m2 * b->m8  + a->m6 * b->m9  + a->m10 * b->m10 + a->m14 * b->m11;
    matrix->m11 = a->m3 * b->m8  + a->m7 * b->m9  + a->m11 * b->m10 + a->m15 * b->m11;

    matrix->m12 = a->m0 * b->m12 + a->m4 * b->m13 + a->m8  * b->m14 + a->m12 * b->m15;
    matrix->m13 = a->m1 * b->m12 + a->m5 * b->m13 + a->m9  * b->m14 + a->m13 * b->m15;
    matrix->m14 = a->m2 * b->m12 + a->m6 * b->m13 + a->m10 * b->m14 + a->m14 * b->m15;
    matrix->m15 = a->m3 * b->m12 + a->m7 * b->m13 + a->m11 * b->m14 + a->m15 * b->m15;
}

// ================================================
//                  Benchmark
// ------------------------------------------------

#define LOOPS 100000

int main () {
    Matrix result = {0};
    Matrix a = {0};  // zero-initialized so the loops never read indeterminate values
    Matrix b = {0};
    clock_t begin;
    clock_t end;
    int index;

    // ------------------------------------------
    //          Pass By Reference
    // ------------------------------------------
    begin = clock();
    for (index = 0; index < LOOPS; index++) {

        PassByReference(&result,&a,&b);
        a.m0 += index;
        b.m0 += index;

    }
    end = clock();
    printf("Pass By Ref: %f\n",(double)(end - begin) / CLOCKS_PER_SEC);

    // ------------------------------------------
    //            Pass By Value
    // ------------------------------------------
    begin = clock();
    for (index = 0; index < LOOPS; index++) {

        result = PassByValue(a,b);
        a.m0 += index;
        b.m0 += index;

    }
    end = clock();
    printf("Pass By Val: %f\n",(double)(end - begin) / CLOCKS_PER_SEC);


    // The following line along with the above
    // additions in the loops hopefully prevent
    // the matrices from being optimized into
    // nothing.
    printf("%0.1f\n",result.m0);

    return 0;
}

Results:

Pass By Ref: 0.489226
Pass By Val: 0.488882

From Effective C++:

Prefer pass-by-reference-to-const over pass-by-value. It's typically more efficient and it avoids the slicing problem. The rule doesn't apply to built-in types and STL iterator and function object types. For them, pass-by-value is usually appropriate.

I understand that you are programming in C rather than C++, but I think this rule still applies. Your two versions may perform so similarly because the struct contains only floats and is inexpensive to copy when it is passed by value.

However, as the author of Effective C++ said:

"some compilers refuse to put objects consisting of only a double into a register, even though they happily place naked doubles there on a regular basis. When that kind of thing happens, you can be better off passing such objects by reference, because compilers will certainly put pointers into registers."

In your case, maybe your machine doesn't mind putting the struct in registers, but it's hard to tell what will happen when you run your program on other machines. Since the two versions perform so similarly, I would vote for passing by reference.

You have 2 competing interests here:

  1. Passing a struct by value: the struct is classified as memory data and pushed onto the stack by the x86 calling convention. This is a little slower than a by-reference call, where the pointer ends up in a register.

  2. This is almost exactly balanced by a bunch of pointer dereferences...

Separate the two and profile each part on its own.
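One way to profile the two calling styles separately is sketched below. It is not the asker's benchmark: `Mat4`, `bump_by_value`, `bump_by_pointer`, and the two timing helpers are hypothetical names, and the per-call work is a trivial element-wise add so that call overhead dominates. The `noinline` attribute is a GCC/Clang extension, used here on the assumption that an optimizer could otherwise inline both variants into identical code and hide any difference.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for the question's 16-float Matrix struct. */
typedef struct { float m[16]; } Mat4;

/* noinline is a GCC/Clang extension; other compilers get an empty macro. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* By value: the argument and the result are both copied. */
NOINLINE Mat4 bump_by_value(Mat4 a) {
    for (int i = 0; i < 16; i++) a.m[i] += 1.0f;
    return a;
}

/* By pointer: only two addresses are passed. In-place use (out == a) is fine
   because each element is read before it is written. */
NOINLINE void bump_by_pointer(Mat4 *out, const Mat4 *a) {
    assert(out && a);
    for (int i = 0; i < 16; i++) out->m[i] = a->m[i] + 1.0f;
}

/* Each helper times one variant only and writes a result element to *sink,
   so the loop has an observable effect. Returns CPU seconds. */
double time_by_value(int loops, float *sink) {
    Mat4 a = {{0}};
    clock_t begin = clock();
    for (int i = 0; i < loops; i++) a = bump_by_value(a);
    clock_t end = clock();
    *sink = a.m[0];
    return (double)(end - begin) / CLOCKS_PER_SEC;
}

double time_by_pointer(int loops, float *sink) {
    Mat4 a = {{0}};
    clock_t begin = clock();
    for (int i = 0; i < loops; i++) bump_by_pointer(&a, &a);
    clock_t end = clock();
    *sink = a.m[0];
    return (double)(end - begin) / CLOCKS_PER_SEC;
}
```

Calling each helper several times and comparing the spread of timings, rather than a single run, gives a better idea of whether any difference exceeds the noise.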

If you are trying to make this kind of code faster, you may be able to write faster implementations in some sort of SIMD code (AltiVec, SSE) or OpenCL, depending on your target platform.
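As a rough illustration of the SIMD suggestion, here is a minimal sketch of a 4x4 multiply using SSE intrinsics from `<xmmintrin.h>`. It assumes the same column-major layout as the question's `m0`..`m15` fields. `Mat4` and `mat4_mul_simd` are hypothetical names, the scalar branch is a fallback for targets where `__SSE__` is not defined, and no claim is made about how much faster this is than the scalar code once a compiler has optimized it.

```c
#include <assert.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical stand-in for the question's Matrix: m[4*col + row]. */
typedef struct { float m[16]; } Mat4;

/* out = a * b, with the same element order as the question's code.
   The result is built in a temporary, so out may alias a or b. */
void mat4_mul_simd(Mat4 *out, const Mat4 *a, const Mat4 *b) {
    assert(out && a && b);
    Mat4 r;
#if defined(__SSE__)
    /* Load the four columns of a once. */
    __m128 c0 = _mm_loadu_ps(&a->m[0]);
    __m128 c1 = _mm_loadu_ps(&a->m[4]);
    __m128 c2 = _mm_loadu_ps(&a->m[8]);
    __m128 c3 = _mm_loadu_ps(&a->m[12]);
    for (int j = 0; j < 4; j++) {
        const float *bj = &b->m[4 * j];
        /* Column j of the result is a linear combination of a's columns. */
        __m128 col = _mm_mul_ps(c0, _mm_set1_ps(bj[0]));
        col = _mm_add_ps(col, _mm_mul_ps(c1, _mm_set1_ps(bj[1])));
        col = _mm_add_ps(col, _mm_mul_ps(c2, _mm_set1_ps(bj[2])));
        col = _mm_add_ps(col, _mm_mul_ps(c3, _mm_set1_ps(bj[3])));
        _mm_storeu_ps(&r.m[4 * j], col);
    }
#else
    /* Scalar fallback computing the same sums. */
    for (int j = 0; j < 4; j++) {
        for (int i = 0; i < 4; i++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++) sum += a->m[4 * k + i] * b->m[4 * j + k];
            r.m[4 * j + i] = sum;
        }
    }
#endif
    *out = r;
}
```

The unaligned load and store intrinsics are used so the sketch makes no assumption about how `Mat4` is aligned.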

32 float values won't fit into the registers anyway. The compiler will be forced to copy the data from memory onto the stack, which is just another part of memory. Depending on the number of data accesses, copying the data may even be slower than dereferencing pointers.

I would suggest using pass-by-reference with the const modifier for any non-scalar data. It's the compiler's job to optimize your code for specific platforms.
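A minimal sketch of what that signature style could look like in C, using a hypothetical `Mat4` type and `mat4_add` function rather than the question's code. The `const` qualifiers document that the inputs are read-only, and the compiler rejects accidental writes through them.

```c
#include <assert.h>

/* Hypothetical stand-in for the question's 16-float Matrix struct. */
typedef struct { float m[16]; } Mat4;

/* Inputs are pointers to const: the function may read *a and *b but a
   statement such as a->m[0] = 0.0f; would fail to compile. */
void mat4_add(Mat4 *out, const Mat4 *a, const Mat4 *b) {
    assert(out && a && b);
    for (int i = 0; i < 16; i++) {
        out->m[i] = a->m[i] + b->m[i];
    }
}
```

Writing the result through an output pointer, as here, also avoids returning a 64-byte struct, though a compiler may already handle such a return without an extra copy.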

Technically, we only have 'pass-by-value' in C. You should pass matrix pointers (by value) to the function. This reduces the data 'copied' into the function and is hence more efficient.
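The point that the pointer itself is passed by value can be seen in a small sketch. `Mat4`, `reseat`, and `zero_first` are hypothetical names introduced for illustration: reassigning the parameter changes only the callee's copy of the pointer, while writing through it changes the caller's matrix.

```c
#include <assert.h>

/* Hypothetical stand-in for the question's 16-float Matrix struct. */
typedef struct { float m[16]; } Mat4;

/* p is a local copy of the caller's pointer, so reassigning it has no
   effect outside this function. */
void reseat(Mat4 *p, Mat4 *other) {
    p = other;
    (void)p; /* silence unused-value warnings; the assignment is the point */
}

/* Writing through the pointer does modify the caller's object. */
void zero_first(Mat4 *p) {
    assert(p);
    p->m[0] = 0.0f;
}
```

On a typical 64-bit target the pointer is 8 bytes while the struct is at least 64, which is the reduction in copied data the answer refers to.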
