C: Returning a void versus returning a double * from a subfunction

I'm trying to speed up some general data processing in C. I've written several subroutines of the form:

double *do_something(double *arr_in, ...) {
   double *arr_out;
   arr_out = malloc(...);

   for (...) {
     /* do the something on arr_in and put into arr_out */
   }

   return arr_out;
}

I like this style because it's easy to read and use, but I often call it like this:

 array = do_something(array,...);

Would it make for faster code (and maybe prevent memory leaks) if I instead used void subfunctions like this:

void do_something(double *arr_in, ...) {
   for (...) {
      arr_in = do that something;
   }
   return;
}

Update 1: I ran valgrind --leak-check=full on the executable, and it appears there were no memory leaks using the first method. However, the executable links to a library that contains all the subroutines I wrote in this form, so it might not catch leaks from the library.

I'm curious how I would write wrappers to free the memory, and what ** really does (what is a pointer to a pointer?). For now I'm avoiding the ** route, partly because I probably did it wrong: it didn't compile on my Mac.

Here's one current subroutine:

double *cos_taper(double *arr_in, int npts)
{
int i;
double *arr_out;
double taper[npts];   /* renamed so it doesn't shadow the function name */
int M;
M = floor( ((npts - 2) / 10) + .5);

arr_out = malloc(npts * sizeof *arr_out);   /* size of a double, not of a pointer */

for (i=0; i<npts; i++) {
    if (i<M) {
        taper[i] = .5 * (1-cos(i * PI / (M + 1)));
    }
    else if (i<npts - M - 2) {
        taper[i] = 1;
    }
    else if (i<npts) {
        taper[i] = .5 * (1-cos((npts - i - 1) * PI / (M + 1)));
    }
    arr_out[i] = arr_in[i] * taper[i];
}
return arr_out;
}

From the advice I've gotten here, it sounds like a better method would be:

void cos_taper(double *arr_in, double *arr_out, int npts)
{
int i;
double taper[npts];
int M;
M = floor( ((npts - 2) / 10) + .5);

for (i=0; i<npts; i++) {
    if (i<M) {
        taper[i] = .5 * (1-cos(i * PI / (M + 1)));
    }
    else if (i<npts - M - 2) {
        taper[i] = 1;
    }
    else if (i<npts) {
        taper[i] = .5 * (1-cos((npts - i - 1) * PI / (M + 1)));
    }
    arr_out[i] = arr_in[i] * taper[i];
}
return;
}

Called like this:

int main() {
  int npts = 100;   /* example size */
  double *data, *cos_tapered;

  data = malloc(sizeof *data * npts);
  cos_tapered = malloc(sizeof *cos_tapered * npts);

//fill data

  cos_taper(data, cos_tapered, npts);
  free(data);
  ...
  free(cos_tapered);
  ...
  return 0;
}

malloc can be expensive relative to the processing you are doing, depending on what that processing is. Rather than restrict yourself to in-place processing, just use two parameters, in and out, and leave allocation to the caller. This gives the caller the option to reuse memory instead of allocating a new array for each call.
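A minimal sketch of the in/out-parameter style, assuming a hypothetical element-wise operation scale_into() (not from the question) as a stand-in for the real processing. process_many() is an equally hypothetical caller that allocates both buffers once and reuses them for every call.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical operation: writes into a caller-supplied buffer, so it
 * never allocates. out == in is fine because element i only reads in[i]. */
void scale_into(const double *in, double *out, size_t n, double factor)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * factor;
}

/* Hypothetical caller: one malloc per buffer, reused across all calls. */
int process_many(size_t n, int repeats)
{
    double *in = malloc(n * sizeof *in);
    double *out = malloc(n * sizeof *out);
    if (in == NULL || out == NULL) {
        free(in);
        free(out);
        return -1;
    }
    for (int r = 0; r < repeats; r++) {
        for (size_t i = 0; i < n; i++)
            in[i] = (double)(i + r);     /* stand-in for "fill data" */
        scale_into(in, out, n, 0.5);     /* no allocation per call */
    }
    free(in);
    free(out);
    return 0;
}
```

Because the caller owns both buffers, it can also pass the same buffer twice to get in-place behavior.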

The first calling style (array = do_something(array, ...)) can easily leak memory if there is no other pointer to the original allocation, as you are probably aware since you are asking.
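To illustrate, here is a sketch with a hypothetical allocating function doubled() written in the question's first style. Chaining it as array = doubled(array, n) would lose the only pointer to the previous array; keeping each result in its own variable (as the hypothetical run_two_steps() does) leaves a pointer you can still free.

```c
#include <stdlib.h>

/* Hypothetical allocating function in the question's first style:
 * returns a new array holding 2 * each input value, or NULL on failure. */
double *doubled(const double *arr_in, size_t n)
{
    double *arr_out = malloc(n * sizeof *arr_out);
    if (arr_out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        arr_out[i] = 2.0 * arr_in[i];
    return arr_out;
}

/* Two steps without leaking the intermediate array.
 * Returns the final array (caller frees it) or NULL. */
double *run_two_steps(const double *input, size_t n)
{
    double *step1 = doubled(input, n);
    if (step1 == NULL)
        return NULL;
    double *step2 = doubled(step1, n);
    free(step1);          /* we still hold a pointer to it, so no leak */
    return step2;         /* NULL if the second allocation failed */
}
```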

Yes, if you can sensibly write the second version of the called function without memory allocation, it will likely be faster, because memory allocation takes time. If you merely change the called function so that it takes pre-allocated input and output arrays, you may just be moving the allocation cost to the calling function.

But disciplined use of the first version is fine: the function allocates space, and as long as you keep track of both the original space passed in and the new space passed back, and release both, there is no problem.

You can run into the 'same' problem with:

xyz = realloc(xyz, newsize);

If xyz is the only pointer to the allocated memory, this leaks memory when the allocation fails, because you have just overwritten xyz with a null pointer. If there is another pointer that you will use to release the original space, the idiom is harmless, but be cautious with it.
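The usual safe form assigns realloc's result to a temporary first. A sketch, using a hypothetical helper grow_buffer() and assuming new_count > 0 (realloc with size 0 has implementation-defined behavior):

```c
#include <stdlib.h>

/* Hypothetical helper: grows *buf to new_count doubles (new_count > 0).
 * On failure the original block is left intact and still owned by the
 * caller, instead of being leaked by overwriting *buf with NULL. */
int grow_buffer(double **buf, size_t new_count)
{
    double *tmp = realloc(*buf, new_count * sizeof **buf);
    if (tmp == NULL)
        return -1;      /* *buf still points at the old, valid block */
    *buf = tmp;
    return 0;
}
```

If *buf is NULL, realloc behaves like malloc, so the same helper also handles the first allocation.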


I've not fully digested the additional information in the question since writing the original version of this answer.

If you can do your operation in place, doing so will probably help prevent bugs (at least memory-related ones) and will be faster by at least the time taken by the malloc() call. The actual return type of your function probably doesn't affect speed at all.

Returning the double * itself doesn't cost you much in terms of execution time.

Much more significant is the memory allocation each time you enter the function. If you can pre-allocate, or store the result in place as you suggested, that should greatly improve the speed.

Another thing to consider is whether you actually need all of the precision that a double provides (vs. a float). In many cases floats are faster, for example because they halve memory traffic and fit twice as many values into each SIMD register.

I'd opt for letting the caller allocate the memory if they want to, while also letting them choose to have the operation done in place, or to have your function do the allocation.

For operations that can't be done in place, you can check whether the caller has given you the same (or overlapping) input and output locations and, if so, make a copy of the input yourself. Then process using that copy as input. To the caller, the function then appears to work in place.

For example, suppose you want to create a function that shuffles an array of indexes such that output[i] == input[ input[i] ] (a silly function, true, but one that's nontrivial to do in place).

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
int shuffle(size_t const * input, size_t const size, size_t ** p_output)
{
    int retval = 0;
    size_t i;
    char in_place = 0;
    char cleanup_output = 0;

    if (size == 0)
    {
        return 0; // nothing to do
    }
    // make sure we can read our input and write our output
    else if (input == NULL || p_output == NULL)
    {
        return -2; // illegal input
    }
    // allocate memory for the output
    else if (*p_output == NULL)
    {
        *p_output = malloc(size * sizeof(size_t));
        if (*p_output == NULL) return -1; // memory allocation problem
        cleanup_output = 1; // free this memory if we run into errors
    }
    // use a copy of our input, since the algorithm doesn't operate in place
    // and the input and output overlap at least partially.
    // Addresses are compared as integers: computing *p_output - size, or
    // comparing pointers into different arrays with <, is undefined behavior.
    else if ((uintptr_t)input < (uintptr_t)(*p_output + size)
             && (uintptr_t)*p_output < (uintptr_t)(input + size))
    {
        size_t * const input_copy = malloc(size * sizeof(size_t));
        if (input_copy == NULL) return -1; // memory allocation problem
        memcpy( input_copy, input, size * sizeof(size_t));
        input = input_copy;
        in_place = 1;
    }

    // shuffle
    for (i = 0; i < size; i++)
    {
        if (input[i] >= size)
        {
            retval = -2; // illegal input
            break;
        }
        (*p_output)[i] = input[ input[i] ];
    }

    // cleanup
    if (in_place)
    {
         free((size_t *) input);
    }
    if (retval != 0 && cleanup_output)
    {
         free(*p_output);
         *p_output = NULL;
    }

    return retval;
}

This makes your function more robust: the caller can allocate memory for the output (if they want to keep the input around), have the output appear in the same place as the input, or have your function allocate the memory for the output. This is especially nice if they got the input and output locations from somewhere else and aren't sure whether they are distinct. They don't have to know anything about how the function works internally.

// caller allocated memory
my_allocated_mem = malloc( my_array_size * sizeof(size_t) );
if(my_allocated_mem == NULL) { /*... */ }
shuffle( my_array, my_array_size, &my_allocated_mem );

// function allocated memory
my_allocated_mem = NULL;
shuffle( my_array, my_array_size, &my_allocated_mem );

// in place calculation
shuffle( my_array, my_array_size, &my_array);

// (naughty user isn't checking the function for error values, but you get the idea...)

You can see a full example of use here.

Since C doesn't have exceptions, it's fairly standard to use a function's return value to report errors and to pass calculated values back through a pointer argument (here, size_t ** p_output).

I just ran your code (after fixing a number of small errors). Then I took several stackshots, i.e. I paused the running program in a debugger and looked at the call stack. The people who said malloc would be your culprit were right: nearly all of your time is spent there. Compared to that, the rest of your code is not very significant. Here's the code:

#include <math.h>
#include <stdlib.h>
const double PI = 3.1415926535897932384626433832795;

void cos_taper(double *arr_in, double *arr_out, int npts){ 
    int i; 
//  double taper[npts];
    double* taper = (double*)malloc(sizeof(double) * npts); 
    int M;  
    M = (int)floor( ((npts - 2) / 10) + .5); 

    for (i=0; i<npts; i++){ 
        if (i<M) { 
            taper[i] = .5 * (1-cos(i * PI / (M + 1))); 
        } 
        else if (i<npts - M - 2) { 
            taper[i] = 1; 
        } 
        else if (i<npts) { 
            taper[i] = .5 * (1-cos((npts - i - 1) * PI / (M + 1))); 
        } 
        arr_out[i] = arr_in[i] * taper[i]; 
    }
    free(taper);
    return;
}

void doit(){
    int i;
    int npts = 100; 
    double *data, *cos_tapered; 

    data = (double*)malloc(sizeof(double) * npts); 
    cos_tapered = (double*)malloc(sizeof(double) * npts); 

    //fill data 
    for (i = 0; i < npts; i++) data[i] = 1;

    cos_taper(data, cos_tapered, npts); 
    free(data); 
    free(cos_tapered); 
}

int main(int argc, char* argv[]){
    int i;
    for (i = 0; i < 1000000; i++){
        doit();
    }
    return 0;
}

EDIT: I timed the above code, which took 22us per iteration on my machine (mostly in malloc). Then I modified it to do the mallocs only once, outside the loop. That dropped the time to 5.0us, mostly spent in the cos function. Then I switched from a Debug to a Release build, which dropped the time to 3.7us (now, obviously, with an even larger share in cos). So if you really want to make it fast, I recommend using stackshots to find out what you're mostly doing, and then seeing whether you can avoid doing it.

In your function:

void do_something(double *arr_in, ...) {
   for (...) {
      arr_in = do_that_something;
   }
}

That would be incorrect: arr_in is a copy of the caller's pointer, so assigning a new array to it inside do_something has no effect once the function returns. To pass the array back out you need a parameter by reference, i.e. a pointer to the pointer. It should look something like this:

void do_something(double **arr_in, ...) {
   for (...) {
      *arr_in = do_that_something;
   }
}
/*
** Would be called like this:
** do_something(&array, ...);
*/
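A concrete sketch of the pointer-to-pointer pattern, using a hypothetical function replace_with_scaled() (not from the question). The parameter arr holds the address of the caller's double * variable, so writing *arr changes that variable; here the old array is freed only after the new one has been allocated successfully.

```c
#include <stdlib.h>

/* Hypothetical example: replaces the caller's array with a scaled copy.
 * arr is a pointer to the caller's pointer, so *arr = ... is visible
 * to the caller after the function returns. */
int replace_with_scaled(double **arr, size_t n, double factor)
{
    if (arr == NULL || *arr == NULL)
        return -1;
    double *fresh = malloc(n * sizeof *fresh);
    if (fresh == NULL)
        return -1;              /* caller's array is left untouched */
    for (size_t i = 0; i < n; i++)
        fresh[i] = (*arr)[i] * factor;
    free(*arr);                 /* old array released here ... */
    *arr = fresh;               /* ... and the caller now sees the new one */
    return 0;
}
```

It would be called as replace_with_scaled(&array, n, 2.0), passing the address of the caller's pointer.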

Stick to the first example, as it is easier to read. But in the first example you need to add error checking for the case where malloc fails; otherwise you will continue processing with a NULL pointer...

Hope this helps, Best regards, Tom.

You would save a small amount of time by not calling malloc, but this can add up quickly and make a noticeable difference if you call do_something in a tight loop. You would also save a small amount of time by not having to return the double *, but again, this only adds up if do_something is called very frequently.

As for the processing itself, there would be no difference, since both cases operate on a double *.

Since your proposed method does no dynamic memory allocation inside the function, there is no longer any possibility of the function itself leaking memory.

You also have the option to pass a second parameter as your out parameter. For instance:

int do_something(double *in, double *out) {
   int modified = 0;
   /*
    * Do stuff here; set modified = 1 if out is written to
    */
   if (modified)
      return 1;
   return 0;
}

Or something similar without the return value.

I would suggest that if you allocate memory within a sub-function, you either create a corresponding wrapper that cleans up (frees the allocated memory), or make it blindingly obvious that the function allocates memory, so that nobody forgets to free it.
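One common way to make the ownership obvious is a matched create/destroy pair. A minimal sketch; the struct signal_buf and both function names are hypothetical, chosen only to illustrate the pattern.

```c
#include <stdlib.h>

/* Hypothetical buffer type with a matched allocate/release pair. */
typedef struct {
    double *data;
    size_t  n;
} signal_buf;

/* Allocates the struct and its samples; returns NULL on failure. */
signal_buf *signal_buf_create(size_t n)
{
    signal_buf *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    /* calloc gives all-bits-zero, which is 0.0 on IEEE 754 platforms */
    b->data = calloc(n, sizeof *b->data);
    if (b->data == NULL && n > 0) {
        free(b);
        return NULL;
    }
    b->n = n;
    return b;
}

/* Releases everything signal_buf_create allocated; NULL is a no-op. */
void signal_buf_destroy(signal_buf *b)
{
    if (b == NULL)
        return;
    free(b->data);
    free(b);
}
```

The _create/_destroy naming makes it hard to miss that every create needs a matching destroy.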

Regarding memory footprint, the second approach uses less memory, but it only works if the function doesn't change the size of the initial array. Depending on the usage, that is not always true.

Regarding speed, the second approach should theoretically be faster, because one less pointer has to be returned at the end of the function call (do_something); on common ABIs that pointer is simply returned in a register. A single pointer makes a minimal difference unless the function is called very heavily, in which case inlining should already be under consideration. So unless you have actually measured the function call's overhead to be an issue (by profiling), I wouldn't bother with such micro-optimizations without a good reason (memory footprint or profiling results).

The type of the function determines the interface between the function and the places in the code that call it, which means important code design issues are likely involved in the choice. As a rule, these are more worth thinking about than speed (provided the speed issue isn't memory leaks so large that the application suffers a denial of service through thrashing...).

The second type pretty much signals the intent to mutate the array. The first is ambiguous: maybe you will always mutate the array, maybe you will always provide a freshly allocated result, maybe your code sometimes does one and sometimes the other. That freedom comes with a minefield of difficulties in making sure the code is correct. If you go this route, the effort of putting a liberal sprinkling of assert()s through your code, asserting invariants about the freshness and sharedness of pointers, will likely pay for itself with ample interest when debugging.
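As an example of such an invariant, here is a sketch of a hypothetical out-of-place function reverse_into() whose contract requires a separate output buffer, with assert() checking that contract in debug builds. The helper ranges_overlap() is also hypothetical; it compares addresses as integers because relational comparison of pointers into unrelated objects is undefined in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if [a, a+n) and [b, b+n) share any element. */
static int ranges_overlap(const double *a, const double *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    uintptr_t bytes = n * sizeof(double);
    return pa < pb + bytes && pb < pa + bytes;
}

/* Hypothetical out-of-place operation: the output must not share memory
 * with the input, since reversing in place this way would clobber it. */
void reverse_into(const double *in, double *out, size_t n)
{
    assert(in != NULL && out != NULL);
    assert(!ranges_overlap(in, out, n));
    for (size_t i = 0; i < n; i++)
        out[i] = in[n - 1 - i];
}
```

The asserts compile away when NDEBUG is defined, so they cost nothing in release builds.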

Well, you started your question talking about speed, and I don't believe that subject was really answered. The first thing to say is that tweaking how parameters are passed doesn't seem to be the best way to speed things up...

I agree with the other answers: the first proposal using malloc is a highway to memory leaks (and is probably slower anyway); the other proposal you came up with is much better. Following ergosys's suggestions in the comments, you can easily improve it and get good C code.

Now, with a little math, you can do even better.

First, there's no need to use double and a floor() call to compute integers. You get the same M without floor() and without adding 0.5, just by writing M = (npts - 2) / 10; (hint: integer division truncates to an integer).

If you also notice that you always have M < npts - M - 2 < npts (for npts >= 3; you certainly already know this), you can avoid testing limits inside the loop by splitting it into three independent parts. This can be optimized further in the case where the input array is the same as the output array.

Your function could become something like this:

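#include <math.h>

#ifndef PI
#define PI 3.14159265358979323846
#endif

void cos_taper(double *arr_in, double *arr_out, int npts)
{
    int i;
    int M = (npts - 2) / 10;
    /* first index of the tail taper; never below M (only matters for npts < 3) */
    int tail = (npts - M - 2 > M) ? npts - M - 2 : M;

    if (arr_out == arr_in) {
        /* in place: the middle section is multiplied by 1, so skip it */
        for (i = 0; i < M; i++) {
            arr_out[i] *= .5 * (1 - cos(i * PI / (M + 1)));
        }
        for (i = tail; i < npts; i++) {
            arr_out[i] *= .5 * (1 - cos((npts - i - 1) * PI / (M + 1)));
        }
    }
    else {
        for (i = 0; i < M; i++) {
            arr_out[i] = arr_in[i] * (.5 * (1 - cos(i * PI / (M + 1))));
        }
        for (; i < tail; i++) {
            arr_out[i] = arr_in[i];
        }
        for (; i < npts; i++) {
            arr_out[i] = arr_in[i] * (.5 * (1 - cos((npts - i - 1) * PI / (M + 1))));
        }
    }
}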

That's certainly not the end of it; with some more thought, more optimizations are possible. For instance, expressions like .5 * (1 - cos(i * PI / (M + 1))) look like they can only take a relatively small number of values: they depend on i and npts, so the number of distinct results grows roughly with the square of npts, but the periodicity of cos should reduce that number again. But it all depends on what level of performance you need.
