
Return vs. Not Return of functions?

To return or not to return, that is the question for functions! Or does it really matter?


Here's the story: I used to write code like the following:

Type3 myFunc(Type1 input1, Type2 input2){}

But recently my project colleagues told me that I should try, as much as possible, to avoid writing functions like this, and suggested the following approach, which puts the return value in the parameter list instead:

void myFunc(Type1 input1, Type2 input2, Type3 &output){}

They convinced me that this is better and faster, because the first method incurs an extra copy when returning.


I'm starting to believe that the second method is better in some situations, especially when I have multiple things to return or modify. For example, the second line below should be better and faster than the first, because it avoids copying the whole vector<int> when returning.

vector<int> addTwoVectors(vector<int> a, vector<int> b){}
void addTwoVectors(vector<int> a, vector<int> b, vector<int> &result){}

But in other situations, I'm not convinced. For example,

bool checkInArray(int value, vector<int> arr){}

is definitely better than

void checkInArray(int value, vector<int> arr, bool &inOrNot){}

In this case, I think the first method, which returns the result directly, is better for readability.


In summary, I am confused about the following (with emphasis on C++):

  • What should be returned by functions, and what should not (or should be avoided)?
  • Is there any standard way or good guideline for me to follow?
  • Can we do better in both readability and code efficiency?

Edit: I am aware that, under some conditions, we have to use one of them. For example, I have to use functions that return a value if I need to achieve method chaining. So please focus on the situations where both methods can be applied to achieve the goal.

I know this question may not have a single answer or a sure thing. It also seems this decision needs to be made in many programming languages, like C, C++, etc. So any opinion or suggestion is much appreciated (better with examples).

As always when someone argues that one thing is faster than another: did you take timings? In fully optimized code, in every language and with every compiler you plan to use? Without that, any argument based on performance is moot.
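One way to take such timings in C++ is a small harness around std::chrono::steady_clock. The sketch below assumes two hypothetical functions, fill_by_value and fill_by_ref, that do the same work in the two styles from the question; numbers from it only mean something in an optimized build, over repeated runs, and with every compiler you care about.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical workload written in the "return by value" style.
std::vector<int> fill_by_value(int n) {
    std::vector<int> v(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) v[static_cast<std::size_t>(i)] = i;
    return v;
}

// The same workload written in the "output parameter" style.
void fill_by_ref(int n, std::vector<int>& out) {
    out.assign(static_cast<std::size_t>(n), 0);
    for (int i = 0; i < n; ++i) out[static_cast<std::size_t>(i)] = i;
}

// Runs f() `reps` times and returns the elapsed wall-clock time in microseconds.
template <typename F>
long long time_us(F f, int reps) {
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

For example, compare `time_us([&] { sink += fill_by_value(100000).back(); }, 100)` against the same lambda built around fill_by_ref; accumulating into a variable like `sink` that you later use keeps the optimizer from discarding the work.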

I'll come back to the performance question in a second; let me first address what I think is more important. There are good reasons to pass function parameters by reference, of course. The primary one I can think of right now is that the parameter is actually both input and output, i.e., the function is supposed to operate on the existing data. To me, that is what a function signature taking a non-const reference indicates. If such a function then ignores what is already in that object (or, even worse, clearly expects to only ever get a default-constructed one), that interface is confusing.
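To illustrate the distinction, here is a sketch with two hypothetical functions: append_squares genuinely reads and extends what the caller already has, so a non-const reference is honest, while squares produces fresh output and simply returns it.

```cpp
#include <vector>

// In/out parameter: the function extends data the caller already owns,
// so the non-const reference correctly signals "read and modified".
void append_squares(const std::vector<int>& input, std::vector<int>& acc) {
    acc.reserve(acc.size() + input.size());
    for (int x : input) acc.push_back(x * x);
}

// Pure output: nothing from an existing object is needed, so return a new value.
std::vector<int> squares(const std::vector<int>& input) {
    std::vector<int> result;
    append_squares(input, result);  // reuse the in/out version on an empty vector
    return result;
}
```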

Now, back to performance. I cannot speak for C# or Java (though I believe returning an object in Java would not cause a copy in the first place; it just passes around a reference), and in C you do not have references but might need to resort to passing pointers around (and in that case, I do agree that passing in a pointer to uninitialized memory is OK). But in C++, compilers have long performed return value optimization (RVO), which basically means that in most calls like A a = f(b);, the copy constructor is bypassed and f creates the object directly in the right place. In C++11, we even got move semantics to make this explicit and use it in more places.

Should you just return an A* instead? Only if you really long for the old days of manual memory management. At the very least, return a std::shared_ptr<A> or a std::unique_ptr<A>.
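A minimal sketch of the std::unique_ptr option, assuming a hypothetical Widget type and make_widget factory (std::make_unique requires C++14):

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical type used only for illustration.
struct Widget {
    std::string name;
    explicit Widget(std::string n) : name(std::move(n)) {}
};

// Factory that hands ownership to the caller explicitly: nothing to delete
// by hand, and the unique_ptr itself is cheap to move out of the function.
std::unique_ptr<Widget> make_widget(const std::string& name) {
    return std::make_unique<Widget>(name);
}
```

Because std::shared_ptr can be constructed from an rvalue std::unique_ptr, a caller that needs shared ownership can still write `std::shared_ptr<Widget> s = make_widget("x");`, so returning unique_ptr keeps both options open.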

Now, with multiple outputs, you get additional complications, of course. The first thing to check is whether your design is actually sound: each function should have a single responsibility, and usually that means returning a single value as well. But there are exceptions to this; e.g., a partitioning function has to return two or more containers. In that situation, you may find that the code is easier to read with non-const reference arguments; or you may find that returning a tuple is the way to go.
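As a sketch of the two styles for a partitioning-like operation, here is a hypothetical split_even_odd written once returning a std::pair (a two-element tuple) and once with non-const reference outputs; which reads better at the call site is exactly the judgment call described above.

```cpp
#include <utility>
#include <vector>

// Style 1: both containers come back together as one value.
std::pair<std::vector<int>, std::vector<int>>
split_even_odd(const std::vector<int>& values) {
    std::vector<int> evens, odds;
    for (int v : values) (v % 2 == 0 ? evens : odds).push_back(v);
    return {std::move(evens), std::move(odds)};
}

// Style 2: the same operation writing into caller-provided containers.
void split_even_odd(const std::vector<int>& values,
                    std::vector<int>& evens, std::vector<int>& odds) {
    evens.clear();
    odds.clear();
    for (int v : values) (v % 2 == 0 ? evens : odds).push_back(v);
}
```

With C++17, the first version can be consumed as `auto [evens, odds] = split_even_odd(data);`, which gives the outputs names much like the reference version does.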

I urge you to write your code both ways, then come back the next day or after a weekend and look at the two versions again. Then decide which is easier to read. In the end, that is the primary criterion for good code. For the few places where you can see a performance difference in an end-user workflow, that is an additional factor to consider, but only in very rare cases should it take precedence over readable code, and with a little more effort you can usually get both anyway.

Due to Return Value Optimization, the second form (passing a reference and modifying it) is almost certainly slower and less amenable to optimization, as well as less legible.

Let us consider a simple example function:

return_value foo( void );

Here are the possibilities that may occur:

  1. Return Value Optimization (RVO)
  2. Named Return Value Optimization (NRVO)
  3. Move semantic return
  4. Copy semantic return

What is Return Value Optimization? Consider this function:

return_value foo( void ) { return return_value(); }

In this example, an unnamed temporary is returned from a single exit point. Because of this, the compiler can easily (and is free to) remove every trace of this temporary and instead construct it directly in place, in the calling function:

void call_foo( void )
{
    return_value tmp = foo();
}

In this example, tmp is actually used directly in foo as if foo had defined it, removing all copies. This is a HUGE optimization if return_value is a non-trivial type.

When can RVO be used? That's up to the compiler, but in general, with a single return point, it will always be used. Multiple return points make it more iffy, but if they all return anonymous temporaries, your chances increase. (Since C++17, the language guarantees this elision whenever the returned expression is a prvalue of the return type, regardless of how many return statements there are.)
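One way to see RVO at work is a type that counts its own copies and moves. This is a sketch with a hypothetical Tracked type; since C++17, both counters are guaranteed to stay at zero for `Tracked t = make_tracked();`, and earlier compilers normally elide the operations as well unless elision is explicitly disabled (for example with GCC's -fno-elide-constructors).

```cpp
// Hypothetical type that records how often it is copied or moved.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Single exit point returning an unnamed temporary (a prvalue): the classic RVO case.
Tracked make_tracked() { return Tracked(); }
```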

What about Named Return Value Optimization?

This one is a bit trickier: if you name the variable before you return it, it's now an lvalue. This means the compiler has to do more work to prove that in-place construction is possible:

return_type foo( void )
{
    return_type bar;
    // do stuff
    return bar;
}

In general, this optimization is still possible, but less likely with multiple code paths, unless each code path returns the same object; returning multiple different objects from multiple different code paths tends to be difficult to optimize:

return_type foo( void)
{
    if(some_condition)
    {
        return_type bar = value;
        return bar;
    }
    else
    {
        return_type bar2 = val2;
        return bar2;
    }
}

This is not going to be as well received. It's still possible that NRVO will kick in, but it's getting less and less likely. If at all possible, construct a single return_value and tweak it in the different code paths, rather than returning wholly different objects.
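A sketch of that advice, using a hypothetical describe function: one named result object is adjusted on each path and returned from a single return statement. Even if the compiler does not apply NRVO here, C++11 and later treat `return result;` of a local variable as a move rather than a copy.

```cpp
#include <string>
#include <vector>

// One named result, tweaked per path, returned from one place.
std::vector<std::string> describe(int value) {
    std::vector<std::string> result;
    result.reserve(2);
    if (value < 0) {
        result.push_back("negative");
    } else {
        result.push_back("non-negative");
    }
    result.push_back(value % 2 == 0 ? "even" : "odd");
    return result;  // NRVO candidate; otherwise an implicit move, never a copy
}
```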

If NRVO is possible, this will get rid of any overhead; it will be as if the object were constructed directly in the calling function.

If neither form of return value optimization is possible, a move return may be possible.

Move semantics are available natively in C++11 and can be emulated in C++03: rather than copying the information out of one object into another, move semantics allow one object to steal the data of another, leaving the source in some default state. For C++03 move semantics, you need Boost.Move, but the concept is still sound.

A move return isn't as fast as an RVO return, but for types that own heap data it's drastically faster than a copy. With a compliant C++11 compiler, of which there are many today, all standard library containers and types support move semantics. Your own types may not get a default move constructor/assignment operator (MSVC did not, at the time of writing, generate default move operations for user-defined types), but adding move semantics is not hard: just use the copy-and-swap idiom to add it!

What is the copy-and-swap idiom?
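For reference, a minimal sketch of copy-and-swap on a hypothetical Buffer class that owns a heap array. The by-value operator= covers both copy and move assignment, but the move constructor still has to be written (or defaulted) separately:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical resource-owning type used to illustrate copy-and-swap.
class Buffer {
public:
    explicit Buffer(std::size_t n = 0) : size_(n), data_(n ? new int[n]() : nullptr) {}
    ~Buffer() { delete[] data_; }

    // Deep copy.
    Buffer(const Buffer& other) : Buffer(other.size_) {
        std::copy(other.data_, other.data_ + other.size_, data_);
    }
    // Move: steal the pointer and leave `other` empty but valid.
    Buffer(Buffer&& other) noexcept : size_(other.size_), data_(other.data_) {
        other.size_ = 0;
        other.data_ = nullptr;
    }
    // `other` is copy- or move-constructed by the caller, then swapped in;
    // the old contents are released when `other` goes out of scope.
    Buffer& operator=(Buffer other) noexcept {
        swap(*this, other);
        return *this;
    }
    friend void swap(Buffer& a, Buffer& b) noexcept {
        using std::swap;
        swap(a.size_, b.size_);
        swap(a.data_, b.data_);
    }

    std::size_t size() const { return size_; }
    int* data() { return data_; }

private:
    std::size_t size_;
    int* data_;
};
```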

Finally, if your return_value does not support move and your function is too complex for RVO, you will fall back to copy semantics, which is what your friend said to avoid.

However, in a large number of cases, this will not be significantly slower!

For primitive types, such as float, int, or bool, copying is a single assignment or move; hardly the sort of thing to complain about. Passing such things by reference without a really good reason can easily make your code slower, since references are typically implemented as pointers internally. For something like your bool example, there's no reason to waste time or energy passing a bool by reference; returning it is the fastest possible way.
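For the bool case from the question, a sketch that returns the result directly. It also takes the vector by const reference; the question's version takes it by value, which copies the whole vector on every call regardless of how the bool comes back.

```cpp
#include <algorithm>
#include <vector>

// Returns whether `value` occurs in `arr`; nothing is copied on the way in or out.
bool checkInArray(int value, const std::vector<int>& arr) {
    return std::find(arr.begin(), arr.end(), value) != arr.end();
}
```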

When you return something that fits in a register, it's usually returned in a register for exactly that reason: it's fast, and as noted, it's easiest to maintain.

If your type is a POD type, such as a simple struct, it can often be passed in registers via a fastcall-style calling convention, or optimized away into direct assignments.

If your type is large and imposing, such as std::string or something with a lot of data behind it requiring many deep copies, and your code is complex enough to make RVO unlikely, then perhaps passing by reference is a better idea.
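One concrete situation where a reference parameter can pay off is reusing the caller's storage across many calls. A sketch with a hypothetical parse_ints: the reference overload clears `out`, which keeps its capacity, so a loop that calls it with the same vector stops reallocating once the buffer is large enough; a by-value overload remains for one-off calls.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Parses whitespace-separated integers into `out`, reusing its capacity.
void parse_ints(const std::string& line, std::vector<int>& out) {
    out.clear();  // size becomes 0; capacity is left unchanged
    std::istringstream in(line);
    int x;
    while (in >> x) out.push_back(x);
}

// Convenience wrapper for one-off calls, returning by value.
std::vector<int> parse_ints(const std::string& line) {
    std::vector<int> result;
    parse_ints(line, result);
    return result;
}
```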

Summary

  1. Anonymous (rvalue) values of any kind should be returned by value.
  2. Small or primitive types should be returned by value.
  3. Any type supporting move semantics (standard library containers, etc.) should be returned by value.
  4. Named (lvalue) values that are easy to reason about should be returned by value.
  5. Large data types in complex functions should be profiled, or passed by reference.

Always return by value when possible, if you are using C++11. It's more legible, and faster.

There's no single answer to this question, but as you already stated, the central part is: it depends.

Clearly, for simple types, such as ints or bools, the return value is generally the preferred solution. It is easier to write and also less error-prone (i.e., you cannot pass something undefined to the function, and you don't need to define the variable separately before the call). For complex types, such as a collection, call-by-reference might be preferred because it avoids, as you say, the extra copy step. But you could also return a vector<int>* instead of just a vector<int>, which achieves the same (at the cost of some extra memory management, though). All this, however, also depends on the language used. The above mostly holds true for C and C++, but in managed languages such as Java or C#, most complex types are reference types anyway, so returning a vector does not involve any copying there.

Of course, there are situations where you do want the copy to happen, i.e., if you want to return (a copy of) an internal vector in such a way that the caller cannot modify the internal data structure of the called class.
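A sketch of that situation with a hypothetical Registry class: snapshot() deliberately returns a copy the caller may change freely, while view() offers a copy-free, read-only alternative that is only valid as long as the Registry object lives.

```cpp
#include <vector>

// Hypothetical class holding internal state.
class Registry {
public:
    void add(int id) { ids_.push_back(id); }

    // Intentional copy: callers can modify the result without touching ids_.
    std::vector<int> snapshot() const { return ids_; }

    // No copy, but read-only and tied to this Registry's lifetime.
    const std::vector<int>& view() const { return ids_; }

private:
    std::vector<int> ids_;
};
```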

So again: it depends.

This is the distinction between methods and functions.

Methods (aka subroutines) are called primarily for their side effect, which is to modify one or more of the objects passed in as parameters. In languages that support OOP, the object to be modified is usually passed implicitly as the this/self parameter.

Functions, on the other hand, are called primarily for their return value: they calculate something new, shouldn't modify their parameters at all, and should avoid side effects. Functions should be pure in the functional-programming sense.

If a function/method is meant to create a new object (i.e., a factory), then the object should be returned. If you pass in a reference to a variable, then it isn't clear who is responsible for cleaning up the object previously held in the variable: the caller or the factory? With a factory function, it's clear that the caller is responsible for ensuring cleanup of the previous object; with a factory method, it's not so clear, because the factory could do the cleanup, although that's often a bad idea for various reasons.

If a function/method is meant to modify one or more objects, then the object(s) should be passed in as arguments, and the modified object(s) shouldn't be returned (the exception is when you're designing a fluent interface/method chaining in a language that supports it).

If your objects are immutable, then you should always use functions, because every operation on an immutable object must create a new object.

Adding two vectors should be a function (use the return value), because the result is a new vector. If you're adding another vector to an existing vector, then that should be a method, since you're modifying an existing vector and not allocating a new one.
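Sketched in C++ with the question's addTwoVectors (taking const references instead of by-value copies) and a hypothetical addInto, both reporting a size mismatch by throwing std::invalid_argument, in line with the exception advice that follows:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Function: builds and returns a new vector; the inputs are left untouched.
std::vector<int> addTwoVectors(const std::vector<int>& a, const std::vector<int>& b) {
    if (a.size() != b.size()) throw std::invalid_argument("size mismatch");
    std::vector<int> result(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) result[i] = a[i] + b[i];
    return result;
}

// Method in the sense above: adds `b` into the existing vector `a` in place.
void addInto(std::vector<int>& a, const std::vector<int>& b) {
    if (a.size() != b.size()) throw std::invalid_argument("size mismatch");
    for (std::size_t i = 0; i < a.size(); ++i) a[i] += b[i];
}
```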

In a language that doesn't support exceptions, the return value is often used to signal errors; however, in languages that support exceptions, error conditions should always be signaled with exceptions, and there should never be a method that returns a value, or a function that modifies its arguments. In other words, don't cause side effects and return a value from the same function/method.

What should be returned by functions, and what should not (or should be avoided)? It depends on what your method is supposed to do.

When your method modifies the list or returns new data, you should use the return value. That makes it much easier to understand what your code does than using a ref parameter.

Another benefit of return values is the ability to use method chaining.

You can write code like this, which passes the list from one method to another:

method1(list).method2(list)...
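In C++, the usual way to get this kind of chaining is to have each modifying member function return *this by reference, so no copy of the underlying data is made. A sketch with a hypothetical IntList class:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical fluent wrapper around a vector.
class IntList {
public:
    IntList& add(int v) { items_.push_back(v); return *this; }
    IntList& sort() { std::sort(items_.begin(), items_.end()); return *this; }
    IntList& dropNegatives() {
        items_.erase(std::remove_if(items_.begin(), items_.end(),
                                    [](int v) { return v < 0; }),
                     items_.end());
        return *this;
    }
    const std::vector<int>& items() const { return items_; }

private:
    std::vector<int> items_;
};
```

Usage: `list.add(3).add(-1).add(2).sort().dropNegatives();` leaves `list.items()` holding 2 and 3.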

As has been said, there is no general answer. But no one has talked about the machine level, so I'll do that and try some examples.

For operands that fit in a register, the answer is obvious. Every compiler I've seen will use a register for the return value (even if it's a struct). This is as efficient as it gets.

So the remaining question is large operands.

At this point it's up to the compiler. It is true that some (especially older) compilers would emit a copy to implement returning a value larger than a register. But this is dark-ages technology.

Modern compilers (primarily because RAM is much bigger these days, and that makes life much better) are not so stupid. When they see "return foo;" in a function body and foo does not fit in a register, they mark foo as a reference to memory. This is memory allocated by the caller to hold the return value. Consequently, the compiler ends up generating almost exactly the same code as it would if you had passed a reference for the return value yourself.

Let's verify this. Here's a simple program.

struct Big {
  int a[10000];
};

Big process(int n, int c)
{
  Big big;
  for (int i = 0; i < 10000; i++)
    big.a[i] = n + i;
  return big;
}

void process(int n, int c, Big& big)
{
  for (int i = 0; i < 10000; i++)
    big.a[i] = n + i;
}

Now I'll compile it with the Xcode compiler on my MacBook. Here's the relevant output for the return version:

    xorl    %eax, %eax
    .align  4, 0x90
LBB0_1:                                 ## =>This Inner Loop Header: Depth=1
    leal    (%rsi,%rax), %ecx
    movl    %ecx, (%rdi,%rax,4)
    incq    %rax
    cmpl    $10000, %eax            ## imm = 0x2710
    jne     LBB0_1
## BB#2:
    movq    %rdi, %rax
    popq    %rbp
    ret

and for the reference version:

    xorl    %eax, %eax
    .align  4, 0x90
LBB1_1:                                 ## =>This Inner Loop Header: Depth=1
    leal    (%rdi,%rax), %ecx
    movl    %ecx, (%rdx,%rax,4)
    incq    %rax
    cmpl    $10000, %eax            ## imm = 0x2710
    jne     LBB1_1
## BB#2:
    popq    %rbp
    ret

Even if you don't read assembly language, you can see the similarity. There is essentially one instruction's difference: the return version ends with movq %rdi, %rax, because the x86-64 System V calling convention requires a function that returns a large value through caller-provided memory to hand that address back in %rax. This is with -O1. With optimization off, the code is longer, but still almost identical. With gcc version 4.2, the results are very similar.

So you should tell your friends "no". Using a return value with a modern compiler carries no penalty.

To me, passing a non-const pointer means two things:

  • The parameter may be changed in place (you can pass a pointer to a struct member and avoid an assignment);
  • The output does not need to be produced if null is passed.

The latter makes it possible to skip a whole, possibly expensive, branch of code that calculates the output value, because that value is not wanted anyway.
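A sketch of the null-pointer convention with a hypothetical sum_values function: the sum is always returned, and the (pretend-expensive) sum of squares is computed only when the caller supplies somewhere to put it.

```cpp
#include <vector>

// Always returns the sum; fills *sum_of_squares only if the pointer is non-null.
long long sum_values(const std::vector<int>& v, long long* sum_of_squares = nullptr) {
    long long s = 0;
    for (int x : v) s += x;
    if (sum_of_squares != nullptr) {  // optional output requested
        long long sq = 0;
        for (int x : v) sq += static_cast<long long>(x) * x;
        *sum_of_squares = sq;
    }
    return s;
}
```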

I see this as an optimization, that is, something done when the performance impact has been measured or at least estimated. Otherwise, I prefer data that is as immutable as possible, and functions that are as pure as possible, to simplify correct reasoning about the program's flow.

Usually correctness beats performance, so I'd stick with a clear separation of (const) input parameters and a return struct, unless that obviously or provably hampers performance or code readability.
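A sketch of that style with a hypothetical Stats struct: a const input, several named outputs returned together, and no output parameters.

```cpp
#include <algorithm>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical result type grouping several outputs under clear names.
struct Stats {
    int min;
    int max;
    double mean;
};

Stats compute_stats(const std::vector<int>& v) {
    if (v.empty()) throw std::invalid_argument("compute_stats: empty input");
    auto mm = std::minmax_element(v.begin(), v.end());
    double total = std::accumulate(v.begin(), v.end(), 0.0);
    return Stats{*mm.first, *mm.second, total / static_cast<double>(v.size())};
}
```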

(Disclaimer: I don't usually write in C.)

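Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.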

 
© 2020-2024 STACKOOM.COM