
C++ std features and Binary size

I was told recently in a job interview that their project works on building the smallest possible binary for their application (it runs embedded), so I would not be able to use things such as templates or smart pointers because these would increase the binary size. They generally seemed to imply that using things from std would be a no-go (though not in all cases).

After the interview, I tried to research online which coding practices and standard library features cause large binary sizes, and I could find basically nothing on this. Is there a way to quantify the size impact of using certain features (without, for example, having to write 100 smart pointers in a code base and compare that against manual management)?

(Partially extracted from comments I wrote earlier)

I don't think there is a comprehensive answer. A lot depends on the specific use case and needs to be judged on a case-by-case basis.

Templates

Templates may result in code bloat, yes, but they can also avoid it. If your alternative is introducing indirection through function pointers or virtual methods, then the function that replaces the template may itself end up bigger in code size, simply because indirect function calls take several instructions and remove optimization potential.
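To make the trade-off concrete, here is a minimal sketch with hypothetical function names (max_by_pointer, max_by_template and int_less are made up for illustration). Which variant ends up smaller depends on the compiler, the flags and the number of instantiations, so treat it as something to measure rather than a rule.

```cpp
#include <cassert>
#include <cstddef>

// Indirect version: one copy of the loop, but every comparison is a call
// through a pointer that the optimizer often cannot see through.
inline int max_by_pointer(const int* data, std::size_t n,
                          bool (*less)(int, int))
{
    assert(n > 0);
    int best = data[0];
    for (std::size_t i = 1; i < n; ++i)
        if (less(best, data[i]))
            best = data[i];
    return best;
}

// Template version: one copy of the loop per comparator type, but the
// comparison is typically inlined into a plain compare instruction.
template<class Less>
int max_by_template(const int* data, std::size_t n, Less less)
{
    assert(n > 0);
    int best = data[0];
    for (std::size_t i = 1; i < n; ++i)
        if (less(best, data[i]))
            best = data[i];
    return best;
}

inline bool int_less(int a, int b) { return a < b; }
```

With a single comparator the template version is often the smaller one; with many different comparators the pointer version may win because the loop exists only once.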

Another aspect where they can at least not hurt is when used in conjunction with type erasure. The idea here is to write generic code, then put a small template wrapper around it that only provides type safety but does not actually emit any new code. Qt's QList is an example that does this to some extent.

This bare-bones vector type shows what I mean:

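#include <cstddef>

// Non-template base: all real work happens here, once, on void* elements.
class VectorBase
{
protected:
    void **start, **end, **capacity;   // storage is an array of void*

    void push_back(void*);
    void* at(std::size_t i);
    // cleanup_function is applied to the stored elements on clear.
    void clear(void (*cleanup_function)(void*));
};

// Thin typed wrapper: only casts and forwarding, which inline away.
template<class T>
class Vector: public VectorBase
{
public:
    void push_back(T* value)
    { this->VectorBase::push_back(value); }

    T* at(std::size_t i)
    { return static_cast<T*>(this->VectorBase::at(i)); }

    ~Vector()
    { clear(+[](void* object) { delete static_cast<T*>(object); }); }
};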

By carefully moving as much code as possible into the non-templated base, the template itself can focus on type safety and on providing the necessary indirections, without emitting any code that wouldn't have been there anyway.

(Note: This is just meant as a demonstration of type erasure, not an actually good vector type.)
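The same pattern can be tried out in a form that compiles on its own. The following is only a sketch: sort_array is a hypothetical helper, and it leans on std::qsort as the single non-templated routine. Each instantiation contributes just a small comparison function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>

// Thin, type-safe wrapper: the sorting algorithm exists once, inside
// std::qsort; the template only supplies the element size and a comparator.
template<class T>
void sort_array(T* data, std::size_t count)
{
    // qsort moves elements bytewise, so this is only valid for such types.
    static_assert(std::is_trivially_copyable<T>::value,
                  "std::qsort requires trivially copyable elements");
    std::qsort(data, count, sizeof(T),
               +[](const void* a, const void* b) -> int {
                   const T& lhs = *static_cast<const T*>(a);
                   const T& rhs = *static_cast<const T*>(b);
                   return (lhs < rhs) ? -1 : (rhs < lhs) ? 1 : 0;
               });
}
```

The price is the indirect call per comparison, so this trades speed for size; whether that is the right trade depends on the project.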

Smart pointers

When written carefully, they won't generate much code that wouldn't be there anyway. Whether an inline function generates a delete statement or the programmer writes it manually doesn't really matter.

The main issue that I see with those is that the programmer is better than the compiler at reasoning about code and avoiding dead code. For example, even after a unique_ptr has been moved away, the destructor of the pointer still has to emit code. A programmer knows that the value is NULL; the compiler often doesn't.
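A small sketch of that situation, using a hypothetical counting deleter (CountingDeleter, Owned, sink and demo_moved_from are names invented here) so the behaviour can be observed:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical deleter that counts how often it actually frees something.
struct CountingDeleter {
    int* calls;
    void operator()(int* p) const { ++*calls; delete p; }
};

using Owned = std::unique_ptr<int, CountingDeleter>;

// Takes ownership; the int is freed when the parameter is destroyed.
inline void sink(Owned p) { assert(p != nullptr); }

inline int demo_moved_from()
{
    int calls = 0;
    {
        Owned p(new int(42), CountingDeleter{&calls});
        sink(std::move(p));
        // p is null from here on. Its destructor still runs at the closing
        // brace and conceptually does `if (ptr) deleter(ptr);`. Unless the
        // compiler can prove p is null, that check and call stay in the binary.
    }
    return calls;   // the deleter ran exactly once, for sink's parameter
}
```

With a raw pointer the programmer would simply write nothing after handing the object over.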

Another issue comes up with calling conventions. Objects with non-trivial destructors are usually passed in memory on the stack rather than in registers, even if you declare them pass-by-value. The same goes for return values. So a function unique_ptr<foo> bar(unique_ptr<foo> baz) will have higher overhead than foo* bar(foo* baz), simply because the pointers have to be put on and taken off the stack.

Even more egregiously, the calling convention used for example on Linux makes the caller clean up parameters instead of the callee. That means if a function accepts a complex object like a smart pointer by value, a call to the destructor for that parameter is replicated at every call site instead of being placed once inside the function. Especially with unique_ptr this is so stupid, because the function itself may know that the object has been moved away and the destructor is superfluous; but the caller doesn't know this (unless you have LTO).
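The type properties that drive this ABI behaviour can be checked directly. This is a sketch with a placeholder type Foo and hypothetical functions pass_raw and pass_owned; the size check holds on the common standard library implementations for the default deleter but is not guaranteed by the standard.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

struct Foo { int value; };

// A raw pointer is trivially copyable and destructible, so common ABIs
// pass and return it in a register.
static_assert(std::is_trivially_copyable<Foo*>::value, "");
static_assert(std::is_trivially_destructible<Foo*>::value, "");

// unique_ptr is the same size (on common implementations), but its
// non-trivial destructor and move constructor force it to be passed and
// returned through memory, with a destructor call for the parameter.
static_assert(sizeof(std::unique_ptr<Foo>) == sizeof(Foo*), "");
static_assert(!std::is_trivially_copyable<std::unique_ptr<Foo>>::value, "");
static_assert(!std::is_trivially_destructible<std::unique_ptr<Foo>>::value, "");

inline Foo* pass_raw(Foo* p) { return p; }

inline std::unique_ptr<Foo> pass_owned(std::unique_ptr<Foo> p)
{
    return p;   // implicitly moved; the now-empty parameter is still destroyed
}
```

Comparing the generated assembly of the two functions and of their call sites (for example with the compiler's -S output) shows the extra stack traffic and destructor handling.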

Shared pointers are a different beast altogether, simply because they allow a lot of different tradeoffs. Should they be atomic? Should they allow type casting or weak pointers? What indirection is used for destruction? Do you really need two raw pointers per shared pointer, or can the reference counter be accessed through the shared object?
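One point in that design space, as a sketch only: an intrusive, non-atomic reference-counted pointer where the counter lives in the shared object, so each handle is a single raw pointer. RefCounted, IntrusivePtr and Sensor are hypothetical names; there are no weak pointers, no casts and no thread safety.

```cpp
#include <cassert>
#include <utility>

// Hypothetical base class: the (non-atomic) counter lives in the object.
struct RefCounted {
    unsigned refs = 0;
};

template<class T>   // T is assumed to derive from RefCounted
class IntrusivePtr {
    T* ptr_ = nullptr;
    void acquire() { if (ptr_) ++ptr_->refs; }
    void release() { if (ptr_ && --ptr_->refs == 0) delete ptr_; ptr_ = nullptr; }
public:
    IntrusivePtr() = default;
    explicit IntrusivePtr(T* p) : ptr_(p) { acquire(); }
    IntrusivePtr(const IntrusivePtr& o) : ptr_(o.ptr_) { acquire(); }
    IntrusivePtr(IntrusivePtr&& o) noexcept : ptr_(o.ptr_) { o.ptr_ = nullptr; }
    // By-value parameter handles both copy and move assignment.
    IntrusivePtr& operator=(IntrusivePtr o) noexcept
    { std::swap(ptr_, o.ptr_); return *this; }
    ~IntrusivePtr() { release(); }

    T* operator->() const { return ptr_; }
    T* get() const { return ptr_; }
};

struct Sensor : RefCounted { int id = 0; };

static_assert(sizeof(IntrusivePtr<Sensor>) == sizeof(Sensor*),
              "one raw pointer per handle");
```

Compared with std::shared_ptr, this gives up features to save both RAM per handle and the code for a separate control block.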

Exceptions, RTTI

Generally avoided and removed via compiler flags (for example -fno-exceptions and -fno-rtti with GCC and Clang).

Library components

On a bare-metal system, pulling in parts of the standard library can have a significant effect that can only be measured after the linker step. I suggest any such project use continuous integration and track the code size as a metric.

For example, I once added a small feature, I don't remember which, and in its error handling it used std::stringstream. That pulled in the entire iostream library. The resulting code exceeded my entire RAM and ROM capacity. IIRC the issue was that even though exception handling was deactivated, the exception message was still being set up.
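For this kind of message formatting there are lighter options than a stream. The following sketch assumes C++17 and uses std::to_chars with a caller-provided buffer; format_error is a hypothetical helper, not what the original code looked like. Whether it is actually smaller on a given toolchain still has to be measured after linking.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstring>
#include <system_error>

// Writes "<prefix><code>" into buf, NUL-terminated, without iostreams,
// heap allocation or locale support.
// Returns the number of characters written, or 0 if buf is too small.
inline std::size_t format_error(char* buf, std::size_t size,
                                const char* prefix, int code)
{
    const std::size_t prefix_len = std::strlen(prefix);
    if (prefix_len + 1 > size)
        return 0;
    std::memcpy(buf, prefix, prefix_len);

    // Keep the last byte free for the terminator.
    const std::to_chars_result result =
        std::to_chars(buf + prefix_len, buf + size - 1, code);
    if (result.ec != std::errc())
        return 0;
    *result.ptr = '\0';
    return static_cast<std::size_t>(result.ptr - buf);
}
```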

Move constructors and destructors

It's a shame that C++'s move semantics aren't the same as, for example, Rust's, where objects can be moved with a simple memcpy followed by "forgetting" their original location. In C++ the destructor of a moved-from object is still invoked, which requires more code in the move constructor / move assignment operator, and in the destructor.
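The extra code is easy to see in a hand-written owning type. Buffer below is a hypothetical class written only to illustrate the point:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

class Buffer {
    char* data_;
    std::size_t size_;
public:
    explicit Buffer(std::size_t n) : data_(new char[n]()), size_(n) {}
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    // Not just a memcpy: the source must be reset, because its destructor
    // still runs and must not free the storage a second time.
    Buffer(Buffer&& other) noexcept : data_(other.data_), size_(other.size_)
    { other.data_ = nullptr; other.size_ = 0; }

    Buffer& operator=(Buffer&& other) noexcept
    {
        if (this != &other) {
            delete[] data_;
            data_ = other.data_;
            size_ = other.size_;
            other.data_ = nullptr;
            other.size_ = 0;
        }
        return *this;
    }

    // Has to cope with the moved-from (null) state.
    ~Buffer() { delete[] data_; }

    std::size_t size() const { return size_; }
    bool empty() const { return data_ == nullptr; }
};
```

The stores that reset the source and the destructor call on the empty object are exactly the instructions a "memcpy and forget" move would not need.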

Qt, for example, accounts for such simple cases in its meta type system.

This question probably deserves more attention than it's likely to get, especially for people trying to pursue a career in embedded systems. So far the discussion has gone about the way that I would expect: a lot of conversation about the nuances of exactly how and when a project built with C++ might be more bloated than one written in plain C or a restricted C++ subset.

This is also why you can't find a definitive answer from a good old-fashioned Google search. If you just ask the question “is C++ more bloated than X?”, the answer is always going to be “it depends.”

So let me approach this from a slightly different angle. I've both worked for and interviewed at companies that enforced these kinds of restrictions; I've even voluntarily enforced them myself. It really comes down to this: when you're running an engineering organization with more than one person and with plans to keep hiring, it is wildly impractical to assume everyone on your team is going to fully understand the implications of using every feature of a language. Coding standards and language restrictions serve as a cheap way to prevent people from doing “bad things” without knowing they're doing “bad things”.

How you define a “bad thing” is then also context specific. On a desktop platform, using lots of code space isn't really “bad” enough to warrant rigorous enforcement. On a tiny embedded system, it probably is.

C++ by design makes it very easy for an engineer to generate lots of code without having to type it out explicitly. I think that statement is pretty self-evident; it's the whole point of meta-programming, and I doubt anyone would challenge it. In fact, it's one of the strengths of the language.

So then, coming back to the organizational challenges: if your primary optimization variable is code space, you probably don't want to allow people to use features that make it trivial to generate code that isn't obvious. Some people will use those features responsibly and some people won't, but you have to standardize around the lowest common denominator. A C compiler is very simple. Yes, you can write bloated code with it, but if you do, it will probably be pretty obvious from looking at it.
