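C++: choosing pass by value vs. pass by reference for POD math structure classes in high-performance applications with cache coherency in mind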

Question

For many high-performance applications, such as game engines or financial software, cache coherency, memory layout, and cache misses are crucial to maintaining smooth performance. As the C++ standard has evolved, especially with the introduction of move semantics and C++14, it has become less clear where to draw the line between pass by value and pass by reference for POD-based mathematical classes.

Consider the common POD Vector3 class:

using float32 = float; // assumed alias; the original post does not show its definition

class Vector3
{
public:
   float32 x;
   float32 y;
   float32 z;
   // Implementation Functions below (all non-virtual)...
};

This is the most commonly used math structure in game development. It is a non-virtual, 12-byte class, even on 64-bit platforms, since we are explicitly using IEEE float32, which takes 4 bytes per float. My question is as follows: what is the general best-practice guideline for deciding whether to pass POD mathematical classes by value or by reference in high-performance applications?
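The size and layout claims above can be checked at compile time. The following is a minimal sketch, assuming `float32` is an alias for `float` (the question does not show its definition) and using a plain struct in place of the full class:

```cpp
#include <type_traits>

using float32 = float; // assumption: the question's float32 is a plain float alias

struct Vector3 {
    float32 x;
    float32 y;
    float32 z;
};

// Compile-time checks of the properties the question relies on.
static_assert(sizeof(float32) == 4, "float32 is expected to be 4 bytes");
static_assert(sizeof(Vector3) == 12, "three 4-byte floats with no padding");
static_assert(std::is_trivially_copyable<Vector3>::value, "can be copied like raw bytes");
static_assert(std::is_standard_layout<Vector3>::value, "members laid out in declaration order");
```

If any of these assumptions fails on a given platform or compiler, the build stops with the corresponding message instead of silently changing the memory layout.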

Some things to consider when answering this question:

It is safe to assume the default constructor does not initialize any values.
It is safe to assume no arrays beyond 1D are used for any POD math structures.
Clearly most people pass 4-8 byte POD constants by value, so there doesn't seem to be much debate there.
What happens when this Vector is a class member variable vs. a local variable on the stack? If pass by reference is used, the callee receives the address of the member inside the owning object vs. the address of something local on the stack. Does this use case matter? Could this difference, where PBR is used, result in more cache misses?
What about the case where SIMD is used or not used?
What about move-semantics compiler optimizations? I have noticed that when switching to C++14, the compiler will often use move semantics when chained function calls pass the same vector by value, especially when it is const. I observed this by perusing the disassembly.
When using pass by value and pass by reference with these math structures, does const have much impact on compiler optimizations? See the point above.

Given the above, what is a good guideline for when to use pass by value vs. pass by reference with modern C++ compilers (C++14 and above) to minimize cache misses and promote cache coherency? At what point might someone say a POD math structure is too large to pass by value, such as a 4x4 affine transform matrix, which is 64 bytes assuming float32? Does it matter for this decision whether the Vector, or rather any small POD math structure, is declared on the stack or referenced as a member variable?
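To make the two options concrete, here is a minimal sketch of the signatures being compared. The names `Matrix4`, `dot_by_value`, `dot_by_ref`, and `trace` are hypothetical, and `float32` is assumed to be an alias for `float`. The sketch does not claim either form is faster; that depends on the calling convention and on inlining, and is best confirmed by inspecting the generated assembly, as the question already does.

```cpp
using float32 = float; // assumption: plain float alias, as in the question

struct Vector3 { float32 x, y, z; };
struct Matrix4 { float32 m[16]; }; // hypothetical 4x4 matrix stored as a 1D array

static_assert(sizeof(Matrix4) == 64, "16 floats of 4 bytes each");

// Pass by value: the callee works on its own 12-byte copies.
// On some calling conventions such a small copy may travel in registers.
inline float32 dot_by_value(Vector3 a, Vector3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Pass by const reference: unless the call is inlined, the callee
// reads the caller's objects through their addresses.
inline float32 dot_by_ref(const Vector3& a, const Vector3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Larger type: a const reference avoids copying 64 bytes per call.
inline float32 trace(const Matrix4& mat) {
    return mat.m[0] + mat.m[5] + mat.m[10] + mat.m[15];
}
```

Both `dot_` variants compute the same result, so swapping one for the other is a local change that can be measured in isolation.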

I am hoping someone can provide some analysis and insight so that a good modern best-practice guideline can be established for the above situation. I believe the line between PBV and PBR for POD classes has become blurrier as the C++ standard has evolved, especially with regard to minimizing cache misses.

Answer 1

I see the question title is about the choice of pass by value vs. pass by reference, though it sounds like what you are after more broadly is the best practice for efficiently passing around 3D vectors and other common PODs. Passing data is fundamental and intertwined with programming paradigm, so there isn't a consensus on the best way to do it. Besides performance, there are considerations like code readability, flexibility, and portability to weigh when deciding which approach to favor in a given application.

That said, in recent years, "data-oriented design" has become a popular alternative to object-oriented programming, especially in video game development. The essential idea is to think about the program in terms of the data it needs to process, and how all that data can be organized in memory for good cache locality and computation performance. There was a great talk about it at CppCon 2014: "Data-Oriented Design and C++" by Mike Acton.

With your Vector3 example, for instance, it is often the case that a program has not just one but many 3D vectors that are all processed the same way, say, all undergoing the same geometric transformation. Data-oriented design suggests it is then a good idea to lay the vectors out contiguously in memory and transform them all together in a batch operation. This improves caching and creates opportunities to leverage SIMD instructions. You could implement this example with the Eigen C++ linear algebra library. The vectors can be represented using an Eigen::Matrix<float, 3, Eigen::Dynamic> of shape 3xN to store N vectors, then manipulated using Eigen's SIMD-accelerated operations.
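The batch idea can be sketched without any library. The following is not Eigen code; it is a plain C++ illustration, with the hypothetical names `Vector3Batch` and `scale_translate`, that stores each component contiguously and applies one transformation to all vectors in a single pass. Whether the compiler vectorizes the loop depends on the compiler and its flags.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays container: each component is stored
// contiguously, so one pass touches memory sequentially.
// Assumption: x, y, and z always have the same length.
struct Vector3Batch {
    std::vector<float> x;
    std::vector<float> y;
    std::vector<float> z;
};

// Apply the same uniform scale and translation to every vector in the batch.
// The batch is passed by reference; individual vectors are never passed at all.
inline void scale_translate(Vector3Batch& batch, float scale,
                            float tx, float ty, float tz) {
    const std::size_t n = batch.x.size();
    for (std::size_t i = 0; i < n; ++i) {
        batch.x[i] = batch.x[i] * scale + tx;
        batch.y[i] = batch.y[i] * scale + ty;
        batch.z[i] = batch.z[i] * scale + tz;
    }
}
```

With this layout the per-vector question of value vs. reference largely disappears from the hot loop, since the function boundary is crossed once per batch rather than once per vector.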

Solution 1
1 accepted, 2020-09-08 05:26:09
