C++: pass by value vs. pass by reference for POD math structure classes in high-performance applications, considering cache coherency
For many high-performance applications, such as game engines or financial software, cache coherency, memory layout, and cache misses are crucial to maintaining smooth performance. As the C++ standard has evolved, especially with the introduction of move semantics and C++14, it has become less clear where to draw the line between pass by value and pass by reference for POD-based mathematical classes.
Consider the common POD Vector3 class:
class Vector3
{
public:
    float32 x;  // float32: the codebase's alias for a 32-bit IEEE float
    float32 y;
    float32 z;
    // Implementation Functions below (all non-virtual)...
};
This is the most commonly used math structure in game development. It is a non-virtual, 12-byte class, even on 64-bit platforms, since we are explicitly using IEEE float32, which uses 4 bytes per float. My question is as follows: what is the general best-practice guideline for deciding whether to pass POD mathematical classes by value or by reference in high-performance applications?
Some things to consider when answering this question:
Given the above, what is a good guideline for when to use pass by value vs. pass by reference with modern C++ compilers (C++14 and above) to minimize cache misses and promote cache coherency?
At what point might someone say a POD math structure is too large for pass by value, such as a 4x4 affine transform matrix, which is 64 bytes in size assuming float32?
Does it matter, when making this decision, whether the Vector (or rather any small POD math structure) is declared on the stack or is referenced as a member variable?
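To make the two options concrete, here is a minimal sketch of both calling styles for the 12-byte Vector3 and of a 64-byte matrix taken by const reference. The alias `float32`, the `Matrix4` type, and the functions `dot_by_value`, `dot_by_ref`, and `transform_point` are hypothetical names introduced only for illustration; the row-major matrix layout is also an assumption. It is not a benchmark: which form is faster depends on the platform's calling convention and on whether the call is inlined, so it should be measured.

```cpp
#include <type_traits>

using float32 = float;  // assumption: float is a 32-bit IEEE type on the target

struct Vector3 { float32 x, y, z; };
struct Matrix4 { float32 m[16]; };  // hypothetical 4x4 matrix, row-major by assumption

// Compile-time checks of the sizes discussed in the question.
static_assert(sizeof(Vector3) == 12, "Vector3 should be three 4-byte floats");
static_assert(sizeof(Matrix4) == 64, "Matrix4 should be sixteen 4-byte floats");
static_assert(std::is_trivially_copyable<Vector3>::value,
              "Vector3 should be copyable as plain bytes");

// Pass by value: the callee works on its own copies.
inline float32 dot_by_value(Vector3 a, Vector3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Pass by const reference: the callee reads the caller's objects.
inline float32 dot_by_ref(const Vector3& a, const Vector3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Larger type by const reference, small type by value.
// Treats p as the point (x, y, z, 1) and ignores the last matrix row.
inline Vector3 transform_point(const Matrix4& t, Vector3 p) {
    return Vector3{
        t.m[0] * p.x + t.m[1] * p.y + t.m[2]  * p.z + t.m[3],
        t.m[4] * p.x + t.m[5] * p.y + t.m[6]  * p.z + t.m[7],
        t.m[8] * p.x + t.m[9] * p.y + t.m[10] * p.z + t.m[11]};
}
```

Once such small functions are inlined, the compiler typically generates the same code for both `dot` variants, so the signature mostly matters across call boundaries that are not inlined.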
I am hoping someone can provide some analysis and insight so that a good modern best-practice guideline can be established for the above situation. I believe the line between PBV and PBR for POD classes has become blurrier as the C++ standard has evolved, especially with regard to minimizing cache misses.
I see the question title is about the choice of pass-by-value vs. pass-by-reference, though it sounds like what you are after more broadly is the best practice for efficiently passing around 3D vectors and other common PODs. Passing data is fundamental and intertwined with programming paradigm, so there isn't a consensus on the best way to do it. Besides performance, there are considerations to weigh, like code readability, flexibility, and portability, when deciding which approach to favor in a given application.
That said, in recent years, "data-oriented design" has become a popular alternative to object-oriented programming, especially in video game development. The essential idea is to think about the program in terms of the data it needs to process, and how all that data can be organized in memory for good cache locality and computation performance. There was a great talk about it at CppCon 2014: "Data-Oriented Design and C++" by Mike Acton.
With your Vector3 example, for instance, a program often has not just one but many 3D vectors that are all processed the same way, say, all undergoing the same geometric transformation. Data-oriented design suggests it is then a good idea to lay the vectors out contiguously in memory and to transform them all together in a batch operation. This improves caching and creates opportunities to leverage SIMD instructions. You could implement this example with the Eigen C++ linear algebra library. The vectors can be represented using an Eigen::Matrix<float, 3, Eigen::Dynamic> of shape 3xN to store N vectors, then manipulated using Eigen's SIMD-accelerated operations.
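The batch idea can also be sketched without Eigen, using only the standard library. The following is an illustrative assumption rather than the answer's method: `std::vector<Vector3>` keeps the vectors contiguous in memory, and the hypothetical helper `scale_and_translate` applies one transformation to all of them in a single pass. Whether the loop is actually vectorized depends on the compiler and its flags.

```cpp
#include <vector>

struct Vector3 { float x, y, z; };

// Hypothetical batch operation: applies the same uniform scale and
// translation to every vector, in place. The elements are contiguous in
// memory, so the loop walks through them sequentially.
inline void scale_and_translate(std::vector<Vector3>& points,
                                float scale, Vector3 offset) {
    for (Vector3& p : points) {
        p.x = p.x * scale + offset.x;
        p.y = p.y * scale + offset.y;
        p.z = p.z * scale + offset.z;
    }
}
```

The container is taken by reference and the small `offset` by value. With this design the per-vector passing question largely disappears, because each function call processes the whole batch instead of a single vector.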