Process array in chunks using struct then cast as flat array - how to avoid UB (strict aliasing)?

An external API expects a pointer to an array of values (int as a simple example here) plus a size.

It is logically clearer to deal with the elements in groups of 4.

So I process the elements via a "group of 4" struct and then pass the array of those structs to the external API using a pointer cast. See code below.

Spider sense says: "strict aliasing violation" in the reinterpret_cast => possible UB?

  1. Are the static_asserts below enough to ensure: a) this works in practice b) this is actually standards compliant and not UB?

  2. Otherwise, what do I need to do to make it "not UB"? A union? How exactly, please?

  3. Or is there overall a different, better way?


#include <cstddef>

void f(int*, std::size_t) {
    // external implementation
    // process array
}

int main() {

    static constexpr std::size_t group_size    = 4;
    static constexpr std::size_t number_groups = 10;
    static constexpr std::size_t total_number  = group_size * number_groups;

    static_assert(total_number % group_size == 0);

    int vals[total_number]{};

    struct quad {
        int val[group_size]{};
    };

    quad vals2[number_groups]{};
    // deal with values in groups of four using member functions of `quad`

    static_assert(alignof(int) == alignof(quad));
    static_assert(group_size * sizeof(int) == sizeof(quad));
    static_assert(sizeof(vals) == sizeof(vals2));

    f(vals, total_number);
    f(reinterpret_cast<int*>(vals2), total_number); /// is this UB? or OK under above asserts?
}

No amount of static_asserts is going to make something which is categorically UB into well-defined behavior in accord with the standard. You did not create an array of ints; you created a struct containing an array of ints. So that's what you have.

It's legal to convert a pointer to a quad into a pointer to an int[group_size] (though you'll need to alter your code appropriately). Or you could just access the array member directly and convert that to an int*.
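Both options from the paragraph above can be sketched as follows. This is only an illustration: `sum_ints` is a hypothetical stand-in for the external API, and `sum_quad_member` / `sum_quad_cast` are made-up helper names. Note that each call stays inside a single `quad`, so it never walks past the 4-element array.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

constexpr std::size_t group_size = 4;

struct quad {
    int val[group_size]{};
};

// The cast in sum_quad_cast relies on quad being standard-layout
// with val as its first non-static data member.
static_assert(std::is_standard_layout<quad>::value, "quad must be standard-layout");

// Hypothetical stand-in for the external API: sums n ints.
int sum_ints(const int* p, std::size_t n) {
    assert(p != nullptr);
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += p[i];
    return total;
}

// Option A: name the member; the array converts to a pointer to its first element.
int sum_quad_member(const quad& q) {
    return sum_ints(q.val, group_size);
}

// Option B: a quad and its first member (the array) are pointer-interconvertible,
// so a quad* can be converted to a pointer to int[group_size].
int sum_quad_cast(const quad& q) {
    auto* arr = reinterpret_cast<const int (*)[group_size]>(&q);
    return sum_ints(*arr, group_size);
}
```

Either helper hands the API exactly `group_size` ints from one `quad`; neither makes it valid to ask for `total_number` ints starting from `vals2`.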

Regardless of how you get a pointer to the first element, it's legal to do pointer arithmetic within that array. But the moment you try to do pointer arithmetic past the boundaries of the array within that quad object, you get undefined behavior. Pointer arithmetic is defined based on the existence of an array: [expr.add]/4

When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.

  • If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
  • Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) array element i−j of x if 0≤i−j≤n.
  • Otherwise, the behavior is undefined.

The pointer isn't null, so case 1 doesn't apply. The n above is group_size (because the array is the one within quad), so if the index is > group_size, then case 2 doesn't apply.

Therefore, undefined behavior will happen whenever someone tries to access the array at index 4 or beyond, that is, past the end of the 4-element array inside the first quad. There is no cast that can wallpaper over that.


Otherwise, what do I need to do, to make it "not UB". A union? How exactly please?

You don't. What you're trying to do is simply not valid with respect to the C++ object model. You need an array of ints, so you must create an array of ints. You cannot treat an array of something other than ints as an array of ints (well, with minor exceptions of byte-wise arrays, but that's unhelpful to you).


The simplest valid way to process the array in groups is to just... do some nested loops:

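// needs <iterator> for std::end (and <span> for the std::span variant)
int arr[total_number]{};
for (int* curr = arr; curr != std::end(arr); curr += group_size)
{
  // Use `curr[0]` to `curr[group_size - 1]`;
  // or create a `std::span<int, group_size> group(curr, group_size)`.
}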
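If the member functions of `quad` are the reason for the struct, one possible variation on the loop above is a small non-owning view over each group of a real `int` array. This is a sketch under assumptions: `quad_view`, `fill`, `sum` and `label_groups` are hypothetical names, not anything from the question or a library.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t group_size = 4;

// Hypothetical non-owning view of one group inside a flat int array.
// It stores only a pointer, so the underlying storage stays a plain int[].
class quad_view {
public:
    explicit quad_view(int* first) : first_(first) {}

    int& operator[](std::size_t i) const {
        assert(i < group_size);
        return first_[i];
    }

    void fill(int value) const {
        for (std::size_t i = 0; i < group_size; ++i) first_[i] = value;
    }

    int sum() const {
        int total = 0;
        for (std::size_t i = 0; i < group_size; ++i) total += first_[i];
        return total;
    }

private:
    int* first_;
};

// Example of group-wise processing: every element of group g is set to g.
void label_groups(int* data, std::size_t total) {
    assert(total % group_size == 0);
    for (std::size_t g = 0; g < total / group_size; ++g) {
        quad_view(data + g * group_size).fill(static_cast<int>(g));
    }
}
```

After something like `label_groups(vals, total_number)`, the same `vals` can go straight to `f(vals, total_number)` with no cast, because it was an array of `int` all along.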

No, this is not permitted. The relevant C++ standard section is §7.6.1.10. From the first paragraph, we have (emphasis mine):

The result of the expression reinterpret_cast<T>(v) is the result of converting the expression v to type T. If T is an lvalue reference type or an rvalue reference to function type, the result is an lvalue; if T is an rvalue reference to object type, the result is an xvalue; otherwise, the result is a prvalue and the lvalue-to-rvalue, array-to-pointer, and function-to-pointer standard conversions are performed on the expression v. Conversions that can be performed explicitly using reinterpret_cast are listed below. No other conversion can be performed explicitly using reinterpret_cast.

So unless your use case is listed in that section, it's not valid. Most of the paragraphs are not relevant to your use case, but this is the one that comes closest.

An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type "pointer to cv T", the result is static_cast<cv T*>(static_cast<cv void*>(v)).

So a reinterpret_cast from one pointer type to another is equivalent to a static_cast through an appropriately cv-qualified void*. Now, a static_cast that goes from T* to S* can be acceptably used as an S* if the objects involved are pointer-interconvertible. From §6.8.4:

Two objects a and b are pointer-interconvertible if:

  • they are the same object, or
  • one is a union object and the other is a non-static data member of that object ([class.union]), or
  • one is a standard-layout class object and the other is the first non-static data member of that object or any base class subobject of that object ([class.mem]), or
  • there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.

If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast ([expr.reinterpret.cast]).

[Note 4: An array object and its first element are not pointer-interconvertible, even though they have the same address. - end note]

To summarize, you can cast a pointer to a class C to a pointer to its first member (and back) if there's no vtable to stop you. You can cast a pointer to C into another pointer to C (that can come up if you're adding cv-qualifiers; for instance, reinterpret_cast<const C*>(my_c_ptr) is valid if my_c_ptr is C*). There are also some special rules for unions, which don't apply here. However, you can't factor through arrays, as per Note 4. The conversion you want here is quad[] -> quad -> int -> int[], and you can't convert between the quad[] and the quad. If quad was a simple struct that contained only an int, then you could reinterpret a quad* as an int*, but you can't do it through arrays, and certainly not through a nested layer of them.
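The one case that is allowed, a standard-layout struct and its first member, might look like the sketch below. The names `wrapper`, `first_member` and `enclosing_object` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <type_traits>

// A simple struct whose first (and only) non-static data member is an int.
struct wrapper {
    int value;
};

// The pointer-interconvertibility rule quoted above requires a standard-layout class.
static_assert(std::is_standard_layout<wrapper>::value, "wrapper must be standard-layout");

// Valid: a wrapper object and its first member are pointer-interconvertible.
int* first_member(wrapper* w) {
    assert(w != nullptr);
    return reinterpret_cast<int*>(w);
}

// Valid only if `first` really points at the `value` member of a wrapper object.
wrapper* enclosing_object(int* first) {
    assert(first != nullptr);
    return reinterpret_cast<wrapper*>(first);
}
```

This works for one object and its first member. It does not extend to treating a `wrapper[]` (or a `quad[]`) as one long `int[]`, which is the step that Note 4 and the pointer arithmetic rules rule out.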

None of the sections I've cited say anything about alignment. Or size. Or packing. Or padding. None of that matters. All your static_asserts are doing is slightly increasing the probability that the undefined behavior (which is still undefined) will happen to work on more compilers. But you're using a band-aid to repair a dam; it's not going to work.
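If the data really has to live in `quad` objects, one option that stays within the rules (not something proposed in the answers above, just a sketch) is to copy the groups into a genuine array of `int` before calling the API. The helper name `flatten` is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t group_size = 4;

struct quad {
    int val[group_size]{};
};

// Copies every group into one contiguous buffer of int.
// Each inner copy stays within the bounds of a single quad's array.
std::vector<int> flatten(const quad* groups, std::size_t count) {
    assert(groups != nullptr || count == 0);
    std::vector<int> flat;
    flat.reserve(count * group_size);
    for (std::size_t g = 0; g < count; ++g) {
        flat.insert(flat.end(), groups[g].val, groups[g].val + group_size);
    }
    return flat;
}
```

The call would then be along the lines of `auto flat = flatten(vals2, number_groups); f(flat.data(), flat.size());`. The cost is an extra copy, and if the API writes to the buffer, the results would have to be copied back into the quads afterwards.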
