
Is there any way of accessing arbitrary data of known size as a char array in a constexpr/consteval context?

I'm trying to implement something that will take in arbitrary bits of data (which are known at compile time) and calculate their CRC in a consteval function, so I can use it to, e.g., index such data with integer keys without any runtime overhead. I have it working when the input is a char string literal, but I'm struggling to make it work when the input is a wchar_t string literal.

I'm getting a fairly cryptic error:

error: accessing value of '"T\000e\000s\000t\000\000"' through a 'const char' glvalue in a constant expression

...which seems to be caused by using reinterpret_cast within a constexpr context (which is apparently not allowed).

My question is: is there any way of interpreting arbitrary data as a plain old array of bytes anyway? I don't care how ugly or non-portable it is, as long as it all happens at compile time. For now, just solving the case with an array of wchar_t as input would be enough. Obviously, I could "just" reimplement the CRC calculation separately for each type I want to handle, but I would rather not do that if at all possible (and indeed it would be quite tricky for anything more complex than an array of POD).

For reference, the failing code is as follows:

#include <cstddef>
#include <cstdint>

// Details of CRCInternal omitted for brevity
template <std::size_t len> consteval std::uint32_t CRC32(const char (&str)[len])
{
    return CRCInternal::crc32<len - 1>(str) ^ 0xFFFFFFFFu;
}

template <std::size_t len> consteval std::uint32_t CRC32FromWide(const wchar_t (&filename)[len])
{
    // reinterpret_cast is not permitted in a constant expression
    return CRC32(reinterpret_cast<const char(&)[len * sizeof(wchar_t)]>(filename));
}

int main()
{
    CRC32FromWide(L"Test"); // <==== Error
}

The C++ object model is usually a fiction, an agreement between the programmer writing the code and the compiler generating the binary executable. To the executable, objects don't exist; it's just bits stored in memory. As such, you can exploit the fact that C++ has dozens of back-doors that effectively let you pretend the object model isn't real. Many of these are specified as undefined behavior, but in general the compiler is not going to check for these violations of the object model and stop you. You broke your end of the contract, but the compiler wasn't paying attention, so you get away with it.

This is not the case in constant expression evaluation. A compiled executable runs on the CPU; constant expression evaluation runs within the compiler. The object model doesn't have to map to "bits" or "memory" or anything like it; it can be a real object model with full lifetime tracking and analysis.

The C++ standard therefore requires that an expression whose evaluation would exhibit core-language undefined behavior is not a constant expression. In a context that requires one (such as a call to a consteval function), the compiler must diagnose this and the program is ill-formed. Also, constexpr code is flat-out forbidden from using the biggest back-door of all: reinterpret_cast.

At compile time, objects aren't bytes in storage, so you don't get to treat them as if they were.

This is especially important because the execution environment of the compiler and the execution environment of the eventual binary don't have to be the same. If you're developing for some embedded system, the endianness of the CPU you're targeting may not match the endianness of the CPU your compiler runs on. So if you were able to access any compile-time data as just bytes, you could get a different answer at compile time than you would at runtime.

That's bad.

C++20's std::bit_cast exists and can help, but even it can't do everything. The source and destination types must both be trivially copyable and of the same size, and bit_cast is only usable in a constant expression if neither type contains a union, a pointer, a pointer-to-member, a volatile-qualified subobject, or a reference member. The pointer restriction exists because compile-time pointers aren't just addresses; they're complex data structures that have to remember which object they point to (otherwise, it would be impossible to detect when you cast them to some unrelated type and attempt to access the object through the wrong type).

But if you restrict your types to those that are constexpr bit_cast-able, then you can bit_cast them to a byte array of their size.

Note that constexpr bit_cast is not the easiest thing to implement, precisely because it has to make the source object's data look as it would on the target CPU and environment, not on the one the compiler is executing within. So if the target is a big-endian machine and the compiler's host is little-endian, constexpr bit_cast must do endian conversion, and it must do so with specific knowledge of the type of each component of the source and destination objects.

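Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.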

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM