Initializing a very large C++ std::bitset at compile time

Question

I want to store a static constant bitset of 2^16 bits, with a specific sequence of 1s and 0s that never changes.

I thought of using an initializer string as proposed by this post:

std::bitset<1<<16> myBitset("101100101000110 ... "); // the ellipsis are replaced by the actual 65536-character sequence

But the compiler (VS2013) gives me the "string too long" error.

UPDATE

I tried splitting the string into smaller chunks, as proposed in the post linked above, like so:

std::bitset<1<<16> myBitset("100101 ..."
                            "011001 ..."
                            ...
                            );

But I get the error C1091: compiler limit: string exceeds 65535 bytes in length. My string is 65536 bytes (well, technically 65537, with the EOS character).

What are my other options?

UPDATE

Thanks to luk32, this is the beautiful code I ended up with:

const std::bitset<1<<16> bs = (std::bitset<1<<16>("101011...")
    << 7* (1<<13)) | (std::bitset<1<<16>("110011...")
    << 6* (1<<13)) | (std::bitset<1<<16>("101111...")
    << 5* (1<<13)) | (std::bitset<1<<16>("110110...")
    << 4* (1<<13)) | (std::bitset<1<<16>("011011...")
    << 3* (1<<13)) | (std::bitset<1<<16>("111011...")
    << 2* (1<<13)) | (std::bitset<1<<16>("111001...")
    << 1* (1<<13)) | std::bitset<1<<16>("1100111...");

Answer 1

You didn't really split the literal. It gets concatenated for compilation anyway, so you are still hitting the compiler limit. I don't think there's a way to raise this limit in MSVC.

You can split it into two literals, initialize two bitsets, shift the first part, and OR it with the other.

Something like:

#include <iostream>
#include <string>
#include <bitset>

 
using namespace std;
int main()
{
    std::bitset<8> dest("0110");
    std::bitset<8> lowBits("1001");

    dest <<= dest.size()/2;
    dest |= lowBits;
    std::cout << dest << '\n';
}

If you look at the clang compiler output at -O2, it gets optimized to loading 105, which is 01101001.

My testing shows that if you swap 8 for 1<<16 it uses SSE, so it should be a pretty safe bet. It didn't fold the literals away as it did for 8 or 16, so there might be some runtime overhead, but I am not sure you can do much better.
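The shift-and-OR step generalizes to any number of chunks. The following is a minimal sketch of that idea, not code from the answer: `append_chunk` and `from_chunks` are hypothetical helper names, and the work happens at run time (for example during static initialization), not at compile time.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: shifts the bits gathered so far to the left and ORs in
// the next chunk, so chunks are listed from most significant to least significant.
template <std::size_t N>
void append_chunk(std::bitset<N>& acc, const char* chunk)
{
    const std::size_t len = std::strlen(chunk);
    assert(len <= N);                  // a longer chunk cannot fit in the bitset
    acc <<= len;                       // make room for the new low-order bits
    acc |= std::bitset<N>(chunk, len); // throws std::invalid_argument on characters other than '0'/'1'
}

// Hypothetical helper: builds a bitset from several short literals, each of
// which can stay below the compiler's string-length limit.
template <std::size_t N, std::size_t Count>
std::bitset<N> from_chunks(const char* const (&chunks)[Count])
{
    std::bitset<N> result;
    for (const char* chunk : chunks)
        append_chunk(result, chunk);
    return result;
}
```

With `const char* const parts[] = {"0110", "1001"};`, the call `from_chunks<8>(parts)` yields the same value as the two-bitset example above. For the 2^16-bit case the array would hold the real chunks in the same high-to-low order.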

EDIT:

I did some more tests, here is my playground:

#include <iostream>
#include <string>
#include <bitset>
 

using namespace std;
int main()
{
    //static const std::bitset<16> set1("0110011001100110");
    static const std::bitset<16> set2(0b0110011001100110); // binary literals need C++14 or a compiler extension

    static const std::bitset<16> high(0b01100110);
    static const std::bitset<16> low (0b01100110);
    static const std::bitset<16> set3 = (high << 8) | low;
    std::cout << (set3 == set2) << '\n';
}

I couldn't get compile-time optimization for the const char* constructor on any compiler except clang, and there it only worked up to 14 characters. There seems to be some promise if you initialize a bunch of bitsets from unsigned long long and then shift and combine them:

static const std::bitset<128> high(0b0110011001100110011001100110011001100110011001100110011001100110);
static const std::bitset<128> low (0b1001100110011001100110011001100110011001100110011001100110011001);
static const std::bitset<128> set3 = (high << high.size()/2) | low;
std::cout << set3 << '\n';

This makes compilers stick to binary data storage. If you could use a somewhat newer compiler with constexpr, I think it would be possible to declare the data as an array of bitsets constructed from ulls, have them concatenated by a constexpr function, and bind the result to a constexpr const variable, which should ensure the best optimization possible (note that std::bitset's shift and OR operators only became constexpr in C++23). The compiler could still work against you, but it would have no reason to. Maybe even without constexpr it would generate pretty much optimal code.
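As a rough illustration of the "array of ull values" idea, the sketch below keeps the data as a table of 64-bit words and combines them with the same shift-and-OR step. The helper name `from_words` and the word order (word 0 holds the lowest 64 bits) are assumptions made for this example. The combination runs at run time; only the word table itself is constant data.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical helper. Assumed layout: words[0] holds bits 0..63,
// words[1] holds bits 64..127, and so on.
template <std::size_t N, std::size_t Words>
std::bitset<N> from_words(const std::uint64_t (&words)[Words])
{
    static_assert(Words * 64 == N, "the table must supply exactly N bits");
    std::bitset<N> result;
    for (std::size_t w = 0; w < Words; ++w)
        // Construct from an unsigned long long, shift into place, and OR in.
        result |= std::bitset<N>(words[w]) << (w * 64);
    return result;
}
```

For the 128-bit example above, a table `{0x9999999999999999ULL, 0x6666666666666666ULL}` passed to `from_words<128>` should produce the same bitset as `(high << 64) | low`. A 2^16-bit set would need a table of 1024 words.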

Answer 2

You may consider skipping compilation altogether, and simply:

Assemble the data into an object file (segment .rodata), exporting symbols for it and its size.
Declare these symbols as extern const in a .h file.
Use these symbols and link your program to this object file.

I don't have MASM32 handy to write a complete answer that actually works, but I often use this technique with GAS and LD, and it avoids a lot of issues (loading on demand, security descriptors of an otherwise separate data file, blazingly fast compile times...).

Note that this is, in short, what the VS resource compiler does, so you may include your data as a resource and get a pointer to it.
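Once the raw bytes are reachable through a pointer, whether via linked-in symbols or a resource, they still have to be turned into a std::bitset if that type is required. The sketch below shows one way to do that under stated assumptions: the symbol names in the comment are invented for illustration, and the byte layout (bit i stored in byte i/8, least significant bit first) is a choice made for this example, not something the answer specifies.

```cpp
#include <bitset>
#include <cstddef>

// Assumed names: in a real build these would be the symbols exported by the
// separately assembled object file, declared roughly as
//   extern "C" const unsigned char bitset_data[];
//   extern "C" const std::size_t   bitset_data_size;
// The function takes them as parameters so this sketch stands alone.
template <std::size_t N>
std::bitset<N> from_bytes(const unsigned char* data, std::size_t size)
{
    std::bitset<N> result;
    for (std::size_t i = 0; i < N && i / 8 < size; ++i)
        if ((data[i / 8] >> (i % 8)) & 1u) // assumed layout: bit i lives in byte i/8, LSB first
            result.set(i);
    return result;
}
```

If the program only needs to test individual bits, reading them straight from the byte array with the same index arithmetic avoids the copy entirely.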

Answer 3

It's impossible to have a static std::bitset like that because:

There's no constexpr support for the constructor receiving const char*.
VS 2013 is extremely old and doesn't even support constexpr. It only has partial C++11 support.

If construction at runtime is acceptable, simply split the string literal into multiple pieces shorter than 2048 characters each, provided the total length stays below 65536:

ANSI compatibility requires a compiler to accept up to 509 characters in a string literal after concatenation. The maximum length of a string literal allowed in Microsoft C is approximately 2,048 bytes. However, if the string literal consists of parts enclosed in double quotation marks, the preprocessor concatenates the parts into a single string, and for each line concatenated, it adds an extra byte to the total number of bytes.

[...]

While an individual quoted string cannot be longer than 2048 bytes, a string literal of roughly 65535 bytes can be constructed by concatenating strings.

https://docs.microsoft.com/en-us/cpp/c-language/maximum-string-length?view=msvc-160

As noted, longer strings must be concatenated manually. Here the literal holds only 65535 characters, and the one remaining bit is set separately:

const int LENGTH = 1 << 16;
std::bitset<LENGTH> myBitset(
    "100101 ..."  // 2ᴺ bits
    "011001 ..."  // 2ᴺ bits
    ...
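    "001011 ...", // must be one character shorter than the previous lines: 2ᴺ - 1 bits
    LENGTH - 1    // number of characters to read
);
myBitset[LENGTH - 1] = 1; // set the one bit not covered by the string (the highest position), if it should be 1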
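The bit order matters for this trick. When std::bitset is constructed from fewer characters than it has bits, the last character read becomes bit 0 and the positions above the string stay zero, so the bit that has to be set by hand is the highest one. A small sketch of that behavior follows; `from_short_literal` is a hypothetical helper written for this illustration.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical helper: `low_bits` must point to at least N-1 characters of
// '0'/'1' holding the N-1 low-order bits; the highest bit is passed separately.
template <std::size_t N>
std::bitset<N> from_short_literal(const char* low_bits, bool top_bit)
{
    std::bitset<N> result(low_bits, N - 1); // reads exactly N-1 characters
    result[N - 1] = top_bit;                // positions beyond the string start as 0
    return result;
}
```

With N = 8, `from_short_literal<8>("1101001", false)` gives the same value as `std::bitset<8>("01101001")`, and passing `true` instead sets the leading bit.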

Alternatively, just use an array instead of a string literal:

static const char BITSET[LENGTH] = {
    '1', '0', '0', '1',...
    ...
    '0', '1', '0', '0'
};
std::bitset<LENGTH> myBitset(BITSET, sizeof(BITSET));

Answer 1: score 1, accepted, 2021-04-28 13:35:19
Answer 2: score 0, 2021-04-28 14:11:21
Answer 3: score 0, 2021-04-28 14:42:21
