Using UTF-8 string-literal prefixes portably between C++17 and C++20

I have a codebase written in C++17 that makes heavy use of UTF-8 and of the u8 string literal, introduced in C++11, to indicate UTF-8 encoding. However, C++20 changes what a u8 literal produces, from char or const char* to char8_t or const char8_t*; the latter is not implicitly convertible to const char*.
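The type change can be observed directly. The following is a minimal sketch that compiles in both modes, assuming the compiler defines the __cpp_char8_t feature-test macro whenever char8_t is enabled; the names u8_element and u8_converts_to_char_ptr are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Element type of a u8 string literal in the current language mode.
using u8_element =
    std::remove_cv_t<std::remove_reference_t<decltype(u8"x"[0])>>;

#if defined(__cpp_char8_t)
// char8_t is enabled (C++20 default): u8 literals are made of char8_t.
static_assert(std::is_same_v<u8_element, char8_t>, "expected char8_t elements");
#else
// char8_t is not enabled (C++17 and earlier): u8 literals are made of char.
static_assert(std::is_same_v<u8_element, char>, "expected char elements");
#endif

// True only when a u8 literal still decays to something usable as const char*.
constexpr bool u8_converts_to_char_ptr =
    std::is_convertible_v<decltype(u8"x"), const char*>;
```

When char8_t is enabled, u8_converts_to_char_ptr is false, which is the breakage described above.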

I'd like this project to support both C++17 and C++20 mode without breakage; what can be done to support this?


Currently, the project uses a char8 alias defined as the type of a u8 character literal:

// Produces 'char8_t' in C++20, 'char' in anything earlier
using char8 = decltype(u8' ');

But there are a few problems with this approach:

  1. char is not guaranteed to be unsigned, which makes producing code points from numeric values non-portable (e.g. char8{129} breaks with char, but not with char8_t).

  2. char8 is not distinct from char in C++17, which can break existing code and may cause errors.

  3. Continuing from point 2, it is not possible to overload on char and char8 in C++17 to handle different encodings, because they are not distinct types.
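These three points can be checked at compile time. The sketch below reuses the char8 alias from the question; the constants and the make_char8 helper are hypothetical names added for illustration. It assumes the usual behaviour where converting an out-of-range byte to a signed char wraps around, which was implementation-defined before C++20.

```cpp
#include <cassert>
#include <type_traits>

// The alias from the question: char8_t in C++20, char in earlier modes.
using char8 = decltype(u8' ');

// Point 1: implementation-defined while char8 is char, always false for char8_t.
constexpr bool char8_is_signed = std::is_signed_v<char8>;

// Points 2 and 3: when this is false, f(char) and f(char8) would be the same
// signature, so they cannot coexist as overloads.
constexpr bool char8_is_distinct = !std::is_same_v<char8, char>;

// Hypothetical helper: build a code unit from a byte value with an explicit
// cast, avoiding the narrowing error that char8{129} hits when char is signed.
constexpr char8 make_char8(unsigned char byte) noexcept {
    return static_cast<char8>(byte);
}
```

Reading the value back through unsigned char, as in static_cast<unsigned char>(make_char8(129)), is what keeps numeric comparisons independent of the signedness of char.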

What can be done to support both C++17 and C++20 mode while avoiding the type-difference problem?

I would suggest simply declaring your own char8_t and u8string types in pre-C++20 versions, as aliases for unsigned char and basic_string<unsigned char>. Then, anywhere you run into conversion problems, you can write wrapper functions to handle them appropriately in each version.
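A minimal sketch of this suggestion follows. The compat namespace, the char8 alias name, and the as_char and code_unit wrappers are all hypothetical, and the __cpp_char8_t feature-test macro is assumed to be defined whenever char8_t is enabled. The sketch leaves out the basic_string<unsigned char> alias on purpose: std::char_traits is only specified for the standard character types, so that instantiation may not be available in every standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace compat {  // hypothetical namespace, not from the original post

#if defined(__cpp_char8_t)
using char8 = char8_t;        // the real type when char8_t is enabled
#else
using char8 = unsigned char;  // unsigned and distinct from char before C++20
#endif

// Wrapper: expose UTF-8 data as const char* in either language mode.
inline const char* as_char(const char* s) noexcept { return s; }
inline const char* as_char(const char8* s) noexcept {
    // Reading any object through a char pointer is permitted.
    return reinterpret_cast<const char*>(s);
}

// Wrapper: read the i-th code unit as an unsigned char8 value,
// whatever character type the underlying buffer uses.
template <class CharT>
constexpr char8 code_unit(const CharT* s, std::size_t i) noexcept {
    return static_cast<char8>(static_cast<unsigned char>(s[i]));
}

}  // namespace compat
```

With these wrappers, std::string s = compat::as_char(u8"\u00E9"); compiles in both modes, because the u8 literal selects the const char* overload in C++17 and the const char8* overload in C++20.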
