UTF8 to UTF16 conversion using std::filesystem::path

Starting from C++11 one can convert UTF-8 to UTF-16 wchar_t (at least on Windows, where wchar_t is 16 bits wide) using std::wstring_convert with std::codecvt_utf8_utf16:

#include <codecvt>
#include <locale>
#include <string>

std::wstring utf8ToWide( const char* utf8 )
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.from_bytes( utf8 );
}

Unfortunately, in C++17 std::codecvt_utf8_utf16 is deprecated, along with std::wstring_convert. But there is std::filesystem::path with all possible conversions inside, e.g. it has these members (signatures as of C++20):

std::string string() const;
std::wstring wstring() const;
std::u8string u8string() const;
std::u16string u16string() const;
std::u32string u32string() const;

So the above function can be rewritten as follows:

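#include <filesystem>
#include <string>

std::wstring utf8ToWide( const char* utf8 )
{
    // char8_t requires C++20; the cast tells path to treat the bytes as UTF-8
    return std::filesystem::path( reinterpret_cast<const char8_t*>( utf8 ) ).wstring();
}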

And unlike std::codecvt_utf8_utf16, this will not use any deprecated piece of C++.

What kind of drawbacks can be expected from such a converter? For example, can a path not be longer than a certain length, or are certain Unicode symbols prohibited there?

"What kind of drawbacks can be expected from such a converter?"

Well, let's get the most obvious drawback out of the way. For a user who doesn't know what you're doing, it makes no sense. Doing UTF-8-to-16 conversion by using a path type is bonkers, and should be seen immediately as a code smell. It's the kind of awful hack you do when you are needlessly averse to just downloading a simple library that would do it correctly.

Also, it doesn't have to work. path is meant for storing... paths. Hence the name. Specifically, it is meant for storing paths in a way easily consumed by the filesystem in question. As such, the string stored in a path can have any limitations that the filesystem wants to put on it, beyond the small set of things the C++ standard requires it to do.

For example, if the filesystem is case-insensitive (or even just ASCII-case-insensitive), it is a legitimate implementation to have it just case-convert all strings to lowercase when they are stored in a path. Or to case-convert them when you extract them from a path. Or anything of the like.

path can convert all of your \s into /s. Or your :s into /s. Or do any other implementation-dependent tricks it wants to do.
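One way to see what a given implementation actually does with a particular string is to push it through path and compare the result with the input. The following is only a minimal sketch: survivesPathRoundTrip is a hypothetical helper name, and it goes through the native narrow encoding (path's std::string constructor and string()), not through the char8_t overload from the question.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper (not from the original post): reports whether `text`
// comes back unchanged after being stored in a std::filesystem::path and
// extracted again on the current implementation.
bool survivesPathRoundTrip(const std::string& text)
{
    // The std::string constructor interprets the bytes in the native
    // narrow encoding; string() converts back to that same encoding.
    const std::filesystem::path p(text);
    return p.string() == text;
}
```

A true result only describes the standard library and platform the check ran on. It says nothing about what another implementation is allowed to do, which is the point made above.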

If you're afraid of using a deprecated facility, just download a simple UTF-8/16 converting library. Or write one yourself; it isn't that difficult.
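To give an idea of what "write one yourself" might involve, here is a rough sketch of a hand-written converter. The function name utf8ToUtf16, the choice of std::u16string as the return type, and the decision to throw std::invalid_argument on malformed input are all assumptions made for illustration; none of them come from the original post, and a production converter might prefer to substitute U+FFFD instead of throwing.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 and re-encodes it as UTF-16.
// Throws std::invalid_argument on malformed input (an assumed policy).
std::u16string utf8ToUtf16(const std::string& utf8)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp;        // code point being decoded
        std::size_t extra;  // number of continuation bytes expected
        char32_t minimum;   // smallest code point legal for this length

        if (lead < 0x80)                { cp = lead;        extra = 0; minimum = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; minimum = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; minimum = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; minimum = 0x10000; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > utf8.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        // Reject overlong encodings, surrogate code points, and values
        // beyond U+10FFFF.
        if (cp < minimum || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::invalid_argument("invalid code point in UTF-8 input");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On a platform where wchar_t is 16 bits wide, such as Windows, the result could be copied into a std::wstring with std::wstring(result.begin(), result.end()). Where wchar_t is 32 bits wide, that copy would not produce correct wide text for characters outside the Basic Multilingual Plane.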
