Why does `std::basic_ifstream<char16_t>` not work in C++11?
The following code works as expected. The source code, "file.txt" and "out.txt" are all encoded in UTF-8. But it does not work when I change `wchar_t` to `char16_t` in the first line of `main()`. I've tried both gcc 5.4 and clang 8.0 with `-std=c++11`. My goal is to replace `wchar_t` with `char16_t`, as `wchar_t` takes twice the space in RAM. I thought these two types were equally well supported in C++11 and later standards. What am I missing here?
#include<iostream>
#include<fstream>
#include<locale>
#include<codecvt>
#include<string>
int main(){
    typedef wchar_t my_char;  // changing this to char16_t breaks the program
    std::locale::global(std::locale("en_US.UTF-8"));
    // Write a UTF-8 encoded test file.
    std::ofstream out("file.txt");
    out << "123正则表达式abc" << std::endl;
    out.close();
    // Read it back through a stream of my_char.
    std::basic_ifstream<my_char> win("file.txt");
    std::basic_string<my_char> wstr;
    win >> wstr;
    win.close();
    // Read it back as raw bytes and convert explicitly.
    std::ifstream in("file.txt");
    std::string str;
    in >> str;
    in.close();
    std::wstring_convert<std::codecvt_utf8<my_char>, my_char> my_char_conv;
    std::basic_string<my_char> conv = my_char_conv.from_bytes(str);
    // Both routes should produce the same string.
    std::cout << (wstr == conv ? "true" : "false") << std::endl;
    std::basic_ofstream<my_char> wout("out.txt");
    wout << wstr << std::endl << conv << std::endl;
    wout.close();
    return 0;
}
The modified code does not compile with clang 8.0. It compiles with gcc 5.4 but crashes at run time, as shown by @Brian.
The various stream classes need a set of definitions to be operational. The standard library requires the relevant definitions and objects only for `char` and `wchar_t`, but not for `char16_t` or `char32_t`. Off the top of my head, the following is needed to use `std::basic_ifstream<cT>` or `std::basic_ofstream<cT>`:
- `std::char_traits<cT>` to specify how the character type behaves. C++11 provides specializations of this template for `char16_t` and `char32_t`.
- The used `std::locale` needs to contain an instance of the `std::num_put<cT>` facet to format numeric types. This facet can just be instantiated and a new `std::locale` containing it can be created, but the standard doesn't mandate that it is present in a `std::locale` object.
- The used `std::locale` needs to contain an instance of the facet `std::num_get<cT>` to read numeric types. Again, this facet can be instantiated but isn't required to be present by default.
- The facet `std::numpunct<cT>` needs to be specialized and put into the used `std::locale` to deal with decimal points, thousands separators, and textual boolean values. Even if it isn't really used, it will be referenced from the numeric formatting and parsing functions. There is no ready specialization for `char16_t` or `char32_t`.
- The facet `std::ctype<cT>` needs to be specialized and put into the used `std::locale` to support widening, narrowing, and classification of the character type. There is no ready specialization for `char16_t` or `char32_t`.
- The facet `std::codecvt<cT, char, std::mbstate_t>` needs to be specialized and put into the used `std::locale` to convert between external byte sequences and internal "character" sequences. Unlike the facets above, C++11 does specify ready specializations for `char16_t` and `char32_t`; they convert between UTF-16 or UTF-32 and UTF-8.

Most of the facets are reasonably easy to do: they just need to forward a simple conversion or do table look-ups. However, the `std::codecvt` facet tends to be rather tricky to write, especially because `std::mbstate_t` is an opaque type from the point of view of the standard C++ library.
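As a rough illustration of the list above, the sketch below checks whether a locale actually holds these facets for a given character type. `has_stream_facets` and `check_required_char_types` are names made up for this example, not library functions. The template is only instantiated for `char` and `wchar_t` here; with `char16_t` the same check may fail to compile or report false, depending on the standard library implementation.

```cpp
#include <cassert>
#include <cwchar>   // std::mbstate_t
#include <locale>

// Hypothetical helper (not part of the standard library): reports whether
// `loc` holds every facet from the list above for the character type CharT.
template <typename CharT>
bool has_stream_facets(const std::locale& loc) {
    return std::has_facet<std::ctype<CharT>>(loc)
        && std::has_facet<std::numpunct<CharT>>(loc)
        && std::has_facet<std::num_get<CharT>>(loc)
        && std::has_facet<std::num_put<CharT>>(loc)
        && std::has_facet<std::codecvt<CharT, char, std::mbstate_t>>(loc);
}

// Hypothetical usage: the standard requires all of these facets to be
// present in every locale for char and wchar_t, so both checks should hold.
inline void check_required_char_types() {
    const std::locale& loc = std::locale::classic();
    assert(has_stream_facets<char>(loc));
    assert(has_stream_facets<wchar_t>(loc));
}
```

Supporting another character type would mean constructing a `std::locale` that contains a facet instance for each line of the check and imbuing the stream with it.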
All of that can be done. It has been a while since I last did a proof-of-concept implementation for a character type. It took me about a day's worth of work. Of course, I knew what I needed to do when I embarked on the work, having implemented the locales and IOStreams library before. Adding a reasonable number of tests rather than merely having a simple demo would probably take me a week or so (assuming I can actually concentrate on this work).
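If the goal is only to hold the text as `char16_t` in memory, one way to sidestep the missing facets (a different technique from implementing them) is to read the file through an ordinary `char` stream and convert the bytes with `std::wstring_convert`, which the question's code already uses. The sketch below assumes the file is valid UTF-8; `utf8_to_utf16` and `read_utf8_file_as_utf16` are hypothetical helper names, and `std::codecvt_utf8_utf16` is deprecated since C++17.

```cpp
#include <cassert>
#include <codecvt>   // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <fstream>
#include <iterator>
#include <locale>    // std::wstring_convert
#include <string>

// Hypothetical helper: converts UTF-8 bytes to UTF-16 code units.
// from_bytes throws std::range_error if the input is not valid UTF-8.
inline std::u16string utf8_to_utf16(const std::string& bytes) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(bytes);
}

// Hypothetical helper: reads the whole file through a plain char stream,
// so no char16_t facets are involved, then converts the bytes.
inline std::u16string read_utf8_file_as_utf16(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    return utf8_to_utf16(bytes);
}
```

With the question's test file, `read_utf8_file_as_utf16("file.txt")` would return the whole contents, including the trailing newline, rather than a single whitespace-delimited token as `>>` does.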