Why does `std::basic_ifstream<char16_t>` not work in C++11?

Question

The following code works as expected. The source code and the files "file.txt" and "out.txt" are all encoded in UTF-8. However, it stops working when I change wchar_t to char16_t in the first line of main(). I have tried both gcc 5.4 and clang 8.0 with -std=c++11. My goal is to replace wchar_t with char16_t, because wchar_t takes twice as much space in RAM. I thought these two types were equally well supported in C++11 and later standards. What am I missing here?

#include<iostream>
#include<fstream>
#include<locale>
#include<codecvt>
#include<string>

int main(){
  typedef wchar_t my_char;

  std::locale::global(std::locale("en_US.UTF-8"));

  std::ofstream out("file.txt");
  out << "123正则表达式abc" << std::endl;
  out.close();

  std::basic_ifstream<my_char> win("file.txt");
  std::basic_string<my_char> wstr;
  win >> wstr;
  win.close();

  std::ifstream in("file.txt");
  std::string str;
  in >> str;
  in.close();

  std::wstring_convert<std::codecvt_utf8<my_char>, my_char> my_char_conv;
  std::basic_string<my_char> conv = my_char_conv.from_bytes(str);

  std::cout << (wstr == conv ? "true" : "false") << std::endl;

  std::basic_ofstream<my_char> wout("out.txt");
  wout << wstr << std::endl << conv << std::endl;
  wout.close();

  return 0;
}

EDIT

The modified code does not compile with clang 8.0. It compiles with gcc 5.4 but crashes at run time, as shown by @Brian.
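The memory argument in the question can be checked directly. The helper below is a hypothetical name introduced only for illustration; it reports how many bytes the characters of a string occupy, ignoring the terminator and any allocator overhead. On typical Linux toolchains wchar_t is 4 bytes and char16_t is 2, while on Windows wchar_t is 2 bytes, so the saving depends on the platform.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: bytes taken by the characters of a string,
// not counting the terminator or allocator overhead.
template <typename CharT>
std::size_t payload_bytes(const std::basic_string<CharT>& s) {
    return s.size() * sizeof(CharT);
}
```

Comparing payload_bytes(std::wstring(L"abc")) with payload_bytes(std::u16string(u"abc")) shows the per-character difference on a given platform.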

Answer 1

The various stream classes need a set of definitions to be operational. The standard library requires the relevant definitions and objects only for char and wchar_t, not for char16_t or char32_t. Off the top of my head, the following is needed to use std::basic_ifstream<cT> or std::basic_ofstream<cT>:

1. std::char_traits<cT> to specify how the character type behaves. I think this template is specialized for char16_t and char32_t.
2. The std::locale in use needs to contain an instance of the std::num_put<cT> facet to format numeric types. This facet can simply be instantiated and a new std::locale containing it can be created, but the standard doesn't mandate that it is present in a std::locale object.
3. The std::locale in use needs to contain an instance of the std::num_get<cT> facet to read numeric types. Again, this facet can be instantiated but isn't required to be present by default.
4. The facet std::numpunct<cT> needs to be specialized and put into the std::locale in use to deal with decimal points, thousands separators, and textual boolean values. Even if it isn't really used, it will be referenced from the numeric formatting and parsing functions. There is no ready specialization for char16_t or char32_t.
5. The facet std::ctype<cT> needs to be specialized and put into the std::locale in use to support widening, narrowing, and classification of the character type. There is no ready specialization for char16_t or char32_t.
6. The facet std::codecvt<cT, char, std::mbstate_t> needs to be available in the std::locale in use to convert between external byte sequences and internal "character" sequences. C++11 does specify specializations of this facet for char16_t and char32_t (converting to and from UTF-8), but that alone is not enough without the other facets listed here.
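The mechanics of "putting a facet into the std::locale in use" can be sketched with wchar_t, where the library already supplies the base facet. This is only an illustration of the installation step, not a char16_t implementation; apostrophe_grouping and format_grouped are hypothetical names invented for this example.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: group digits in threes,
// separated by an apostrophe.
struct apostrophe_grouping : std::numpunct<wchar_t> {
protected:
    wchar_t do_thousands_sep() const override { return L'\''; }
    std::string do_grouping() const override { return "\3"; }
};

// Formats a number through a stream whose locale carries the custom facet.
std::wstring format_grouped(long value) {
    std::wostringstream os;
    // std::locale(other, facet) copies `other` and installs `facet`;
    // the new locale takes ownership of the facet object.
    os.imbue(std::locale(std::locale::classic(), new apostrophe_grouping));
    os << value;
    return os.str();
}
```

Supporting a new character type would mean writing and installing facets in the same way for every item in the list above, which is where the real effort lies.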

Most of the facets are reasonably easy to write: they just need to forward a simple conversion or do table look-ups. The std::codecvt facet, however, tends to be rather tricky, especially because std::mbstate_t is an opaque type from the point of view of the standard C++ library.

All of that can be done. It has been a while since I last did a proof-of-concept implementation for a character type. It took me about a day's worth of work. Of course, I knew what I needed to do when I embarked on the work, having implemented the locales and IOStreams library before. Adding a reasonable number of tests, rather than merely having a simple demo, would probably take me a week or so (assuming I can actually concentrate on this work).
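If the goal is only to hold the text as UTF-16 in memory, one way to sidestep the missing facets is to do the file I/O through ordinary char streams and convert the bytes afterwards. The sketch below is a different technique from the one described in this answer, and it assumes the file is valid UTF-8. It uses std::codecvt_utf8_utf16 rather than the std::codecvt_utf8 from the question so that characters outside the BMP become surrogate pairs. The function names are hypothetical, and <codecvt> and std::wstring_convert are deprecated since C++17.

```cpp
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// UTF-8 bytes -> UTF-16 code units.
// from_bytes throws std::range_error on malformed input.
std::u16string utf8_to_utf16(const std::string& bytes) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(bytes);
}

// UTF-16 code units -> UTF-8 bytes, for writing back out via std::ofstream.
std::string utf16_to_utf8(const std::u16string& text) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(text);
}

// Reads the whole file as raw bytes through a plain char stream,
// then converts. A file that cannot be opened yields an empty string.
std::u16string read_utf8_file(const char* path) {
    std::ifstream in(path, std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    return utf8_to_utf16(bytes);
}
```

With this approach only std::ifstream and std::ofstream touch the file, so no char16_t stream and no char16_t facets in the locale are needed.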

Answer 1: score 6, accepted, 2016-12-24 21:02:45
