
Usage of istreambuf_iterator<TCHAR>

I have a piece of code that reads the content of a .txt file into a string.

std::ifstream file("address.txt");  
std::string oldAddress((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());

Naturally it also works if I use std::wstring instead, like this:

std::wifstream file("address.txt"); 
std::wstring oldAddress((std::istreambuf_iterator<wchar_t>(file)), std::istreambuf_iterator<wchar_t>());

Here is my question: suppose I don't know whether the character set is Unicode or Multi-Byte, and I want my code to be general enough to handle both options. What is the best way to use istreambuf_iterator to get a string based on TCHAR?

This is my attempt, which works, but I wonder whether it is really necessary to create these typedefs.

typedef std::basic_ifstream<TCHAR> tifstream;
typedef std::basic_string<TCHAR, std::char_traits<TCHAR>, std::allocator<TCHAR>> tstring;

tifstream file("address.txt");  
tstring oldAddress((std::istreambuf_iterator<TCHAR>(file)), std::istreambuf_iterator<TCHAR>());

Thanks in advance!
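One way to avoid spelling the character type three times is to let a function template deduce it from the stream. The sketch below is only an illustration: `read_all` and `demo_wide` are hypothetical names that do not appear in the original post, and the code is written against standard stream types so that it compiles without <tchar.h>. Passing a `std::basic_ifstream<TCHAR>` should work the same way, because `TCHAR` is an alias for either `char` or `wchar_t`.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original post): reads everything that
// remains in a stream into a string of the stream's own character type.
// CharT is deduced from the stream argument, so the caller never names it.
template <typename CharT>
std::basic_string<CharT> read_all(std::basic_istream<CharT>& in)
{
    // A default-constructed istreambuf_iterator acts as the end-of-stream sentinel.
    return std::basic_string<CharT>(std::istreambuf_iterator<CharT>(in),
                                    std::istreambuf_iterator<CharT>());
}

// Illustrative use with an in-memory wide stream; a file stream is read the same way.
inline std::wstring demo_wide()
{
    std::wistringstream in(L"wide text");
    return read_all(in);
}
```

With the streams from the question, the call would be `read_all(file)` for either `std::ifstream` or `std::wifstream`, so only the stream type still depends on the character set.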

If you want to handle a new character type that your library (in this case the MSVCRT) does not support out of the box, then apart from the usual typedefs for that character type you should also provide a char_traits.

A char_traits for your character type matters because, without it, the compare, length, and other routines that are statically specialized per character type will not work, and you will face undesired behaviour.

Ensure that you specialize the char_traits template for TCHAR:

template<>
struct char_traits<TCHAR>
{   

Actually, I wouldn't bother with TCHAR unless I were interfacing with the win32 API. Even then, I'd just use the wchar_t interfaces of the win32 API and generally prefer wchar_t when handling text internally, in order to support multiple scripts at the same time. Also, wchar_t equals WCHAR, which is the internal character type of MS Windows and uses a UTF-16 encoding. Note, though, that using UTF-16 internally has its own problems: a letter can still consist of multiple Unicode code points, and a single code point can still occupy multiple wchar_t elements, which makes substring operations difficult.
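The two UTF-16 pitfalls mentioned above can be shown with a small sketch. It makes one assumption for portability: `char16_t` and `std::u16string` stand in for the 16-bit wchar_t of MS Windows, since wchar_t is wider on other platforms. The function names are hypothetical, and `count_code_points` assumes well-formed UTF-16 input.

```cpp
#include <cstddef>
#include <string>

// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 stores this
// single code point as a surrogate pair, i.e. two 16-bit elements.
inline std::u16string emoji()
{
    return u"\U0001F600";
}

// 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible letter,
// but two code points (and here also two 16-bit elements).
inline std::u16string accented_e()
{
    return u"e\u0301";
}

// Counts code points by skipping low (trailing) surrogates.
// Assumes the input is well-formed UTF-16.
inline std::size_t count_code_points(const std::u16string& s)
{
    std::size_t n = 0;
    for (char16_t c : s) {
        if (c < 0xDC00 || c > 0xDFFF) {
            ++n;
        }
    }
    return n;
}
```

Taking `emoji().substr(0, 1)` would leave a lone high surrogate, which is not valid text on its own; this is the kind of substring problem the answer refers to.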

File input, which you didn't ask about but which appears in your example code, is a separate matter. Firstly, using wchar_t (aka WCHAR) with UTF-16 encoding as the internal representation for text allows you to read text files in any encoding. However, when reading a file you need to know the file's encoding, so that you can decode it accordingly. Using different internal representations for different file encodings would be the wrong approach.
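As a rough sketch of "know the file's encoding, then decode into one internal representation", the code below reads raw bytes with `istreambuf_iterator<char>` and decodes them as UTF-8 into UTF-16. Several things here are assumptions rather than anything from the original posts: the file is taken to be UTF-8, `char16_t` again stands in for the 16-bit wchar_t of Windows, and `utf8_to_utf16` and `read_utf8` are hypothetical helpers. The decoder is simplified: it rejects truncated or malformed sequences but does not reject overlong encodings or surrogates encoded as UTF-8.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into UTF-16 code units.
inline std::u16string utf8_to_utf16(const std::string& bytes)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra > bytes.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::runtime_error("code point out of range");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical helper: reads all bytes from a narrow stream, then decodes them.
inline std::u16string read_utf8(std::istream& in)
{
    const std::string bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    return utf8_to_utf16(bytes);
}
```

For a real file, the stream would typically be a `std::ifstream` opened with `std::ios::binary` so that the bytes reach the decoder unchanged. A file in a different encoding would need a different decoder, but the internal string type would stay the same.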

