Usage of istreambuf_iterator<TCHAR>

Question

I have a piece of code that reads the content of a .txt-file into a string.

std::ifstream file("address.txt");  
std::string oldAddress((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());

Naturally it also works if I use std::wstring instead, like this:

std::wifstream file("address.txt"); 
std::wstring oldAddress((std::istreambuf_iterator<wchar_t>(file)), std::istreambuf_iterator<wchar_t>());

Here is my question: suppose I don't know whether the Character Set is Unicode or Multi-Byte, and I want my code to be general enough to handle both options. What is the best way to use istreambuf_iterator so that the string type is based on TCHAR?

This is my attempt, which works, but I wonder if it is really necessary to create these typedefs.

typedef std::basic_ifstream<TCHAR> tifstream;
typedef std::basic_string<TCHAR, std::char_traits<TCHAR>, std::allocator<TCHAR>> tstring;

tifstream file("address.txt");  
tstring oldAddress((std::istreambuf_iterator<TCHAR>(file)), std::istreambuf_iterator<TCHAR>());

Thanks in advance!

Answer 1

If you want to handle a new character type that is not supported out of the box by your library (in this case the MSVCRT), then apart from the regular typedefs for that character type, you should also provide a char_traits for it.

A char_traits for your character type is important: without it, compare, length and the other routines that are statically specialized for the character type will not work, and you will face undesired behaviour.

Make sure that you specialize the char_traits template for TCHAR:

template<>
struct char_traits<TCHAR>
{

Answer 2

Actually, I wouldn't bother with TCHAR unless interfacing with the Win32 API. Even then, I'd just use the wchar_t interfaces of the Win32 API and prefer wchar_t in general when handling text internally, in order to be able to support multiple scripts at the same time. Also, wchar_t equals WCHAR, which is the internal character type of MS Windows and uses a UTF-16 encoding. Note though that using UTF-16 internally also has its problems, because even there a letter can still use multiple Unicode code points and a single code point can still use multiple wchar_t elements, which makes substring operations difficult.
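To illustrate the last point, here is a small sketch that counts code points in a UTF-16 string. It uses char16_t and std::u16string instead of wchar_t so that it behaves the same on platforms where wchar_t is 32 bits wide; the function name count_code_points is made up for this example, and the code assumes well-formed UTF-16.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-16 string.
// Every code point contributes exactly one unit that is NOT a low
// (trailing) surrogate, so counting those units counts code points.
// Assumes the input is well-formed UTF-16.
std::size_t count_code_points(const std::u16string& s)
{
    std::size_t n = 0;
    for (char16_t unit : s) {
        if (unit < 0xDC00 || unit > 0xDFFF) {
            ++n;
        }
    }
    return n;
}
```

For u"a\U0001F600" the string holds three code units but only two code points, so cutting it with substr at index 2 would split the surrogate pair. For u"e\u0301" (an e followed by a combining acute accent) there are two code points that a reader sees as a single letter.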

Concerning file input, which you didn't ask about but still mentioned in your example code, this is a different matter. Using wchar_t (aka WCHAR) with UTF-16 encoding as the internal representation for text allows you to read text files with any encoding. However, when reading a file, you need to know the file's encoding so that you can decode it accordingly. Using different internal representations for different file encodings would be the wrong approach.
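As a rough sketch of "decode according to the file's encoding", assume the file is known to be UTF-8 and its bytes have already been read into a std::string. The function below converts those bytes to UTF-16 code units. The name utf8_to_utf16 is hypothetical, and the code assumes well-formed input and does no validation; on Windows, MultiByteToWideChar with CP_UTF8 would be the more usual route.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into UTF-16 code units.
// Assumes well-formed UTF-8; malformed input is not detected.
std::u16string utf8_to_utf16(const std::string& bytes)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i++]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes to read
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        // Each continuation byte carries six payload bits.
        for (; extra > 0 && i < bytes.size(); --extra, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

A file in a different encoding would need a different decoder, but the result would still be the same internal UTF-16 representation, which is the point made above.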

2 answers

solution1
3 ACCEPTED 2015-04-26 15:18:47

solution2
0 2015-04-26 15:41:29
