Read entire UTF-8 file into std::string

I use the following to read an ASCII file:

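#include <fstream>
#include <iterator>
#include <streambuf>
#include <string>
#include <cerrno>

// Reads the whole file as raw bytes; no encoding conversion is performed.
std::string get_file_contents(const char *filename)
{
  // Binary mode prevents newline translation, so the bytes arrive unchanged.
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (in)
  {
    return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
  }
  // Throws errno as an int; the C++ standard does not guarantee that a failed open sets it.
  throw(errno);
}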

Will this also work for reading a UTF-8 file into a std::string, or are any special settings needed?

Reading a UTF-8 file this way is fine: the contents are just a sequence of bytes, and std::string stores them unchanged. The encoding only matters when you further process, convert, or output the text.
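
To illustrate the point about further processing: std::string::size() returns the number of bytes, not the number of characters. A minimal sketch of counting code points follows; the helper name utf8_code_point_count is hypothetical, and it assumes the input is already valid UTF-8 (it does no validation).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts UTF-8 code points by skipping continuation
// bytes, which always have the bit pattern 10xxxxxx.
// Assumes the input is valid UTF-8; no validation is performed.
std::size_t utf8_code_point_count(const std::string& s)
{
  std::size_t count = 0;
  for (unsigned char c : s)
  {
    if ((c & 0xC0) != 0x80)  // not a continuation byte, so it starts a code point
    {
      ++count;
    }
  }
  return count;
}
```

For a string holding "héllo" encoded as UTF-8, size() reports 6 bytes because "é" takes two bytes, while this helper reports 5 code points. Note that code points are still not the same thing as user-perceived characters when combining marks are involved.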

One potential pitfall is the BOM (https://en.wikipedia.org/wiki/Byte_order_mark). If your text file has a BOM, you may want to remove it from the string manually or otherwise handle it. UTF-8 does not need a BOM, but some software writes one anyway, presumably to distinguish encodings. Notepad on Windows saves a BOM, for example (have Notepad save a file with UTF-8 encoding and open it in a binary editor to check).
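
Removing the BOM manually could look something like the sketch below. The UTF-8 BOM is the three bytes EF BB BF at the very start of the file; the function name strip_utf8_bom is hypothetical and not part of any library.

```cpp
#include <string>

// Hypothetical helper: removes a leading UTF-8 BOM (bytes EF BB BF)
// in place, if present. Returns true when a BOM was found and removed.
bool strip_utf8_bom(std::string& text)
{
  static const char bom[] = "\xEF\xBB\xBF";
  // compare() clamps the range to the string length, so strings shorter
  // than three bytes simply compare unequal.
  if (text.compare(0, 3, bom) == 0)
  {
    text.erase(0, 3);
    return true;
  }
  return false;
}
```

It would typically be called once on the result of get_file_contents, before any parsing or comparison of the text.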
