
C++: convert from UTF-8 to wstring using iconv

I have a C++ Linux application that runs the following:

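#include <cerrno>
#include <clocale>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <iconv.h>

int main()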
{
  using namespace std;
  char str[] = "¡Hola!";

  wchar_t wstr[50];

  size_t rc;

  memset(wstr, 0, sizeof(wstr));

  rc = mbstowcs(wstr, str, 50);

  cout << "mbstowcs results: ";
  cout << "rc = " << rc << endl;
  cout << "str:" << str  << endl;
  wcout << L"wstr:" << wstr  << endl;
  setlocale(LC_CTYPE,"");
  iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
  cout << "iconv_open errno = "<< errno << endl;

  char *s = str;
  char *t = (char *)wstr;
  size_t s1 = strlen(str);
  size_t s2 = 50;

  rc = iconv(cd, &s, &s1, &t, &s2);

  cout << "iconv results: ";
  cout << "rc = " << rc << endl;
  cout << "str:" << str  << endl;
  wcout << L"wstr:" << wstr  << endl;

}

I want to convert a UTF-8 char vector to a wstring, but the code above produces this output:

    mbstowcs results: rc = 18446744073709551615
    str:¡Hola!
    wstr:
    iconv_open errno = 2
    iconv results: rc = 0
    str:¡Hola!
    wstr:�Hola!

In the iconv result, the first character has been converted to a different character.

Note: if I replace WCHAR_T with UCS-4-INTERNAL, wstr contains nothing.

Any help?

Thanks!

Without looking at the iconv documentation (I have never had to use it so far), I'd expect that your input ( char str[] = "¡Hola!"; ) is not encoded as a multibyte string. It is more likely a simple ANSI string that uses your local/current codepage to represent the '¡'. In other words: in your existing string (a char[] ) the '¡' may be stored in a single byte with a value above 127. mbstowcs(), however, would expect a multibyte encoding; in UTF-8, for example, '¡' (U+00A1) takes two bytes, 0xC2 0xA1. The single-byte value your '¡' uses might even be something that is not expected or allowed there.
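One way to check the guess above about how '¡' is stored is to dump the bytes of the string. The following is only a minimal sketch; hex_bytes is a hypothetical helper name, not part of any library.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: renders every byte of s as two lowercase hex
// digits, separated by single spaces.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling hex_bytes(str) on the string from the question and printing the result would show whether the data starts with the two bytes c2 a1 (UTF-8) or with a single byte a1 (a Latin-1 style codepage).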

I'd expect the error to happen there, because mbstowcs() should return the number of characters in the converted string, and "18446744073709551615" is far too large for that. It is (size_t)-1 on a 64-bit system, the value mbstowcs() returns when it encounters an invalid multibyte sequence. If this is true, you should also be able to use iconv properly by defining your own wide string with the proper text and using that one instead ( wchar_t wstr[] = L"¡Hola!"; ).
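For the iconv route, a conversion from UTF-8 to std::wstring might be sketched as follows. This is a minimal sketch, not a definitive implementation: utf8_to_wstring is a hypothetical helper name, and it assumes the platform's iconv accepts the encoding name "WCHAR_T", as the call in the question does. Note that iconv counts the output buffer in bytes, not in wide characters, and that both iconv_open and iconv report failure through their return value rather than through errno alone.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-8 bytes to std::wstring via iconv.
// Assumption: iconv_open accepts "WCHAR_T" as a target encoding name.
std::wstring utf8_to_wstring(const std::string& utf8) {
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)   // errno is only meaningful after a failure
        throw std::runtime_error(std::string("iconv_open: ") + std::strerror(errno));

    // A UTF-8 sequence never decodes to more wchar_t units than it has bytes,
    // so one wchar_t per input byte is always enough room.
    std::wstring out(utf8.size(), L'\0');
    char* src = const_cast<char*>(utf8.data());
    size_t src_left = utf8.size();
    char* dst = reinterpret_cast<char*>(&out[0]);
    size_t dst_left = out.size() * sizeof(wchar_t);   // bytes, not characters

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    int saved_errno = errno;
    iconv_close(cd);
    if (rc == (size_t)-1)
        throw std::runtime_error(std::string("iconv: ") + std::strerror(saved_errno));

    // On success iconv has written whole wchar_t units only.
    assert(dst_left % sizeof(wchar_t) == 0);
    out.resize(out.size() - dst_left / sizeof(wchar_t));
    return out;
}
```

Separately, mbstowcs() interprets its input according to the current LC_CTYPE locale, and the posted code calls setlocale() only after mbstowcs(), so that ordering may also be worth checking.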

