
How do I convert wchar_t* to std::string?

I changed my class to use std::string (based on the answer I got here), but a function I have returns wchar_t*. How do I convert it to std::string?

I tried this:

std::string test = args.OptionArg();

but it says error C2440: 'initializing' : cannot convert from 'wchar_t *' to 'std::basic_string<_Elem,_Traits,_Ax>'

std::wstring ws( args.OptionArg() );
std::string test( ws.begin(), ws.end() );

You can convert a wide char string to an ASCII string using the following function:

#include <locale>
#include <sstream>
#include <string>

std::string ToNarrow( const wchar_t *s, char dfault = '?', 
                      const std::locale& loc = std::locale() )
{
  std::ostringstream stm;

  while( *s != L'\0' ) {
    stm << std::use_facet< std::ctype<wchar_t> >( loc ).narrow( *s++, dfault );
  }
  return stm.str();
}

Be aware that this simply replaces any wide character that has no equivalent ASCII character with the dfault parameter; it doesn't convert from UTF-16 to UTF-8. If you want to convert to UTF-8, use a library such as ICU.
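For readers who only need UTF-8 output and cannot pull in ICU, here is a minimal sketch of a hand-written encoder. The name WideToUtf8 is hypothetical and not part of the answer above. It assumes wchar_t holds UTF-16 code units (as on Windows) or UTF-32 code points (as on most Linux and macOS toolchains), and it does no normalization.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: encodes a wide string as UTF-8 without using locales.
// Assumption: wchar_t stores UTF-16 code units or UTF-32 code points.
std::string WideToUtf8(const std::wstring& ws)
{
    std::string out;
    for (std::size_t i = 0; i < ws.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(ws[i]);

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate (UTF-16 input).
            if (i + 1 >= ws.size()) throw std::invalid_argument("unpaired high surrogate");
            const std::uint32_t lo = static_cast<std::uint32_t>(ws[i + 1]);
            if (lo < 0xDC00 || lo > 0xDFFF) throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            ++i;  // the low surrogate has been consumed
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        if (cp > 0x10FFFF) throw std::invalid_argument("code point out of range");

        // Emit 1 to 4 bytes depending on the size of the code point.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this sketch, std::string test = WideToUtf8(args.OptionArg()); would replace the ToNarrow call. Unlike ToNarrow, it throws on malformed input instead of substituting a default character.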

This is an old question, but if you are not really after a conversion and are instead using the TCHAR machinery from Microsoft to build both ASCII and Unicode variants, recall that std::string is really

typedef std::basic_string<char> string;

So we could define our own typedef, say

#include <string>
#include <tchar.h>  // defines TCHAR (Windows only)
namespace magic {
typedef std::basic_string<TCHAR> string;
}

Then you could use magic::string with TCHAR, LPCTSTR, and so forth.

You can just use wstring and keep everything in Unicode.

Just for fun :-):

const wchar_t* val = L"hello mfc";
std::string test((LPCSTR)CStringA(val));  // CStringA converts to narrow chars even in Unicode builds

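The following code is more concise:

#include <cstdio>

wchar_t wstr[500] = L"hello";
char str[500];
std::snprintf(str, sizeof str, "%ls", wstr);  // bounded write; "%ls" converts using the current C locale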

It's rather disappointing that none of the answers given to this old question addresses the problem of converting wide strings into UTF-8 strings, which is important in non-English environments.

Here is working example code that may serve as a starting point for custom converters. It is based on an example from cppreference.com.

#include <iostream>
#include <clocale>
#include <string>
#include <cstdlib>
#include <array>
#include <stdexcept>  // std::invalid_argument, std::logic_error

std::string convert(const std::wstring& wstr)
{
    const int BUFF_SIZE = 7;
    if (MB_CUR_MAX >= BUFF_SIZE) throw std::invalid_argument("BUFF_SIZE too small");
    std::string result;
    std::wctomb(nullptr, 0);  // reset the conversion state
    for (const wchar_t wc : wstr)
    {
        std::array<char, BUFF_SIZE> buffer;
        const int ret = std::wctomb(buffer.data(), wc);
        if (ret < 0) throw std::invalid_argument("inconvertible wide characters in the current locale");
        buffer[ret] = '\0';  // make 'buffer' contain a C-style string
        result.append(buffer.data(), static_cast<std::size_t>(ret));  // append exactly the bytes written
    }
    return result;
}

int main()
{
    auto loc = std::setlocale(LC_ALL, "en_US.utf8");  // UTF-8
    if (loc == nullptr) throw std::logic_error("failed to set locale");
    std::wstring wstr = L"aąß水𝄋-扫描-€𐍈\u00df\u6c34\U0001d10b";
    std::cout << convert(wstr) << "\n";
}

This prints, as expected:

[Image in the original: program output]

Explanation

  • 7 seems to be the minimal safe value of the buffer size, BUFF_SIZE. This includes 4 as the maximum number of UTF-8 bytes encoding a single character, 2 for a possible "shift sequence", and 1 for the trailing '\0'.
  • MB_CUR_MAX is a run-time value, so static_assert is not usable here.
  • Each wide character is translated into its char representation using std::wctomb.
  • This conversion makes sense only if the current locale allows multi-byte representations of a character.
  • For this to work, the application needs to set the proper locale. en_US.utf8 seems to be sufficiently universal (available on most machines). On Linux, the available locales can be listed in the console with the locale -a command.
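As a possible variation on the points above, the per-character loop can be replaced by the standard two-pass std::wcsrtombs idiom, which measures the output first and so needs no fixed buffer size. The function name ConvertWithCurrentLocale is hypothetical. This is a sketch that assumes the caller has already set a suitable locale. Note that it stops at the first embedded L'\0', unlike the loop in convert.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical alternative to convert(): whole-string conversion in the current C locale.
std::string ConvertWithCurrentLocale(const std::wstring& wstr)
{
    // Pass 1: with a null destination, wcsrtombs only reports the number of
    // bytes needed (excluding the terminator) and leaves src unchanged.
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = wstr.c_str();
    const std::size_t len = std::wcsrtombs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1))
        throw std::invalid_argument("inconvertible wide characters in the current locale");

    // Pass 2: convert into a string of exactly that size.
    std::string result(len, '\0');
    state = std::mbstate_t();
    src = wstr.c_str();
    std::wcsrtombs(&result[0], &src, len, &state);
    return result;
}
```

Using a local std::mbstate_t also avoids the hidden global conversion state that std::wctomb relies on.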

Critique of the most upvoted answer

The most upvoted answer,

std::wstring ws( args.OptionArg() );
std::string test( ws.begin(), ws.end() );

works well only when the wide characters represent ASCII characters, but these are not what wide characters were designed for. In this solution, the converted string contains one char per source wide char, so ws.size() == test.size(). Thus, it loses information from the original wstring and produces strings that cannot be interpreted as proper UTF-8 sequences. For example, on my machine the string resulting from this simplistic conversion of "ĄŚĆII" prints as "ZII", even though its size is 5 (and should be 8).
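The "ZII" result can be reproduced with a small sketch. The wrapper name NarrowByTruncation is hypothetical; it only gives the iterator-range constructor a name. On common implementations each wide character is reduced to its low 8 bits, so Ś (U+015A) becomes 0x5A, which is 'Z', while Ą (U+0104) and Ć (U+0106) become the non-printing bytes 0x04 and 0x06.

```cpp
#include <string>

// Hypothetical wrapper around the conversion from the most upvoted answer.
// Each wchar_t is implicitly converted to a single char, discarding the high bits.
std::string NarrowByTruncation(const std::wstring& ws)
{
    return std::string(ws.begin(), ws.end());
}
```

For pure ASCII input the result is unchanged, which is why the approach appears to work in simple tests.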
