
C++ Convert string (or char*) to wstring (or wchar_t*)

string s = "おはよう";
wstring ws = FUNCTION(s, ws);

Assuming that the input string in your example (おはよう) is a UTF-8 encoded representation of the Unicode string you are interested in (which it isn't, by the looks of it, but let's assume it is for the sake of this explanation :-)), your problem can be solved entirely with the standard library (C++11 and newer).

The TL;DR version:

#include <locale>
#include <codecvt>
#include <string>

std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
std::string narrow = converter.to_bytes(wide_utf16_source_string);
std::wstring wide = converter.from_bytes(narrow_utf8_source_string);
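A complete, self-contained version of the TL;DR, as a sketch assuming a C++11-or-later compiler (from C++17 on, `<codecvt>` triggers deprecation warnings). The helper names `utf8_to_wide` and `wide_to_utf8` are hypothetical, and the test input is spelled with escape sequences so the result does not depend on how the source file is encoded.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helpers wrapping std::wstring_convert (deprecated in C++17,
// but still shipped by libstdc++, libc++ and MSVC).
// codecvt_utf8_utf16 treats the wide side as UTF-16, which matches wchar_t on
// Windows; where wchar_t is 32 bits, non-BMP characters become two surrogate units.
std::wstring utf8_to_wide(const std::string& narrow)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.from_bytes(narrow); // throws std::range_error on invalid UTF-8
}

std::string wide_to_utf8(const std::wstring& wide)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.to_bytes(wide);
}
```

Usage: `utf8_to_wide("\xE3\x81\x8A\xE3\x81\xAF\xE3\x82\x88\xE3\x81\x86")` yields the four-character wide string for おはよう.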


Note (old):

As pointed out in the comments and explained in https://stackoverflow.com/a/17106065/6345, there are cases where using the standard library to convert between UTF-8 and UTF-16 can give unexpectedly different results on different platforms. For a better conversion, consider std::codecvt_utf8 as described on http://en.cppreference.com/w/cpp/locale/codecvt_utf8.

Note (new):

Since the codecvt header is deprecated in C++17, some concerns were raised about the solution presented in this answer. However, the C++ standards committee added an important statement in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0618r0.html saying

this library component should be retired to Annex D, along side , until a suitable replacement is standardized.

So in the foreseeable future, the codecvt solution in this answer is safe and portable.
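If you would rather not depend on a deprecated header at all, UTF-8 decoding is small enough to write by hand. The sketch below assumes the input is UTF-8 and that wchar_t is either UTF-16 (16-bit, as on Windows) or UTF-32 (32-bit, as on most other platforms); `utf8_to_wstring` is a hypothetical name, and it throws on malformed input rather than substituting replacement characters.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical hand-written UTF-8 decoder (no <codecvt>).
// Produces UTF-16 with surrogate pairs when wchar_t is 16 bits
// and UTF-32 otherwise. Throws std::runtime_error on malformed input.
std::wstring utf8_to_wstring(const std::string& in)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;                       // number of continuation bytes
        if (lead < 0x80)              { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra >= in.size() - i)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong encodings, UTF-16 surrogates and values past U+10FFFF.
        static const char32_t min_value[] = {0x0, 0x80, 0x800, 0x10000};
        if (cp < min_value[extra] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::runtime_error("invalid UTF-8 code point");
        i += extra + 1;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            cp -= 0x10000;                       // split into a surrogate pair
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
    }
    return out;
}
```

Throwing keeps the sketch short; a production version might instead emit U+FFFD for each bad sequence.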

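#include <string>

// Copies each char into a wchar_t one by one. This is only correct for
// pure ASCII input; multi-byte (e.g. UTF-8) sequences are not decoded.
int StringToWString(std::wstring &ws, const std::string &s)
{
    ws.assign(s.begin(), s.end());

    return 0;
}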

Your question is underspecified. Strictly, that example is a syntax error. However, mbstowcs is probably what you're looking for.

It is a C library function and operates on buffers, but here's an easy-to-use idiom, courtesy of Mooing Duck:

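#include <cstdlib>  // std::mbstowcs; decodes according to the current C locale (LC_CTYPE)
std::wstring ws(s.size(), L' '); // Overestimate number of code points.
std::size_t n = std::mbstowcs(&ws[0], s.c_str(), ws.size());
ws.resize(n == static_cast<std::size_t>(-1) ? 0 : n); // Shrink to fit; empty on invalid input.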
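The same idiom wrapped in a reusable function, as a sketch. It assumes the caller has already selected the desired locale (for example with `std::setlocale(LC_CTYPE, "")`), because `std::mbstowcs` decodes according to the current C locale; the name `mb_to_wide` is hypothetical, and it throws instead of silently returning an empty string.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical wrapper around the mbstowcs idiom.
// Decodes using the *current* C locale; it does not change the locale itself.
std::wstring mb_to_wide(const std::string& s)
{
    std::wstring ws(s.size(), L'\0');  // overestimate: at most one wchar_t per input byte
    std::size_t n = std::mbstowcs(&ws[0], s.c_str(), ws.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for the current locale");
    ws.resize(n);                      // shrink to the number of wchar_t actually written
    return ws;
}
```

Keeping locale selection outside the function avoids the thread-safety problems of switching the global locale inside a conversion helper.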

If you are using Windows / Visual Studio and need to convert a string to wstring, you could use:

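#include <atlbase.h>
#include <atlconv.h>
#include <cstdio>
#include <string>
// ...
std::string s = "some string";
CA2W ca2w(s.c_str());  // uses the ANSI code page; CA2W(s.c_str(), CP_UTF8) for UTF-8 input
std::wstring w(ca2w);
std::printf("%s = %ls", s.c_str(), w.c_str());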

Windows API only, pre-C++11 implementation, in case someone needs it:

#include <stdexcept>
#include <string>
#include <vector>
#include <windows.h>

using std::runtime_error;
using std::string;
using std::vector;
using std::wstring;

wstring utf8toUtf16(const string & str)
{
   if (str.empty())
      return wstring();

   // First call: ask how many wchar_t the result needs.
   int charsNeeded = ::MultiByteToWideChar(CP_UTF8, 0, 
      str.data(), (int)str.size(), NULL, 0);
   if (charsNeeded == 0)
      throw runtime_error("Failed converting UTF-8 string to UTF-16");

   // Second call: convert into a buffer of exactly that size.
   vector<wchar_t> buffer(charsNeeded);
   int charsConverted = ::MultiByteToWideChar(CP_UTF8, 0, 
      str.data(), (int)str.size(), &buffer[0], (int)buffer.size());
   if (charsConverted == 0)
      throw runtime_error("Failed converting UTF-8 string to UTF-16");

   return wstring(&buffer[0], charsConverted);
}

Here's a way to combine string, wstring and mixed string constants into a wstring: use the wstringstream class.

This does NOT work for multi-byte character encodings. It is just a dumb way of throwing away type safety and expanding 7-bit characters from std::string into the lower 7 bits of each character of std::wstring. It is only useful if you have 7-bit ASCII strings and need to call an API that requires wide strings.

#include <sstream>
#include <string>

std::string narrow = "narrow";
std::wstring wide = L"wide";

std::wstringstream cls;
cls << " abc " << narrow.c_str() << L" def " << wide.c_str();
std::wstring total= cls.str();
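Inserting a narrow `const char*` into a wide stream widens each char through the stream locale's `std::ctype<wchar_t>` facet, which is why only ASCII survives intact. The same widening can be done without a stream; a sketch, where `widen_with_locale` is a hypothetical name and the classic "C" locale is an assumed default:

```cpp
#include <locale>
#include <string>

// Widen each char with the ctype<wchar_t> facet of the given locale, which is
// what inserting a const char* into a std::wostream does character by character.
std::wstring widen_with_locale(const std::string& s,
                               const std::locale& loc = std::locale::classic())
{
    std::wstring out(s.size(), L'\0');
    std::use_facet<std::ctype<wchar_t>>(loc).widen(s.data(), s.data() + s.size(), &out[0]);
    return out;
}
```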

From char* to wstring:

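#include <cstring>
#include <string>

const char* str = "hello worlddd";  // string literals are const in C++11 and later
std::wstring wstr(str, str + std::strlen(str));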

From string to wstring:

std::string str = "hello worlddd";
std::wstring wstr(str.begin(), str.end());

Note that this only works if the string being converted contains only ASCII characters.
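If the ASCII-only assumption has to hold, it can be checked instead of trusted. A minimal sketch; `ascii_to_wstring` is a hypothetical name, and throwing `std::invalid_argument` is just one possible policy.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: widen byte by byte, but refuse anything that is not
// 7-bit ASCII instead of silently producing garbage.
std::wstring ascii_to_wstring(const std::string& s)
{
    std::wstring out;
    out.reserve(s.size());
    for (char c : s) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (u > 0x7F)
            throw std::invalid_argument("non-ASCII byte in input");
        out.push_back(static_cast<wchar_t>(u));
    }
    return out;
}
```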

Using Boost.Locale:

#include <boost/locale/encoding_utf.hpp>

std::wstring ws = boost::locale::conv::utf_to_utf<wchar_t>(s); // s holds UTF-8

This variant is my favourite in real life. It converts the input, if it is valid UTF-8, to the corresponding wstring. If the input is corrupted, the wstring is constructed out of the single bytes. This is extremely helpful if you cannot really be sure about the quality of your input data.

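#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

std::wstring convert(const std::string& input)
{
    try
    {
        std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
        return converter.from_bytes(input);
    }
    catch(const std::range_error&)
    {
        // Not valid UTF-8: map each byte to the code point with the same
        // value, i.e. treat the input as Latin-1.
        size_t length = input.length();
        std::wstring result;
        result.reserve(length);
        for(size_t i = 0; i < length; i++)
        {
            result.push_back(input[i] & 0xFF);
        }
        return result;
    }
}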

You can use boost::filesystem::path or std::filesystem::path, which is a lot easier. Boost's path is the easier choice for cross-platform applications.

#include <boost/filesystem/path.hpp>
#include <string>

namespace fs = boost::filesystem;

// s to w
std::string s = "xxx";
std::wstring w = fs::path(s).wstring();

// w to s
std::wstring w2 = L"xxx";
std::string s2 = fs::path(w2).string();

If you prefer the standard library (C++17):

#include <filesystem>
namespace fs = std::filesystem;

//The same

For older C++ versions:

#include <experimental/filesystem>
namespace fs = std::experimental::filesystem;

//The same
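A self-contained C++17 sketch of the `std::filesystem` variant. `to_wide` and `to_narrow` are hypothetical names. Which encoding the narrow side is assumed to use is implementation-defined (on Windows it typically follows the active ANSI code page), so the test sticks to ASCII.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helpers that let std::filesystem::path do the conversion.
// The narrow-side encoding is platform-dependent, so only rely on this for
// ASCII or when you know your platform's path encoding.
std::wstring to_wide(const std::string& s)    { return fs::path(s).wstring(); }
std::string  to_narrow(const std::wstring& w) { return fs::path(w).string(); }
```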

Internally, path still implements a converter; you just don't have to deal with the details.

string to wstring:

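#include <string>
#include <windows.h>

std::wstring Str2Wstr(const std::string& str)
{
    if (str.empty())
        return std::wstring();
    // First call computes the required length, second call converts.
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), NULL, 0);
    std::wstring wstrTo(size_needed, 0);
    MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), &wstrTo[0], size_needed);
    return wstrTo;
}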

wstring to string:

#include <codecvt>
#include <locale>
#include <string>

std::string Wstr2Str(const std::wstring& wstr)
{
    typedef std::codecvt_utf8<wchar_t> convert_typeX;
    std::wstring_convert<convert_typeX, wchar_t> converterX;
    return converterX.to_bytes(wstr);
}
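One portability catch with `std::codecvt_utf8<wchar_t>`: where wchar_t is 16 bits (Windows) it converts to and from UCS-2, so characters outside the Basic Multilingual Plane are not converted correctly, while `std::codecvt_utf8_utf16<wchar_t>` always treats wchar_t as UTF-16, even where it is 32 bits wide. A sketch that picks the facet from `sizeof(wchar_t)`, assuming UTF-8 on the narrow side; the helper names are hypothetical.

```cpp
#include <codecvt>
#include <locale>
#include <string>
#include <type_traits>

// Hypothetical: choose the facet that matches the platform's wchar_t.
// 16-bit wchar_t holds UTF-16 -> codecvt_utf8_utf16.
// 32-bit wchar_t holds UTF-32 -> codecvt_utf8.
using wide_facet = std::conditional<sizeof(wchar_t) == 2,
                                    std::codecvt_utf8_utf16<wchar_t>,
                                    std::codecvt_utf8<wchar_t>>::type;

std::wstring utf8_to_wide_portable(const std::string& str)
{
    std::wstring_convert<wide_facet, wchar_t> converter;
    return converter.from_bytes(str);  // throws std::range_error on invalid input
}

std::string wide_to_utf8_portable(const std::wstring& wstr)
{
    std::wstring_convert<wide_facet, wchar_t> converter;
    return converter.to_bytes(wstr);
}
```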

If you have Qt and are too lazy to write a function yourself, you can use:

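#include <QString>
#include <string>

std::string str;
std::wstring ws = QString::fromStdString(str).toStdWString(); // str is read as UTF-8 (Qt 5 and later)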

std::string -> wchar_t[] with the safe mbstowcs_s function:

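#include <cstdlib>  // mbstowcs_s: MSVC CRT (C11 Annex K), not available in glibc
#include <memory>

std::size_t converted = 0;
auto ws = std::make_unique<wchar_t[]>(s.size() + 1);
mbstowcs_s(&converted, ws.get(), s.size() + 1, s.c_str(), s.size());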

Method s2ws works well. Hope it helps.

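#include <clocale>
#include <cstdlib>
#include <string>
#include <vector>

// Note: switching the global locale is not thread-safe.
std::wstring s2ws(const std::string& s) {
    // Save the current locale first: setlocale(LC_ALL, "") returns the *new* locale's name.
    std::string curLocale = std::setlocale(LC_ALL, nullptr);
    std::setlocale(LC_ALL, "");                  // switch to the user's environment locale
    const char* source = s.c_str();
    std::size_t len = std::mbstowcs(nullptr, source, 0);
    std::wstring result;
    if (len != static_cast<std::size_t>(-1)) {   // (size_t)-1 means invalid input
        std::vector<wchar_t> dest(len + 1, L'\0');
        std::mbstowcs(dest.data(), source, dest.size());
        result.assign(dest.data(), len);
    }
    std::setlocale(LC_ALL, curLocale.c_str());   // restore the previous locale
    return result;
}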

Based on my own testing (on Windows 8, VS2010), mbstowcs can actually corrupt the string; it only works with the ANSI code page. MultiByteToWideChar/WideCharToMultiByte can also corrupt strings, but they tend to replace characters they don't know with '?' question marks, whereas mbstowcs tends to stop when it encounters an unknown character and cut the string at that very point. (I tested Vietnamese characters on Finnish Windows.)

So prefer the Windows Multi* API functions over their ANSI C counterparts.

Also, I've noticed that the shortest way to convert a string from one code page to another is not to call MultiByteToWideChar/WideCharToMultiByte directly but to use their ATL macro counterparts: W2A / A2W.

So the equivalent of the function mentioned above would look like this:

#include <atlbase.h>
#include <atlconv.h>
#include <string>

std::wstring utf8toUtf16(const std::string & str)
{
   USES_CONVERSION;
   _acp = CP_UTF8;
   return A2W( str.c_str() );
}

_acp is declared in the USES_CONVERSION macro.

Or this function, which I often need when converting old data to a new format:

std::string ansi2utf8( const std::string& s )
{
   USES_CONVERSION;
   _acp = CP_ACP;                 // interpret the input in the ANSI code page
   wchar_t* pw = A2W( s.c_str() );

   _acp = CP_UTF8;                // encode the output as UTF-8
   return W2A( pw );
}

But note that these macros use the stack heavily: don't call them inside loops or recursive functions. After using W2A or A2W it's better to return as soon as possible, so that the stack space used by the temporary conversion is freed.

For me, the most uncomplicated option without big overhead is:

Include:

#include <atlbase.h>
#include <atlconv.h>

Convert:

const char* whatever = "test1234";  // string literals are const in C++11 and later
std::wstring lwhatever = std::wstring(CA2W(std::string(whatever).c_str()));

If needed:

lwhatever.c_str();

Use this code to convert your string to wstring:

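#include <string>
#include <vector>
#include <windows.h>

// Converts from the system ANSI code page (CP_ACP), not from UTF-8.
std::wstring string2wString(const std::string& s){
    // length + 1 includes the terminating '\0' in the conversion.
    int slength = (int)s.length() + 1;
    int len = MultiByteToWideChar(CP_ACP, 0, s.c_str(), slength, 0, 0);
    if (len == 0)
        return std::wstring();
    std::vector<wchar_t> buf(len);
    MultiByteToWideChar(CP_ACP, 0, s.c_str(), slength, buf.data(), len);
    return std::wstring(buf.data());
}

int main(){
    std::string str="your string";
    std::wstring wStr=string2wString(str);
    return 0;
}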

string s = "おはよう"; is an error.

You should use wstring directly:

wstring ws = L"おはよう";
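If you are not sure how the source file will be saved or which input charset the compiler assumes, the same literal can be written with universal character names, which do not depend on the source encoding. A minimal sketch; the variable name `greeting` is just for illustration.

```cpp
#include <string>

// おはよう as universal character names (U+304A U+306F U+3088 U+3046).
// All four are in the BMP, so this is 4 wchar_t with 16- or 32-bit wchar_t.
const std::wstring greeting = L"\u304A\u306F\u3088\u3046";
```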
