Unicode characters become question marks (C++ and MFC Dialog Based App)

This code sets the value of an edit box, but I'm having trouble when I retrieve Unicode characters from a MySQL table.

For example, the string nüşabə is displayed as nüşabÉ™.

Here is my code.

void CmysqlDlg::OnBnClickedButton3()
{
    USES_CONVERSION;

    try
    {
        mysql::MySQL_Driver *driver;   // assigned below; allocating one here would leak
        Connection *dbConn;
        Statement *st;
        ResultSet *res;

        driver = mysql::get_mysql_driver_instance();
        dbConn = driver->connect("tcp://127.0.0.1:3306", "root", "connection");
        dbConn->setSchema("mfc_app_database");

        st = dbConn->createStatement();
        res = st->executeQuery("SELECT password FROM users WHERE id=1");
        string z;
        while (res->next())
        {
            //k = res->getString("username");
            //cs.Format(_T("%s"), k);
            //CString cs(k.c_str(), CP_UTF8);
            //combo.AddString(cs);
            //usernameData.SetWindowTextW(cs);

            z = res->getString("password");
            CString pass(z.c_str()/*, CP_UTF8*/);
            nameData.SetWindowTextW(pass);
        }


        delete res;
        delete st;
        delete dbConn;
        // Do not delete driver: Connector/C++ owns the instance returned by get_mysql_driver_instance().
    }
    catch (const exception &e)
    {
        ofstream file("sadaasad.txt");
        file << e.what();
        file.close();
    }
}

The database collation is set to utf8_general_ci. I honestly don't know what to try next.

Please help. Thanks.

If you compile MFC for UNICODE, CString is defined as a string of wchar_t using UTF-16 encoding.

Constructing the CString directly from a char*, as you do, works only if all characters are in the ASCII subset of Unicode. That constructor converts the bytes using the current ANSI code page, not UTF-8:

  • As soon as a Unicode character is not ASCII, UTF-8 encodes it as several bytes, but the CString constructor then interprets each of those bytes as a separate character.
  • This is the case for nüşabə: ü, ş, and ə each need 2 bytes in UTF-8, which makes your CString 3 characters longer than expected.
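
The byte counts above can be checked with a few lines of portable C++. This is only a sketch: CountCodePoints is a hypothetical helper (not part of MFC or the Windows API), and it assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string by
// skipping continuation bytes (bit pattern 10xxxxxx).
// Assumes the input is valid UTF-8; it does no validation.
std::size_t CountCodePoints(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8)
    {
        if ((byte & 0xC0) != 0x80)   // lead byte or ASCII: starts a new code point
            ++count;
    }
    return count;
}
```

For the UTF-8 bytes of nüşabə, written as "n\xC3\xBC\xC5\x9F" "ab\xC9\x99" so that the source-file encoding does not matter, size() returns 9 while CountCodePoints returns 6. The 3 extra bytes are the 3 extra characters that appear in the CString.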

So when you have a UTF-8 encoded string in a char*, you need to convert it explicitly with MultiByteToWideChar(), as explained in this SO answer.

Edit: Code example

Instead of

        CString pass(z.c_str());

You could write something like:

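        // UTF-16 never needs more code units than the UTF-8 form has bytes.
        // The trailing () zero-initializes the buffer, so a failed conversion leaves an empty string.
        wchar_t *p = new wchar_t[z.size() + 1]();
        MultiByteToWideChar(
             CP_UTF8,         // code page of the source string
             0,               // flags
             z.c_str(),       // pointer to the UTF-8 string
             -1,              // -1 for a null-terminated string, byte count otherwise
             p,               // destination buffer for the converted wchar_t string
             static_cast<int>(z.size() + 1));   // size of the buffer, in wchar_t
        CString pass(p);
        delete[] p;           // allocated with new[], so release with delete[]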

Note that MultiByteToWideChar() and its reverse WideCharToMultiByte() belong to the Windows API and not to MFC.

Note also that standard C++ has portable conversion facilities for its strings, declared in <locale> and <codecvt> (deprecated since C++17, but still shipped by the major compilers):

 wstring_convert<codecvt_utf8_utf16<wchar_t>, wchar_t> conversion;
 wstring s = conversion.from_bytes(z.c_str());
 string mbs = conversion.to_bytes(L"\u00c6\u0186"); 
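
One detail worth knowing about the standard converter: from_bytes() throws std::range_error when the input is not valid UTF-8. A minimal sketch of a wrapper that reports failure instead of throwing could look like this. TryUtf8ToWide is a hypothetical name, not a library function, and the sketch assumes a compiler that still provides the deprecated <codecvt> header.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-8 to UTF-16 code units stored in wchar_t.
// Returns false (and clears out) when the input is not valid UTF-8.
bool TryUtf8ToWide(const std::string& utf8, std::wstring& out)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> conversion;
    try
    {
        out = conversion.from_bytes(utf8);
        return true;
    }
    catch (const std::range_error&)
    {
        // from_bytes() throws range_error on a malformed byte sequence
        out.clear();
        return false;
    }
}
```

In the question's loop this could be used as `std::wstring w; if (TryUtf8ToWide(z, w)) nameData.SetWindowTextW(w.c_str());`, which skips the manual buffer management of the MultiByteToWideChar() version.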
