Unicode characters become question marks (C++ and MFC Dialog Based App)

Question

This code sets the value of an edit box, but I'm having trouble when I retrieve Unicode characters from a MySQL table.

For example, the string nüşabə is displayed as nÃ¼ÅŸabÉ™.

Here is my code.

void CmysqlDlg::OnBnClickedButton3()
{
    USES_CONVERSION;

    try
    {
        mysql::MySQL_Driver *driver;  // assigned by get_mysql_driver_instance() below
        Connection *dbConn;
        Statement *st;
        ResultSet *res;

        driver = mysql::get_mysql_driver_instance();
        dbConn = driver->connect("tcp://127.0.0.1:3306", "root", "connection");
        dbConn->setSchema("mfc_app_database");

        st = dbConn->createStatement();
        res = st->executeQuery("SELECT password FROM users WHERE id=1");
        string z;
        while (res->next())
        {
            //k = res->getString("username");
            //cs.Format(_T("%s"), k);
            //CString cs(k.c_str(), CP_UTF8);
            //combo.AddString(cs);
            //usernameData.SetWindowTextW(cs);

            z = res->getString("password");
            CString pass(z.c_str()/*, CP_UTF8*/);
            nameData.SetWindowTextW(pass);
        }


        delete res;
        delete st;
        delete dbConn;
        // driver is owned by Connector/C++ and must not be deleted
    }
    catch (const exception &e)  // catch by reference to avoid slicing
    {
        ofstream file("sadaasad.txt");
        file << e.what();
        file.close();
    }
}

The database collation is set to utf8_general_ci. I don't know what else I should do...

Please help. Thanks.

Answer 1

If you compile MFC for UNICODE, CString is defined as a string of wchar_t using UTF-16 encoding.

Constructing the CString directly from a char* as you do works only if all chars are in the ASCII subset of Unicode.

As soon as a Unicode character is not ASCII, it is encoded in UTF-8 as several bytes, but the CString constructor then interprets each of these bytes as a distinct char.
This is the case for nüşabə with ü, ş, and ə, which each need 2 bytes in UTF-8 and cause your CString to be 3 chars longer than expected.

So when you have a UTF-8 encoded string in a char*, you need to convert it as explained in this SO answer, using MultiByteToWideChar().
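To make the byte counts concrete, here is a small portable sketch (not MFC code) that compares the number of bytes in the UTF-8 form of nüşabə with the number of characters it actually contains. The helper names utf8_code_points and check_example are made up for this illustration, and the function assumes its input is well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string by
// skipping continuation bytes (bit pattern 10xxxxxx).
// Assumes the input is well-formed UTF-8.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or ASCII byte starts a new code point
        }
    }
    return n;
}

// Sanity check on the word from the question.
void check_example() {
    // "nüşabə" written as explicit UTF-8 bytes:
    // ü = C3 BC, ş = C5 9F, ə = C9 99
    const std::string word = "n\xC3\xBC\xC5\x9F" "ab" "\xC9\x99";
    assert(word.size() == 9);             // 9 bytes in the char* buffer
    assert(utf8_code_points(word) == 6);  // but only 6 characters
}
```

A byte-per-character interpretation sees 9 chars here, which is where the 3 extra chars come from.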

Edit: Code example

Instead of

        CString pass(z.c_str());

You could write something like:

        // UTF-16 never needs more code units than UTF-8 needs bytes
        wchar_t *p = new wchar_t[z.size()+1];
        MultiByteToWideChar(
             CP_UTF8,         // code page of the source string
             0,               // flags
             z.c_str(),       // pointer to UTF-8 string
             -1,              // -1 for null-terminated string, size otherwise
             p,               // destination buffer for converted wchar_t string
             static_cast<int>(z.size()+1));  // size of buffer, in wchar_t
        CString pass(p);
        delete[] p;           // allocated with new[], so release with delete[]

Note that MultiByteToWideChar() and its reverse WideCharToMultiByte() belong to the Windows API and not to MFC.

Note that standard C++ also has portable conversion facilities (std::wstring_convert from <locale>, used with <codecvt>; both are deprecated since C++17):

 wstring_convert<codecvt_utf8_utf16<wchar_t>, wchar_t> conversion;
 wstring s = conversion.from_bytes(z.c_str());
 string mbs = conversion.to_bytes(L"\u00c6\u0186");
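If you would rather see what such a conversion does under the hood, or want to avoid both the Windows API and the deprecated standard facilities, a hand-written decoder is short. The following is only a sketch: utf8_to_utf16 and check_conversion are names invented for this example, and the decoder rejects truncated or malformed sequences but does not check for overlong encodings or encoded surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into UTF-16 code units.
// Simplified: no check for overlong encodings or encoded surrogates.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;   // code point being assembled
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append 6 payload bits
        }
        i += extra + 1;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            if (cp > 0x10FFFF)
                throw std::invalid_argument("code point out of range");
            cp -= 0x10000;  // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Sanity check on the word from the question.
void check_conversion() {
    const std::string utf8 = "n\xC3\xBC\xC5\x9F" "ab" "\xC9\x99";  // "nüşabə"
    const std::u16string utf16 = utf8_to_utf16(utf8);
    assert(utf16.size() == 6);   // 6 code units instead of 9 bytes
    assert(utf16[1] == 0x00FC);  // ü
    assert(utf16[2] == 0x015F);  // ş
    assert(utf16[5] == 0x0259);  // ə
}
```

On Windows, wchar_t is 16 bits wide, so the resulting code units can be copied into a std::wstring or CString one to one; elsewhere wchar_t is usually 32 bits and that shortcut does not apply.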
