簡體   English   中英

C ++從字符串中制作一個Unicode字符

[英]c++ making a unicode char from a string

我有這樣的字符串

string s = "0081";

我需要像這樣制作一個字符串

string c = "\u0081"  

如何從長度為4的原始字符串中提取長度為1的字符串?

編輯:我的錯誤,“ \\ u0081”不是char(1字節),而是2字節的字符/字符串? 所以我輸入的是二進制數1000 0001,即0x81,這就是我的字符串“ 0081”的來源。 從0x81到字符串c =“ \\ u0081”會更容易嗎? 感謝所有的幫助

干得好:

unsigned int x;
std::stringstream ss;
ss << std::hex << "1081";
ss >> x;

wchar_t wc1 = x;
wchar_t wc2 = L'\u1081';

assert(wc1 == wc2);

std::wstring ws(1, wc);

這是整個過程,基於我在其他注釋中鏈接的一些代碼。

string s = "0081";
long codepoint = strtol(s.c_str(), NULL, 16);
string c = CodepointToUTF8(codepoint);

std::string CodepointToUTF8(long codepoint)
{
    std::string out;
    if (codepoint <= 0x7f)
        out.append(1, static_cast<char>(codepoint));
    else if (codepoint <= 0x7ff)
    {
        out.append(1, static_cast<char>(0xc0 | ((codepoint >> 6) & 0x1f)));
        out.append(1, static_cast<char>(0x80 | (codepoint & 0x3f)));
    }
    else if (codepoint <= 0xffff)
    {
        out.append(1, static_cast<char>(0xe0 | ((codepoint >> 12) & 0x0f)));
        out.append(1, static_cast<char>(0x80 | ((codepoint >> 6) & 0x3f)));
        out.append(1, static_cast<char>(0x80 | (codepoint & 0x3f)));
    }
    else
    {
        out.append(1, static_cast<char>(0xf0 | ((codepoint >> 18) & 0x07)));
        out.append(1, static_cast<char>(0x80 | ((codepoint >> 12) & 0x3f)));
        out.append(1, static_cast<char>(0x80 | ((codepoint >> 6) & 0x3f)));
        out.append(1, static_cast<char>(0x80 | (codepoint & 0x3f)));
    }
    return out;
}

請注意,此代碼不會進行任何錯誤檢查,因此,如果將無效的代碼點傳遞給它,則會返回無效的字符串。

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM