繁体   English   中英

将 C++ 中的 unicode 字符串转换为大写

[英]Convert a unicode String In C++ To Upper Case

我们如何在 C 或 C++ 中将多语言字符串或 unicode 字符串转换为大写/小写。

如果您的系统已经是 UTF-8,通过使用std::use_facet ,您可以编写:

#include <iostream>
#include <locale.h>

int main() {
    std::locale::global(std::locale(""));  // (*)
    std::wcout.imbue(std::locale());
    auto& f = std::use_facet<std::ctype<wchar_t>>(std::locale());

    std::wstring str = L"Zoë Saldaña played in La maldición del padre Cardona.";

    f.toupper(&str[0], &str[0] + str.size());
    std::wcout << str << std::endl;

    return 0;
}

你得到( http://ideone.com/AFHoHC ):

ZOË SALDAÑA 在 LA MALDICIÓN DEL PADRE CARDONA 演出。

如果它不起作用,则必须将(*)更改为std::locale::global(std::locale("en_US.UTF8")); 或您在平台上实际拥有的 UTF-8 语言环境。

我找到了该问题的 2 个解决方案_

1. setlocale(LC_CTYPE, "en_US.UTF-8"); // 语言环境将是启用 UTF-8 的英语

std::wstring str = L"Zoë Saldaña played in La maldición del padre Cardona.ëèñ";

std::wcout << str << std::endl;

for (wstring::iterator it = str.begin(); it != str.end(); ++it)
    *it = towupper(*it);

std::wcout << "toUpper_onGCC_LLVM_1 :: "<< str << std::endl;

这是在 LLVM GCC 4.2 编译器上工作的。

2. std::locale::global(std::locale("en_US.UTF-8")); // 语言环境将是启用 UTF-8 的英语

std::wcout.imbue(std::locale());
const std::ctype<wchar_t>& f = std::use_facet< std::ctype<wchar_t> >(std::locale());

std::wstring str = L"Chloëè";//"Zoë Saldaña played in La maldición del padre Cardona.";

f.toupper(&str[0], &str[0] + str.size());   

std::wcout << str << std::endl;

这适用于 Apple LLVM 4.2。

我在 Xocde 上运行的两种情况。 但是我正在寻找一种使用 g++ Compiler 在 Eclipse 中运行此代码的方法。

如果你要做对的话,会有很多困难。

通常的用例是用于比较目的,但问题比这更普遍。

有一个从大约2000 C ++报告马特Austern一个相当详细的文件在这里(PDF)

如果您想要一个理智而成熟的解决方案,请查看IBM 的 ICU 下面是一个例子:

#include <iostream>
#include <unicode/unistr.h>
#include <string>

int main(){
    icu::UnicodeString us("óóßChloë");
    us.toUpper(); //convert to uppercase in-place
    std::string s;
    us.toUTF8String(s);
    std::cout<<"Upper: "<<s<<"\n";

    us.toLower(); //convert to lowercase in-place
    s.clear();
    us.toUTF8String(s);
    std::cout<<"Lower: "<<s<<"\n";
    return 0;
}

输出:

Upper: ÓÓSSCHLOË
Lower: óósschloë

注意:在后面的步骤中, SS不会被视为德语ß首都

首先设置语言环境,例如:

setlocale(LC_ALL, "German")); /*This won't work as per comments below */

setlocale(LC_ALL, "English"));

setlocale( LC_MONETARY, "French" );

setlocale( LC_ALL, "" ); //default locale 

然后使用

std::use_facet std::locale如下:-

typedef std::string::value_type char_t;
char_t upcase( char_t ch )
{
 return std::use_facet< std::ctype< char_t > >( std::locale() ).toupper( ch );
}

std::string toupper( const std::string &src )
{
 std::string result;
 std::transform( src.begin(), src.end(), std::back_inserter( result ), upcase );
 return result;
}

const std::string src  = "Hello World!";
std::cout << toupper( src );

在 Windows 中,对于区域设置未知的混合语言应用程序,请考虑CharUpperBuffWCharLowerBuffW 这些函数处理toupper()不处理的变音符号。

您可以遍历wstring并使用towupper / towlower

for (wstring::iterator it = a.begin(); it != a.end(); ++it)
        *it = towupper(*it);

对于 CI 将在当前线程中调整 C 语言环境后使用toupper

setlocale(LC_CTYPE, "en_US.UTF8");

对于 C++,我会使用std::ctype<char>toupper方法:

std::locale loc;

auto& f = std::use_facet<std::ctype<char>>(loc);

char str[80] = "Hello World";

f.toupper(str, str+strlen(str));

基于Kyle_the_hacker 的----->回答我的额外内容。

Ubuntu

在终端中列出所有语言环境
locale -a

安装所有语言环境
sudo apt-get install -y locales locales-all

编译main.cpp
$ g++ main.cpp

运行编译的程序
$ ./a.out

结果

Zoë Saldaña played in La maldición del padre Cardona. ëèñ αω óóChloë
Zoë Saldaña played in La maldición del padre Cardona. ëèñ αω óóChloë
ZOË SALDAÑA PLAYED IN LA MALDICIÓN DEL PADRE CARDONA. ËÈÑ ΑΩ ÓÓCHLOË
ZOË SALDAÑA PLAYED IN LA MALDICIÓN DEL PADRE CARDONA. ËÈÑ ΑΩ ÓÓCHLOË
zoë saldaña played in la maldición del padre cardona. ëèñ αω óóchloë
zoë saldaña played in la maldición del padre cardona. ëèñ αω óóchloë

Ubuntu Linux - 来自 VSCODE 的 WSL

Ubuntu Linux - WSL

视窗

在 cmd 中运行 VCVARS 开发人员工具
"C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community\\VC\\Auxiliary\\Build\\vcvars64.bat"

编译main.cpp
> cl /EHa main.cpp /D "_DEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /std:c++17 /DYNAMICBASE "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "odbc32.lib" "odbccp32.lib" /MTd

Compilador de optimización de C/C++ de Microsoft (R) versión 19.27.29111 para x64
(C) Microsoft Corporation. Todos los derechos reservados.

main.cpp
Microsoft (R) Incremental Linker Version 14.27.29111.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:main.exe
main.obj
kernel32.lib
user32.lib
gdi32.lib
winspool.lib
comdlg32.lib
advapi32.lib
shell32.lib
ole32.lib
oleaut32.lib
uuid.lib
odbc32.lib
odbccp32.lib

运行主程序
>main.exe

结果

Zoë Saldaña played in La maldición del padre Cardona. ëèñ αω óóChloë
Zoë Saldaña played in La maldición del padre Cardona. ëèñ αω óóChloë
ZOË SALDAÑA PLAYED IN LA MALDICIÓN DEL PADRE CARDONA. ËÈÑ ΑΩ ÓÓCHLOË
ZOË SALDAÑA PLAYED IN LA MALDICIÓN DEL PADRE CARDONA. ËÈÑ ΑΩ ÓÓCHLOË
zoë saldaña played in la maldición del padre cardona. ëèñ αω óóchloë
zoë saldaña played in la maldición del padre cardona. ëèñ αω óóchloë

视窗

代码 - main.cpp

此代码仅在 Windows x64 和 Ubuntu Linux x64 上进行了测试。

/*
 * Filename: c:\Users\x\Cpp\main.cpp
 * Path: c:\Users\x\Cpp
 * Filename: /home/x/Cpp/main.cpp
 * Path: /home/x/Cpp
 * Created Date: Saturday, October 17th 2020, 10:43:31 pm
 * Author: Joma
 *
 * No Copyright 2020
 */


#include <iostream>
#include <set>
#include <string>
#include <locale>

// WINDOWS
#if (_WIN32)
#include <Windows.h>
#include <conio.h>
#define WINDOWS_PLATFORM 1
#define DLLCALL STDCALL
#define DLLIMPORT _declspec(dllimport)
#define DLLEXPORT _declspec(dllexport)
#define DLLPRIVATE
#define NOMINMAX

//EMSCRIPTEN
#elif defined(__EMSCRIPTEN__)
#include <emscripten/emscripten.h>
#include <emscripten/bind.h>
#include <unistd.h>
#include <termios.h>
#define EMSCRIPTEN_PLATFORM 1
#define DLLCALL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))

// LINUX - Ubuntu, Fedora, , Centos, Debian, RedHat
#elif (__LINUX__ || __gnu_linux__ || __linux__ || __linux || linux)
#define LINUX_PLATFORM 1
#include <unistd.h>
#include <termios.h>
#define DLLCALL CDECL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))
#define CoTaskMemAlloc(p) malloc(p)
#define CoTaskMemFree(p) free(p)

//ANDROID
#elif (__ANDROID__ || ANDROID)
#define ANDROID_PLATFORM 1
#define DLLCALL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))

//MACOS
#elif defined(__APPLE__)
#include <unistd.h>
#include <termios.h>
#define DLLCALL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))
#include "TargetConditionals.h"
#if TARGET_OS_IPHONE && TARGET_IPHONE_SIMULATOR
#define IOS_SIMULATOR_PLATFORM 1
#elif TARGET_OS_IPHONE
#define IOS_PLATFORM 1
#elif TARGET_OS_MAC
#define MACOS_PLATFORM 1
#else

#endif

#endif



typedef std::string String;
typedef std::wstring WString;

#define EMPTY_STRING u8""s
#define EMPTY_WSTRING L""s

using namespace std::literals::string_literals;

class Strings
{
public:
    static String WideStringToString(const WString& wstr)
    {
        if (wstr.empty())
        {
            return String();
        }
        size_t pos;
        size_t begin = 0;
        String ret;

#if WINDOWS_PLATFORM
        int size;
        pos = wstr.find(static_cast<wchar_t>(0), begin);
        while (pos != WString::npos && begin < wstr.length())
        {
            WString segment = WString(&wstr[begin], pos - begin);
            size = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), NULL, 0, NULL, NULL);
            String converted = String(size, 0);
            WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), &converted[0], converted.size(), NULL, NULL);
            ret.append(converted);
            ret.append({ 0 });
            begin = pos + 1;
            pos = wstr.find(static_cast<wchar_t>(0), begin);
        }
        if (begin <= wstr.length())
        {
            WString segment = WString(&wstr[begin], wstr.length() - begin);
            size = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), NULL, 0, NULL, NULL);
            String converted = String(size, 0);
            WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), &converted[0], converted.size(), NULL, NULL);
            ret.append(converted);
        }
#elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        size_t size;
        pos = wstr.find(static_cast<wchar_t>(0), begin);
        while (pos != WString::npos && begin < wstr.length())
        {
            WString segment = WString(&wstr[begin], pos - begin);
            size = wcstombs(nullptr, segment.c_str(), 0);
            String converted = String(size, 0);
            wcstombs(&converted[0], segment.c_str(), converted.size());
            ret.append(converted);
            ret.append({ 0 });
            begin = pos + 1;
            pos = wstr.find(static_cast<wchar_t>(0), begin);
        }
        if (begin <= wstr.length())
        {
            WString segment = WString(&wstr[begin], wstr.length() - begin);
            size = wcstombs(nullptr, segment.c_str(), 0);
            String converted = String(size, 0);
            wcstombs(&converted[0], segment.c_str(), converted.size());
            ret.append(converted);
        }
#else
        static_assert(false, "Unknown Platform");
#endif
        return ret;
    }

    static WString StringToWideString(const String& str)
    {
        if (str.empty())
        {
            return WString();
        }

        size_t pos;
        size_t begin = 0;
        WString ret;
#ifdef WINDOWS_PLATFORM
        int size = 0;
        pos = str.find(static_cast<char>(0), begin);
        while (pos != std::string::npos) {
            std::string segment = std::string(&str[begin], pos - begin);
            std::wstring converted = std::wstring(segment.size() + 1, 0);
            size = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, &segment[0], segment.size(), &converted[0], converted.length());
            converted.resize(size);
            ret.append(converted);
            ret.append({ 0 });
            begin = pos + 1;
            pos = str.find(static_cast<char>(0), begin);
        }
        if (begin < str.length()) {
            std::string segment = std::string(&str[begin], str.length() - begin);
            std::wstring converted = std::wstring(segment.size() + 1, 0);
            size = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, segment.c_str(), segment.size(), &converted[0], converted.length());
            converted.resize(size);
            ret.append(converted);
        }

#elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        size_t size;
        pos = str.find(static_cast<char>(0), begin);
        while (pos != String::npos)
        {
            String segment = String(&str[begin], pos - begin);
            WString converted = WString(segment.size(), 0);
            size = mbstowcs(&converted[0], &segment[0], converted.size());
            converted.resize(size);
            ret.append(converted);
            ret.append({ 0 });
            begin = pos + 1;
            pos = str.find(static_cast<char>(0), begin);
        }
        if (begin < str.length())
        {
            String segment = String(&str[begin], str.length() - begin);
            WString converted = WString(segment.size(), 0);
            size = mbstowcs(&converted[0], &segment[0], converted.size());
            converted.resize(size);
            ret.append(converted);
        }
#else
        static_assert(false, "Unknown Platform");
#endif
        return ret;
    }


    static WString ToUpper(const WString& data)
    {
        WString result = data;
        auto& f = std::use_facet<std::ctype<wchar_t>>(std::locale());

        f.toupper(&result[0], &result[0] + result.size());
        return result;
    }

    static String  ToUpper(const String& data)
    {
        return WideStringToString(ToUpper(StringToWideString(data)));
    }

    static WString ToLower(const WString& data)
    {
        WString result = data;
        auto& f = std::use_facet<std::ctype<wchar_t>>(std::locale());
        f.tolower(&result[0], &result[0] + result.size());
        return result;
    }

    static String ToLower(const String& data)
    {
        return WideStringToString(ToLower(StringToWideString(data)));
    }

};

enum class ConsoleTextStyle
{
    DEFAULT = 0,
    BOLD = 1,
    FAINT = 2,
    ITALIC = 3,
    UNDERLINE = 4,
    SLOW_BLINK = 5,
    RAPID_BLINK = 6,
    REVERSE = 7,
};

enum class ConsoleForeground
{
    DEFAULT = 39,
    BLACK = 30,
    DARK_RED = 31,
    DARK_GREEN = 32,
    DARK_YELLOW = 33,
    DARK_BLUE = 34,
    DARK_MAGENTA = 35,
    DARK_CYAN = 36,
    GRAY = 37,
    DARK_GRAY = 90,
    RED = 91,
    GREEN = 92,
    YELLOW = 93,
    BLUE = 94,
    MAGENTA = 95,
    CYAN = 96,
    WHITE = 97
};

enum class ConsoleBackground
{
    DEFAULT = 49,
    BLACK = 40,
    DARK_RED = 41,
    DARK_GREEN = 42,
    DARK_YELLOW = 43,
    DARK_BLUE = 44,
    DARK_MAGENTA = 45,
    DARK_CYAN = 46,
    GRAY = 47,
    DARK_GRAY = 100,
    RED = 101,
    GREEN = 102,
    YELLOW = 103,
    BLUE = 104,
    MAGENTA = 105,
    CYAN = 106,
    WHITE = 107
};

class Console
{
private:
    static void EnableVirtualTermimalProcessing()
    {
#if defined WINDOWS_PLATFORM
        HANDLE hOut = GetStdHandle(STD_OUTPUT_HANDLE);
        DWORD dwMode = 0;
        GetConsoleMode(hOut, &dwMode);
        if (!(dwMode & ENABLE_VIRTUAL_TERMINAL_PROCESSING))
        {
            dwMode |= ENABLE_VIRTUAL_TERMINAL_PROCESSING;
            SetConsoleMode(hOut, dwMode);
        }
#endif
    }

    static void ResetTerminalFormat()
    {
        std::cout << u8"\033[0m";
    }

    static void SetVirtualTerminalFormat(ConsoleForeground foreground, ConsoleBackground background, std::set<ConsoleTextStyle> styles)
    {
        String format = u8"\033[";
        format.append(std::to_string(static_cast<int>(foreground)));
        format.append(u8";");
        format.append(std::to_string(static_cast<int>(background)));
        if (styles.size() > 0)
        {
            for (auto it = styles.begin(); it != styles.end(); ++it)
            {
                format.append(u8";");
                format.append(std::to_string(static_cast<int>(*it)));
            }
        }
        format.append(u8"m");
        std::cout << format;
    }
public:
    static void Clear()
    {

#ifdef WINDOWS_PLATFORM
        std::system(u8"cls");
#elif LINUX_PLATFORM || defined MACOS_PLATFORM
        std::system(u8"clear");
#elif EMSCRIPTEN_PLATFORM
        emscripten::val::global()["console"].call<void>(u8"clear");
#else
        static_assert(false, "Unknown Platform");
#endif
    }

    static void Write(const String& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
#ifndef EMSCRIPTEN_PLATFORM
        EnableVirtualTermimalProcessing();
        SetVirtualTerminalFormat(foreground, background, styles);
#endif
        String str = s;
#ifdef WINDOWS_PLATFORM
        WString unicode = Strings::StringToWideString(str);
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), unicode.c_str(), static_cast<DWORD>(unicode.length()), nullptr, nullptr);
#elif defined LINUX_PLATFORM || defined MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        std::cout << str;
#else
        static_assert(false, "Unknown Platform");
#endif

#ifndef EMSCRIPTEN_PLATFORM
        ResetTerminalFormat();
#endif
    }

    static void WriteLine(const String& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
        Write(s, foreground, background, styles);
        std::cout << std::endl;
    }

    static void Write(const WString& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
#ifndef EMSCRIPTEN_PLATFORM
        EnableVirtualTermimalProcessing();
        SetVirtualTerminalFormat(foreground, background, styles);
#endif
        WString str = s;

#ifdef WINDOWS_PLATFORM
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), str.c_str(), static_cast<DWORD>(str.length()), nullptr, nullptr);
#elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        std::cout << Strings::WideStringToString(str);
#else
        static_assert(false, "Unknown Platform");
#endif

#ifndef EMSCRIPTEN_PLATFORM
        ResetTerminalFormat();
#endif
    }

    static void WriteLine(const WString& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
        Write(s, foreground, background, styles);
        std::cout << std::endl;
    }

    static void WriteLine()
    {
        std::cout << std::endl;
    }

    static void Pause()
    {
        char c;
        do
        {
            c = getchar();
            std::cout << "Press Key " << std::endl;
        } while (c != 64);
        std::cout << "KeyPressed" << std::endl;
    }

    static int PauseAny(bool printWhenPressed = false, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
        int ch;
#ifdef WINDOWS_PLATFORM
        ch = _getch();
#elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        struct termios oldt, newt;
        tcgetattr(STDIN_FILENO, &oldt);
        newt = oldt;
        newt.c_lflag &= ~(ICANON | ECHO);
        tcsetattr(STDIN_FILENO, TCSANOW, &newt);
        ch = getchar();
        tcsetattr(STDIN_FILENO, TCSANOW, &oldt);
#else
        static_assert(false, "Unknown Platform");
#endif
        if (printWhenPressed)
        {
            Console::Write(String(1, ch), foreground, background, styles);
        }
        return ch;
    }
};



int main()
{
    std::locale::global(std::locale(u8"en_US.UTF-8"));
    String dataStr = u8"Zoë Saldaña played in La maldición del padre Cardona. ëèñ αω óóChloë";
    WString dataWStr = L"Zoë Saldaña played in La maldición del padre Cardona. ëèñ αω óóChloë";
    std::string locale = u8"";
    //std::string locale = u8"de_DE.UTF-8";
    //std::string locale = u8"en_US.UTF-8";
    Console::WriteLine(dataStr);
    Console::WriteLine(dataWStr);
    dataStr = Strings::ToUpper(dataStr);
    dataWStr = Strings::ToUpper(dataWStr);
    Console::WriteLine(dataStr);
    Console::WriteLine(dataWStr);
    dataStr = Strings::ToLower(dataStr);
    dataWStr = Strings::ToLower(dataWStr);
    Console::WriteLine(dataStr);
    Console::WriteLine(dataWStr);
    
    
    Console::WriteLine(u8"Press any key to exit"s, ConsoleForeground::DARK_GRAY);
    Console::PauseAny();

    return 0;
}

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM