String to UTF-8 conversion in C++

Question

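I have a string Test\xc2\xae, which in hexadecimal is 0x54 0x65 0x73 0x74 0x5c 0x78 0x63 0x32 0x5c 0x78 0x61 0x65. The character sequence \xc2\xae in this string is simply the UTF-8 encoding of ® (the registered trademark sign), written out as literal text.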

I want to write a C++ function that converts the four-character sequence \xc2 (hex 0x5c 0x78 0x63 0x32) into the single byte 0xc2.

E.g., I want to write a C++ function which can convert Test\xc2\xae [ 0x54 0x65 0x73 0x74 0x5c 0x78 0x63 0x32 0x5c 0x78 0x61 0x65 ] to Test® [ 0x54 0x65 0x73 0x74 0xc2 0xae ]

Answer 1

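As I understand your question, you are trying to convert each \x?? sequence (four characters), where ?? is a pair of hexadecimal digits, into a single char whose value is the one those two digits express in hexadecimal.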
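
To make that single step concrete, here is a minimal sketch of decoding one escape in isolation. The names hex_digit_value and decode_escape are hypothetical helpers introduced only for illustration; they are not part of any library.

```cpp
#include <optional>
#include <string_view>

// Hypothetical helper: numeric value of one hex digit, or -1 if c is not one.
constexpr int hex_digit_value(char c)
{
  if(c>='0' && c<='9') return c-'0';
  if(c>='a' && c<='f') return 10+(c-'a');
  if(c>='A' && c<='F') return 10+(c-'A');
  return -1;
}

// Hypothetical helper: decode a four-character "\x??" escape at the start
// of s into one byte; returns std::nullopt if s does not start with one.
inline std::optional<unsigned char> decode_escape(std::string_view s)
{
  if(s.size()<4 || s[0]!='\\' || s[1]!='x') return std::nullopt;
  const int hi=hex_digit_value(s[2]);
  const int lo=hex_digit_value(s[3]);
  if(hi<0 || lo<0) return std::nullopt;
  // the first digit is the high nibble, the second the low nibble
  return static_cast<unsigned char>(hi*16+lo);
}
```

Under these assumptions, decode_escape("\\xc2") holds the byte 0xc2, while an incomplete escape such as "\\xa" gives an empty optional.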

If you would rather not pull in a large library dedicated to this, the following simple program may do the trick.

/**
  g++ -std=c++17 -o prog_cpp prog_cpp.cpp \
      -pedantic -Wall -Wextra -Wconversion -Wno-sign-conversion \
      -g -O0 -UNDEBUG -fsanitize=address,undefined
**/

#include <iostream>
#include <string>
#include <cctype>

std::string
convert_backslash_x(const std::string &str)
{
  auto result=std::string{};
  for(auto start=std::string::size_type{0};;)
  {
    const auto pos=str.find("\\x", start);
    if((pos==str.npos)||  // not found
       (pos+4>size(str))) // too close to the end
    {
      // keep the rest of the string
      result.append(str, start);
      break;
    }
    // keep everything until this position
    result.append(str, start, pos-start);
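    // cast to unsigned char first: passing a negative char to std::tolower() is undefined behaviour
    const auto c1=std::tolower(static_cast<unsigned char>(str[pos+2]));
    const auto c2=std::tolower(static_cast<unsigned char>(str[pos+3]));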
    if(std::isxdigit(c1)&&std::isxdigit(c2))
    {
      // convert two hex digits to a char with this value
      const auto h1=std::isalpha(c1) ? 10+(c1-'a') : (c1-'0');
      const auto h2=std::isalpha(c2) ? 10+(c2-'a') : (c2-'0');
      result+=char(h1*16+h2);
      // go on after this \x?? sequence
      start=pos+4; 
    }
    else
    {
      // keep this incomplete \x sequence as is
      result+="\\x";
      // go on after this \x sequence
      start=pos+2;
    }
  }
  return result;
}

int
main()
{
  for(const auto &s: {"Test\\xc2\\xae",
                      "Test\\xc2\\xae Test\\xc2\\xae",
                      "Test\\xc2\\xa",
                      "Test\\x\\xc2\\xa"})
  {
    std::cout << '(' << s << ") --> (" << convert_backslash_x(s) << ")\n";
  }
  return 0;
}
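
Because a terminal may or may not render the resulting bytes as ®, it can help to look at the output byte by byte, in the same notation the question uses. The sketch below is one way to do that; to_hex_bytes is a hypothetical helper name, not part of the program above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render every byte of s as 0x?? separated by spaces.
inline std::string to_hex_bytes(const std::string &s)
{
  auto out=std::string{};
  for(const char c: s)
  {
    // go through unsigned char so bytes >= 0x80 are not sign-extended
    const auto byte=static_cast<unsigned char>(c);
    char buf[8];
    std::snprintf(buf, sizeof(buf), "0x%02x", static_cast<unsigned>(byte));
    if(!out.empty())
    {
      out+=' ';
    }
    out+=buf;
  }
  return out;
}
```

Used together with the function above, to_hex_bytes(convert_backslash_x("Test\\xc2\\xae")) should give 0x54 0x65 0x73 0x74 0xc2 0xae, which is the byte sequence the question asks for.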

Solution 1
0 Accepted 2021-04-17 08:37:37
