
How to convert ISO-2022-KR encoding to UTF-8 encoding using C++?

I have these characters (Bw@e), which are encoded with charset="iso-2022-kr". The hex values of these characters are 28 0E 42 77 40 65 0F 29.
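For context, the 0E and 0F bytes in that sequence are the ISO-2022-KR shift-out and shift-in controls: between them, each pair of 7-bit bytes names one KS X 1001 character. The sketch below is only an illustration, not a full converter. It strips those controls and sets the high bit on the shifted bytes, which yields the equivalent EUC-KR bytes. The function name `iso2022kr_to_euckr` is made up for this example, and the code assumes well-formed input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: rewrites ISO-2022-KR bytes as EUC-KR bytes.
// Assumes well-formed input; it does not map anything to Unicode.
std::string iso2022kr_to_euckr(const std::string &in) {
  const std::string header = "\x1B$)C";  // designator ESC $ ) C, if present
  std::string out;
  bool shifted = false;                  // true between SO and SI
  for (std::size_t i = 0; i < in.size(); ++i) {
    const unsigned char b = static_cast<unsigned char>(in[i]);
    if (in.compare(i, header.size(), header) == 0) {
      i += header.size() - 1;            // skip the designator sequence
    } else if (b == 0x0E) {
      shifted = true;                    // SO: switch to KS X 1001
    } else if (b == 0x0F) {
      shifted = false;                   // SI: switch back to ASCII
    } else if (shifted && b > 0x20 && b < 0x7F) {
      out.push_back(static_cast<char>(b | 0x80));  // 7-bit byte -> EUC-KR byte
    } else {
      out.push_back(static_cast<char>(b));         // everything else passes through
    }
  }
  return out;
}
```

For the bytes in the question this produces 28 C2 F7 C0 E5 29. Getting from there to Unicode still needs a KS X 1001 mapping table, which is the part a library such as iconv supplies.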

Unix has iconv, which can convert the encoding from iso-2022-kr to utf-8.

Example: iconv -f iso-2022-kr -t utf8 Input > Output

After conversion to UTF-8, the hex values are: 28 EC B0 A8 EC 9E A5 29 (차장)

Decoding the above UTF-8 hex values with the decoder at https://software.hixie.ch/utilities/cgi/unicode-decoder/utf8-decoder gives:

Result: As raw characters:

(차장)
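The decoding step that the online tool performs can be sketched in a few lines of C++. This is a minimal illustration that assumes well-formed UTF-8 input; the function name `utf8_to_code_points` is hypothetical, and the code does not reject overlong or truncated sequences.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decodes well-formed UTF-8 into Unicode code points.
// Assumes valid input; no checks for overlong or truncated sequences.
std::vector<char32_t> utf8_to_code_points(const std::string &s) {
  std::vector<char32_t> out;
  for (std::size_t i = 0; i < s.size();) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t extra = 0;  // number of continuation bytes that follow
    char32_t cp = lead;     // single-byte (ASCII) case
    if (lead >= 0xF0) {
      extra = 3; cp = lead & 0x07;
    } else if (lead >= 0xE0) {
      extra = 2; cp = lead & 0x0F;
    } else if (lead >= 0xC0) {
      extra = 1; cp = lead & 0x1F;
    }
    ++i;
    // Each continuation byte contributes its low six bits.
    for (; extra > 0 && i < s.size(); --extra, ++i) {
      cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
    }
    out.push_back(cp);
  }
  return out;
}
```

Applied to 28 EC B0 A8 EC 9E A5 29, it yields the four code points U+0028, U+CC28, U+C7A5 and U+0029.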

I am looking for C++ source code that can convert the encoding from iso-2022-kr to UTF-8. I have already taken care of the decoding part for the UTF-8 encoded data. Any help would be appreciated.

Here's a quick and dirty C++ program that demonstrates using the iconv library interface (might require linking with -liconv):

#include <cstdio>
#include <cstdlib>
#include <iconv.h>

int main() {
  iconv_t conv = iconv_open("utf-8", "iso-2022-kr");
  if (conv == reinterpret_cast<iconv_t>(-1)) {
    std::perror("iconv_open");
    return EXIT_FAILURE;
  }

  char iso2022kr_buf[] = "\x28\x0E\x42\x77\x40\x65\x0F\x29";
  char utf8_buf[128];
  std::size_t kr_bytes = sizeof iso2022kr_buf - 1;
  std::size_t utf8_bytes = sizeof utf8_buf - 1;  // leave room for the terminating '\0'
  char *as_iso2022kr = iso2022kr_buf;
  char *as_utf8 = utf8_buf;

  std::size_t len = iconv(conv, &as_iso2022kr, &kr_bytes, &as_utf8, &utf8_bytes);
  if (len == static_cast<std::size_t>(-1)) {
    std::perror("iconv");
    return EXIT_FAILURE;
  }
  *as_utf8 = '\0';
  for (const char *c = utf8_buf; c != as_utf8; c++) {
    std::printf("%02hhX ", *c);
  }
  std::putchar('\n');

  std::puts(utf8_buf);
  
  iconv_close(conv);
  return 0;
}
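If the input size is not known in advance, the same iconv calls can be wrapped in a function that grows its output as needed. The following is a rough sketch under a few assumptions: the name `convert_encoding` and the 256-byte chunk size are arbitrary choices for illustration, errors are reported with `std::runtime_error`, and the platform's `iconv` takes a `char **` input pointer (as glibc's does).

```cpp
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts `input` between two encodings using iconv.
// Throws std::runtime_error if the conversion cannot be opened or fails.
std::string convert_encoding(const std::string &input,
                             const char *from, const char *to) {
  iconv_t conv = iconv_open(to, from);  // note the order: target first
  if (conv == reinterpret_cast<iconv_t>(-1)) {
    throw std::runtime_error("iconv_open failed");
  }

  std::string in = input;               // iconv wants a non-const pointer
  char *in_ptr = &in[0];
  std::size_t in_left = in.size();
  std::string out;
  char chunk[256];                      // arbitrary scratch buffer size
  bool flushing = false;                // true once all input is consumed

  for (;;) {
    char *out_ptr = chunk;
    std::size_t out_left = sizeof chunk;
    // A null input pointer asks iconv to emit any pending shift sequence.
    const std::size_t rc =
        flushing ? iconv(conv, nullptr, nullptr, &out_ptr, &out_left)
                 : iconv(conv, &in_ptr, &in_left, &out_ptr, &out_left);
    const int err = errno;              // save before anything else runs
    out.append(chunk, sizeof chunk - out_left);
    if (rc == static_cast<std::size_t>(-1)) {
      if (err == E2BIG) continue;       // chunk was full: go around again
      iconv_close(conv);
      throw std::runtime_error("iconv failed");
    }
    if (flushing) break;
    flushing = true;
  }

  iconv_close(conv);
  return out;
}
```

A call such as `convert_encoding(bytes, "iso-2022-kr", "utf-8")` would then replace the fixed-size buffers used in the program above.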

In action:

$ g++ -O -Wall -Wextra iconv_demo.cpp
$ ./a.out
28 EC B0 A8 EC 9E A5 29 
(차장)
