I have the characters (Bw@e), which are encoded in charset="iso-2022-kr". The hex values of these bytes are 28 0E 42 77 40 65 0F 29.
The Unix iconv utility can convert the encoding from iso-2022-kr to UTF-8.
Example: iconv -f iso-2022-kr -t utf-8 Input > Output
After conversion to UTF-8, the hex values are: 28 EC B0 A8 EC 9E A5 29 (차장)
Decoding these UTF-8 hex values with the tool at https://software.hixie.ch/utilities/cgi/unicode-decoder/utf8-decoder gives:
Result: As raw characters:
(차장)
I am looking for C++ source code that can convert text from iso-2022-kr to UTF-8. I have already taken care of decoding the resulting UTF-8. Any help would be appreciated.
Here's a quick and dirty C++ program that demonstrates using the iconv library interface (might require linking with -liconv):
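#include <cstdio>
#include <cstdlib>
#include <iconv.h>

int main() {
    // iconv_open takes the target encoding first, then the source encoding.
    iconv_t conv = iconv_open("utf-8", "iso-2022-kr");
    if (conv == reinterpret_cast<iconv_t>(-1)) {
        std::perror("iconv_open");
        return EXIT_FAILURE;
    }

    // The bytes from the question: 0x0E (SO) and 0x0F (SI) bracket the
    // two-byte Korean characters.
    char iso2022kr_buf[] = "\x28\x0E\x42\x77\x40\x65\x0F\x29";
    char utf8_buf[128];
    std::size_t kr_bytes = sizeof iso2022kr_buf - 1;  // exclude the trailing NUL
    std::size_t utf8_bytes = sizeof utf8_buf - 1;     // keep room for a NUL
    char *as_iso2022kr = iso2022kr_buf;
    char *as_utf8 = utf8_buf;

    // iconv advances both pointers and decrements both byte counts.
    std::size_t len = iconv(conv, &as_iso2022kr, &kr_bytes, &as_utf8, &utf8_bytes);
    if (len == static_cast<std::size_t>(-1)) {
        std::perror("iconv");
        iconv_close(conv);
        return EXIT_FAILURE;
    }
    *as_utf8 = '\0';  // as_utf8 now points one past the last converted byte

    // Print the converted bytes as hex, then as text.
    for (const char *c = utf8_buf; c != as_utf8; c++) {
        std::printf("%02hhX ", *c);
    }
    std::putchar('\n');
    std::puts(utf8_buf);

    iconv_close(conv);
    return 0;
}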
In action:
$ g++ -O -Wall -Wextra iconv_demo.cpp
$ ./a.out
28 EC B0 A8 EC 9E A5 29
(차장)
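The demo above uses fixed-size buffers. For input of arbitrary length, the same iconv calls can be wrapped in a function that converts in chunks and retries when iconv reports E2BIG (output buffer full). The following is only a sketch under a few assumptions: convert_encoding is a hypothetical helper name, not part of iconv; the 256-byte chunk size is arbitrary; and the platform declares iconv's input argument as char ** (glibc does; some older systems use const char **).

```cpp
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <string>
#include <system_error>

// Hypothetical helper (not part of the iconv API): converts `input` from the
// `from` encoding to the `to` encoding and returns the result as a string.
std::string convert_encoding(const std::string &input, const char *from, const char *to) {
    iconv_t conv = iconv_open(to, from);  // target encoding comes first
    if (conv == reinterpret_cast<iconv_t>(-1)) {
        throw std::system_error(errno, std::generic_category(), "iconv_open");
    }

    std::string in_copy = input;  // iconv wants a non-const char **
    char *in_ptr = &in_copy[0];
    std::size_t in_left = in_copy.size();

    std::string output;
    char chunk[256];  // arbitrary chunk size chosen for this sketch

    while (in_left > 0) {
        char *out_ptr = chunk;
        std::size_t out_left = sizeof chunk;
        std::size_t rc = iconv(conv, &in_ptr, &in_left, &out_ptr, &out_left);
        // Keep whatever was converted before iconv returned.
        output.append(chunk, sizeof chunk - out_left);
        if (rc == static_cast<std::size_t>(-1) && errno != E2BIG) {
            int saved = errno;  // EILSEQ or EINVAL: invalid or truncated input
            iconv_close(conv);
            throw std::system_error(saved, std::generic_category(), "iconv");
        }
        // On E2BIG the chunk was full; loop again with the remaining input.
    }

    // Flush any pending shift sequence (matters for stateful target encodings).
    char *out_ptr = chunk;
    std::size_t out_left = sizeof chunk;
    if (iconv(conv, nullptr, nullptr, &out_ptr, &out_left) != static_cast<std::size_t>(-1)) {
        output.append(chunk, sizeof chunk - out_left);
    }

    iconv_close(conv);
    return output;
}
```

Calling convert_encoding("\x28\x0E\x42\x77\x40\x65\x0F\x29", "iso-2022-kr", "utf-8") should yield the same eight bytes as the demo output above, provided the local iconv implementation supports iso-2022-kr.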