
ISO-8859 to UTF-8 Conversion C++

I have been trying to convert the ISO-8859 charset to UTF-8 with the code obtained from: Convert ISO-8859-1 strings to UTF-8 in C/C++. Here is my code:

#include <iostream>
#include <string>

using namespace std;
int main(int argc,char* argv[])
{
    string fileName ="ħëlö";
    int len= fileName.length();
    char* in = new char[len+1];
    char* out = new char[2*(len+1)];
    memset(in,'\0',len+1);
    memset(out,'\0',len+1);
    memcpy(in,fileName.c_str(),2*(len+1));


    while( *in )
    {
            cout << " ::: " << in ;
            if( *in <128 )
            {
                    *out++ = *in++;
            }
            else
            {
                    *out++ = 0xc2+(*in>0xbf);
                    *out++ = (*in++&0x3f)+0x80;
            }
    }
    cout << "\n\n out ::: " << out << "\n";
    *out = '\0';
}

But the output is:

::: ħëlö ::: ?ëlö ::: ëlö ::: ?lö ::: lö ::: ö ::: ?

 out :::   

The output 'out' should be a UTF-8 string, and it is not. I'm getting this on Mac OS X.

What am I doing wrong here?

You are incrementing the out pointer in the loop, causing you to lose track of where the output starts. The pointer being passed to cout is the incremented one, so it no longer points at the start of the generated output.

Further, out is terminated after it is printed, which of course is the wrong way around.

Also, this relies on the encoding of the source file, which is not very nice. To be on the safe side, you should express the input string differently, for example as individual characters given by their hex values.

ISO-8859-1 does not have the character ħ, so your source cannot possibly be in ISO-8859-1 as the method requires. Or your source is in ISO-8859-1, but ħ will be replaced with ? once you save it.
