
(C only) Converting special characters in a UTF-8 string to hexadecimal Unicode code points

I'm trying to use this code: https://www.includehelp.com/c/convert-ascii-string-to-hexadecimal-string-in-c.aspx

This code works perfectly in my program.

It converts characters such as A, m, n, d, 0, 9 to their hexadecimal codes correctly.

Could anybody tell me how to modify this program so that it also handles strings containing "special characters", such as accented vowels and similar letters (ñ, ç, à, á, ...)? When I run the program on such strings, it does not work as I expected.

I'm working on RHEL 7 with plain C (sorry, I don't know the compiler version).

The special characters that I'm trying to convert to hexadecimal Unicode are encoded in UTF-8.

#include <stdio.h>
#include <string.h>

//function to convert ascii char[] to hex-string (char[])
void string2hexString(char* input, char* output)
{
    int loop;
    int i; 

    i=0;
    loop=0;

    while(input[loop] != '\0')
    {
        sprintf((char*)(output+i),"%02X", input[loop]);
        loop+=1;
        i+=2;
    }
    //insert NULL at the end of the output string
    output[i++] = '\0';
}

int main(){
    char ascii_str[] = "Hello world!";
    //declare output string with double size of input string
    //because each character of input string will be converted
    //in 2 bytes
    int len = strlen(ascii_str);
    char hex_str[(len*2)+1];

    //converting ascii string to hex string
    string2hexString(ascii_str, hex_str);

    printf("ascii_str: %s\n", ascii_str);
    printf("hex_str: %s\n", hex_str);

    return 0;
}

Output

ascii_str: Hello world!

hex_str: 48656C6C6F20776F726C6421

I would like to set "ascii_str" to something like "áéíóúàèìòùç" and obtain a string containing the hexadecimal Unicode codes of its characters, as follows:

letter -> á // hex code --> e1
letter -> é // hex code --> e9
letter -> í // hex code --> ed
letter -> ó // hex code --> f3
letter -> ú // hex code --> fa
letter -> à // hex code --> e0
letter -> è // hex code --> e8
letter -> ì // hex code --> ec
letter -> ò // hex code --> f2
letter -> ù // hex code --> f9
letter -> ç // hex code --> e7

Please read about UTF-8 encoding. UTF-8 is a variable-length encoding: each code point (character) takes between one and four bytes. The characters you are converting occupy two bytes each, not one. The code you are using prints one byte at a time, so it is not suitable for this purpose.
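
To make that concrete, here is a minimal sketch of one way to do it, not a definitive implementation. It assumes the input really is well-formed UTF-8 and decodes each multi-byte sequence into its code point before printing it in hex. The names utf8_decode and utf8_to_hex_codepoints are hypothetical helpers made up for this example. The code points are separated by spaces, which is also an assumption: a code point can need more than two hex digits, so a fixed two-digit layout would be ambiguous. The bytes are read as unsigned char, because on platforms where plain char is signed, passing a byte of 0x80 or above to "%02X" prints a sign-extended value such as FFFFFFC3.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s.
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 if the bytes are not a well-formed sequence. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
    size_t len, k;
    unsigned long min;   /* smallest code point allowed for this length */

    if (s[0] < 0x80) {                      /* 0xxxxxxx: plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {     /* 110xxxxx: 2 bytes */
        *cp = s[0] & 0x1F; len = 2; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {     /* 1110xxxx: 3 bytes */
        *cp = s[0] & 0x0F; len = 3; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {     /* 11110xxx: 4 bytes */
        *cp = s[0] & 0x07; len = 4; min = 0x10000;
    } else {
        return 0;                           /* stray continuation or bad lead byte */
    }

    for (k = 1; k < len; k++) {
        /* Every following byte must look like 10xxxxxx.
         * This also rejects a string that ends in the middle of a sequence. */
        if ((s[k] & 0xC0) != 0x80)
            return 0;
        *cp = (*cp << 6) | (s[k] & 0x3F);
    }

    /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
    if (*cp < min || *cp > 0x10FFFF || (*cp >= 0xD800 && *cp <= 0xDFFF))
        return 0;
    return len;
}

/* Hypothetical helper: write the code points of the UTF-8 string `input`
 * into `output` as lowercase hex, separated by single spaces.
 * Returns 0 on success, -1 on malformed input or if `output` is too small. */
int utf8_to_hex_codepoints(const char *input, char *output, size_t out_size)
{
    const unsigned char *p = (const unsigned char *)input;
    size_t used = 0;

    if (out_size == 0)
        return -1;
    output[0] = '\0';

    while (*p != '\0') {
        unsigned long cp;
        size_t n = utf8_decode(p, &cp);
        int written;

        if (n == 0)
            return -1;
        written = snprintf(output + used, out_size - used, "%s%02lx",
                           used ? " " : "", cp);
        if (written < 0 || (size_t)written >= out_size - used)
            return -1;                      /* output would be truncated */
        used += (size_t)written;
        p += n;
    }
    return 0;
}
```

With a buffer such as char hex_str[64], calling utf8_to_hex_codepoints("áéí", hex_str, sizeof hex_str) from a source file saved as UTF-8 should leave "e1 e9 ed" in hex_str, because á is stored as the two bytes C3 A1, which decode to the code point E1. Plain ASCII input still gives the same codes as the original program, just in lowercase and separated by spaces.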
