
How to decode HTML entities in C?

I'm trying to decode HTML entities (in the format &#39;) in C.

So far I have some code that tries to decode them, but it produces odd output.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char* convertHtmlEntities(char* str) {
    size_t length = strlen(str);
    size_t i;
    char *endchar = malloc(sizeof(char));
    long charCode;
    if (!endchar) {
        fprintf(stderr,"not enough memory");
        exit(EXIT_FAILURE);
    }
    for (i=0;i<length;i++) {
        if (*(str+i) == '&' && *(str+i+1) == '#' && *(str+i+2) >= '0' && *(str+i+2) <= '9' && *(str+i+3) >= '0' && *(str+i+3) <= '9' && *(str+i+4) == ';') {
            charCode = strtol(str+i+2,&endchar,0);
            printf("ascii %li\n",charCode);
            *(str+i) = charCode;
            strncpy(str+i+1,str+i+5,length - (i+5));
            *(str + length - 5) = 0; /* null terminate string */
        }
    }
    return str;
}

int main()
{
    char string[] = "Helloworld&#39;s parent company has changed - comF";
    printf("%s",convertHtmlEntities(&string));
}

I'm not sure the main function is correct, because I wrote it just for this example (my real program builds the string from a web URL), but the idea is the same.

The function does replace the &#39; with an apostrophe, but the output is garbled just after the replacement and at the end.

Does anyone have a solution?

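strncpy (or strcpy) does not work for overlapping strings: the behavior is undefined when the source and destination overlap.

Your strings str+i+1 and str+i+5 overlap. Don't do that!

Replace strncpy with memmove, which is specified to handle overlapping regions: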

            *(str+i) = charCode;
            memmove(str+i+1,str+i+5,length - (i+5) + 1); /* also copy the '\0' */
            /* strncpy(str+i+1,str+i+5,length - (i+5)); */
            /* *(str + length - 5) = 0; */ /* null terminate string */
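For reference, here is a minimal sketch of how the whole function might look with the memmove fix applied. The name decode_entities_inplace is hypothetical, and the sketch assumes only decimal entities with codes 1 to 127 should be decoded; anything else is left untouched. Unlike the original, it lets strtol find the end of the number instead of requiring exactly two digits, and it updates length after each replacement so later entities are handled consistently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical rewrite of convertHtmlEntities (assumption: only decimal
 * entities with codes 1..127 are decoded; others are left untouched). */
char *decode_entities_inplace(char *str)
{
    assert(str != NULL);
    size_t length = strlen(str);

    for (size_t i = 0; i < length; i++) {
        /* Short-circuiting keeps every index at or before the '\0'. */
        if (str[i] != '&' || str[i + 1] != '#')
            continue;
        if (str[i + 2] < '0' || str[i + 2] > '9')
            continue;

        char *end;
        long code = strtol(str + i + 2, &end, 10);
        if (*end != ';' || code < 1 || code > 127)
            continue;

        size_t entity_len = (size_t)(end - (str + i)) + 1; /* "&#39;" -> 5 */
        size_t tail = i + entity_len;        /* first character after ';' */

        str[i] = (char)code;
        /* memmove is defined for overlapping regions; +1 copies the '\0'. */
        memmove(str + i + 1, str + tail, length - tail + 1);
        length -= entity_len - 1;  /* entity_len chars removed, one added */
    }
    return str;
}
```

Called as decode_entities_inplace(string) on a writable char array (not &string, and not a string literal), it modifies the buffer in place and returns the same pointer.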

I had another problem with the code - it cut the last 'F' character. I replaced this line:

 *(str + length - 5) = 0; /* null terminate string */

with this:

 *(str + length - 4) = 0; /* null terminate string */

I believe it's because you delete five characters and add one, so the new length is not old-5 but old-4.
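The length arithmetic can be checked with a small sketch. The helper replace_span_with_char is hypothetical (it is not part of the original code): it replaces entity_len characters starting at pos with a single character and returns the resulting length, which for a five-character entity such as &#39; comes out as the old length minus 4.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: replaces the entity_len characters starting at
 * str[pos] with the single character ch and returns the new length. */
size_t replace_span_with_char(char *str, size_t pos, size_t entity_len, char ch)
{
    size_t old_len = strlen(str);
    assert(entity_len >= 1 && pos + entity_len <= old_len);

    str[pos] = ch;
    /* Shift the tail left, including the terminating '\0' (hence the +1). */
    memmove(str + pos + 1, str + pos + entity_len,
            old_len - (pos + entity_len) + 1);

    /* entity_len characters removed, one added:
     * for "&#39;" this is old_len - 5 + 1 = old_len - 4. */
    return old_len - entity_len + 1;
}
```

Because memmove also copies the terminator, no separate null-termination line is needed, which avoids the off-by-one entirely.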
