
Base64 doesn't work when encoding special characters

I am working with a base64 library that I found on GitHub (b64.c). It works fine when I encode ASCII strings, but when I try to encode a binary file such as an image, the output is wrong. Below is the code I am using to read in the file.

hello.txt

héllo

hello.txt contains only one special character (é). The encoding works correctly if that character is replaced with a regular ASCII character.

main.c

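#include <stdio.h>
#include <stdlib.h>

char *b64_encode(const unsigned char* src, size_t len); // defined in encode.c

int main(void)
{
    FILE *fp=NULL;
    char *buf=NULL, *enc=NULL;
    long fsize=0;
    size_t bytes_read=0;

    // open in binary mode so the bytes are read exactly as stored
    fp=fopen("hello.txt", "rb");
    if( fp==NULL ) { perror("hello.txt"); return 1; }

    // determine the file size
    fseek(fp, 0, SEEK_END);
    fsize=ftell(fp);
    rewind(fp);
    if( fsize<=0 ) { fclose(fp); return 1; }

    buf=(char*)malloc((size_t)fsize);
    if( buf==NULL ) { perror("buf"); fclose(fp); return 1; }

    bytes_read=fread(buf, 1, (size_t)fsize, fp);
    fclose(fp);
    if( bytes_read!=(size_t)fsize ) { free(buf); return 1; }

    // the encoder takes a length, so buf needs no terminating '\0'
    enc=b64_encode((const unsigned char*)buf, (size_t)fsize);
    if( enc!=NULL ) printf("encoded=%s\n", enc);

    free(enc);
    free(buf);
    getchar();
    return 0;
}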

encode.c (contains the function b64_encode)

char *b64_encode(const unsigned char* src, size_t len)
{
    int i=0, j=0;
    char *enc=NULL;
    size_t size=0;
    unsigned char buf[4], tmp[3];

    // alloc
    enc=(char*)malloc(0);
    if( enc==NULL )
    {
        perror("enc");

        return NULL;
    }

    while( len-- )
    {
        tmp[i++]=*(src++);
        if( i==3 )
        {
            buf[0]=( tmp[0]&0xfc )>>2;
            buf[1]=( ( tmp[0]&0x03 )<<4 )+( ( tmp[1]&0xf0 )>>4 );
            buf[2]=( ( tmp[1]&0x0f )<<2 )+( ( tmp[1]&0xc0 )>>6 );
            buf[3]=tmp[2]&0x3f;

            /*
             * alloc 4 bytes for 'enc' and then translate
             * each encoded buffer part by index from
             * the base64 table into 'enc' unsigned char array
            */
            enc=(char*)realloc(enc, size+4);
            for( i=0; i<4; ++i )
            {
                enc[size++]=b64_table[buf[i]];
            }

            // reset index
            i=0;
        }
    }

    if( i>0 )
    {
        // fill 'tmp' with '\0' at most 3 times
        for( j=i; j<3; ++j )
        {
            tmp[j]='\0';
        }

        // perform same codes as above
        buf[0]=( tmp[0]&0xfc )>>2;
        buf[1]=( ( tmp[0]&0x03 )<<4 )+( ( tmp[1]&0xf0 )>>4 );
        buf[2]=( ( tmp[1]&0x0f )<<2 )+( ( tmp[1]&0xc0 )>>6 );
        buf[3]=tmp[2]&0x3f;

        // same write to enc with new allocation
        for( j=0; j<i+1; ++j )
        {
            enc=(char*)realloc(enc, size+1);
            enc[size++]=b64_table[buf[j]];
        }

        while( ( i++ )<3 )
        {
            enc=(char*)realloc(enc, size+1);
            enc[size++]='=';
        }
    }

    enc=(char*)realloc(enc, size+1);
    enc[size]='\0';

    return enc;
}

Output of the program

aOnsbG9=

Output after saving the file as UTF-8

aMPpbGxv

Expected output

aMOpbGxv

PS. I open the file in binary mode because it contains special characters and because later on I would like to read binary data such as images or videos.

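The problem lies in the function b64_encode. The third 6-bit group must combine the low four bits of the second byte with the top two bits of the third byte, but the code takes the top two bits from the second byte again: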

buf[2]=( ( tmp[1]&0x0f )<<2 )+( ( tmp[1]&0xc0 )>>6 );

should be

buf[2]=( ( tmp[1]&0x0f )<<2 )+( ( tmp[2]&0xc0 )>>6 );

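Be sure to fix this at both occurrences. The bug goes unnoticed with plain ASCII letters because every byte from 0x40 to 0x7F has the same top two bits (01), so tmp[1] and tmp[2] contribute identical bits there.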
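For comparison, here is a minimal sketch of an encoder with the third 6-bit group built from the correct bytes. It is not the library's code: the name b64_encode_fixed is hypothetical, the table is the standard Base64 alphabet, and the output buffer is allocated once up front instead of being grown with realloc.

```c
#include <stddef.h>
#include <stdlib.h>

/* Standard Base64 alphabet (RFC 4648). */
static const char b64_table[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/*
 * Hypothetical replacement for b64_encode (name chosen for illustration).
 * Returns a NUL-terminated string that the caller must free(),
 * or NULL if the allocation fails.
 */
char *b64_encode_fixed(const unsigned char *src, size_t len)
{
    size_t out_len = 4 * ((len + 2) / 3);   /* 4 output chars per 3 input bytes */
    char *enc = malloc(out_len + 1);
    size_t i = 0, o = 0;

    if (enc == NULL)
        return NULL;

    while (i < len) {
        size_t n = (len - i < 3) ? len - i : 3;      /* bytes in this group: 1..3 */
        unsigned char b0 = src[i];
        unsigned char b1 = (n > 1) ? src[i + 1] : 0; /* zero-pad a short group */
        unsigned char b2 = (n > 2) ? src[i + 2] : 0;

        enc[o++] = b64_table[b0 >> 2];
        enc[o++] = b64_table[((b0 & 0x03) << 4) | (b1 >> 4)];
        /* third group: low 4 bits of b1, then the top 2 bits of b2 (not b1) */
        enc[o++] = (n > 1) ? b64_table[((b1 & 0x0f) << 2) | (b2 >> 6)] : '=';
        enc[o++] = (n > 2) ? b64_table[b2 & 0x3f] : '=';

        i += n;
    }

    enc[o] = '\0';
    return enc;
}
```

Called on the six UTF-8 bytes of héllo (68 C3 A9 6C 6C 6F), this returns aMOpbGxv, which matches the expected output in the question. Computing the output length first also avoids the repeated realloc calls, whose return values the original never checks.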
