
Decoding base64 while using GitHub API to Download a File

I am using the GitHub API to download a file from GitHub. I can authenticate successfully and get a response from GitHub, and I can see a base64-encoded string representing the file contents.

Unfortunately, I get an unusual error (string length is not a multiple of 4) when decoding the base64 string.

The HTTP request is illustrated below:

GET /repos/:owner/:repo/contents/:path

The (partial) response is illustrated below:

{
    "name":....,
    "download_url":...,
    "type":"file",
    "content":"ewogICAgInN3YWdnZXIiOiAiM...
}

The issue I am encountering is that the string is 15263 characters long, and decoding it fails with an error (string length is not a multiple of 4). I am using Node.js and the 'base64-js' npm module to decode the string. The decoding code is shown below:

var base64 = require('base64-js');
var contents = base64.toByteArray(fileContent);

The decoding causes an exception:

Error: Invalid string. Length must be a multiple of 4
    at placeHoldersCount (.../node_modules/base64-js/index.js:23:11)
    at Object.toByteArray (...node_modules/base64-js/index.js:42:18)
    :
    :

I would think that the GitHub API is sending me the correct data, so I figure that is not the issue.

Am I performing the decoding improperly or is there another problem I am overlooking?

Any help is appreciated.

I experimented a bit and found a solution by using a different base64 decoding library as follows:

var base64 = require('js-base64').Base64;
var contents = base64.decode(res.content);

I am not sure whether an encoded string's length must be divisible by 4 (my 15263-character string clearly is not), but the alternative library decoded the string properly.

A second solution that also worked for me is specific to the GitHub API. By adding the following header to the GitHub API call, I was able to get the decoded file contents directly:

'accept': 'application/vnd.github.VERSION.raw'
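For illustration, here is a minimal sketch of how that header might be attached to the contents request. The helper name buildRawContentsRequest, the user-agent value, and the choice of 'v3' for the VERSION placeholder are my own assumptions, and authentication is left out entirely:

```javascript
// Hypothetical helper: builds the URL and headers for a raw-contents request.
// apiVersion fills the VERSION placeholder; 'v3' is an assumption here.
function buildRawContentsRequest(owner, repo, path, apiVersion = 'v3') {
  // Encode each path segment separately so the slashes between them survive.
  const encodedPath = path.split('/').map(encodeURIComponent).join('/');
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/contents/${encodedPath}`,
    headers: {
      // Asks for the file body itself instead of the JSON wrapper.
      accept: `application/vnd.github.${apiVersion}.raw`,
      // The GitHub API expects a User-Agent header; this value is a placeholder.
      'user-agent': 'contents-example',
    },
  };
}

// Possible usage on a Node.js version with a global fetch (not executed here):
//   const { url, headers } = buildRawContentsRequest('owner', 'repo', 'path/to/file.json');
//   const text = await (await fetch(url, { headers })).text();
```

With this media type the response body should be the file itself, so no base64 decoding step is needed on the client.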

After much experimenting, I think I nailed down the difference between the working and broken base64 decoding.

It appears GitHub Base-64 encodes with:

  • UTF-8 charset
  • Base 64 MIME encoder (RFC2045)

This is as opposed to a "basic" (RFC 4648) Base64 encoder. Several languages seem to default to the basic variant (including Java, which I was using). When I switched to a MIME decoder, I got the full contents of the file un-garbled. This would explain why switching libraries fixed the issue in some cases.
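To make the difference concrete, here is a small sketch of why MIME-style output can trip a strict length check. The helpers wrapLines and hasValidBasicLength are hypothetical names of my own, the 76-character width is the RFC 2045 maximum rather than whatever GitHub actually uses, and I join lines with "\n" for simplicity even though RFC 2045 specifies CRLF:

```javascript
// Hypothetical helper: break a Base64 string into fixed-width lines,
// the way a MIME-style encoder would.
function wrapLines(b64, lineLength = 76) {
  const lines = [];
  for (let i = 0; i < b64.length; i += lineLength) {
    lines.push(b64.slice(i, i + lineLength));
  }
  return lines.join('\n');
}

// Hypothetical strict check, similar in spirit to the length test that
// produced the "multiple of 4" error in the question.
function hasValidBasicLength(s) {
  return s.length % 4 === 0;
}

const original = 'x'.repeat(100);
const basic = Buffer.from(original, 'utf8').toString('base64'); // 136 characters
const wrapped = wrapLines(basic);                               // 136 + 1 line break

console.log(hasValidBasicLength(basic));   // true
console.log(hasValidBasicLength(wrapped)); // false
```

The encoded data is identical in both cases; only the inserted line breaks change the total length.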

I will note that the content field contained newline characters. Decoders are supposed to ignore them, but not all do, so if you still get errors, you may need to try removing them before decoding.
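A minimal sketch of that workaround in Node.js, using the built-in Buffer instead of the libraries mentioned above. The helper name decodeBase64Content is hypothetical, and it assumes the file is UTF-8 text:

```javascript
// Hypothetical helper: strip all whitespace (including the embedded newlines)
// and decode the remaining Base64 text as a UTF-8 string.
function decodeBase64Content(content) {
  const cleaned = content.replace(/\s+/g, '');
  return Buffer.from(cleaned, 'base64').toString('utf8');
}

// A short stand-in for the "content" field, with line breaks inserted.
const sample = 'eyJzd2Fn\nZ2VyIjog\nIjIuMCJ9\n';
console.log(decodeBase64Content(sample)); // {"swagger": "2.0"}
```

Node's own base64 decoding already skips whitespace, so the explicit replace mainly matters if the cleaned string is handed to a stricter decoder such as the one in the question.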

The media-type header does the job better. However, in my case I am trying to use the API via a GitHub App, and at the time of writing GitHub requires a specific media type for that, which returns the JSON response.

For some reason, the GitHub API's base64-encoded content doesn't decode properly in any of the online base64 decoders I've tried from the front page of Google.

Python works, however:

import base64
base64.b64decode("ewogICAgInN3YWdnZXIiOiAiM...")
