简体   繁体   中英

How to convert base13 string to base64

I have to make a URL shortener for query strings. Have spent few days trying to compress array data into base64 strings. Thinking that the best approach may be to interpret something like "[[1,2,9,3],[1,0,2],[39,4]]" as base13 with numbers 0-9 and [], symbols.

how the current algorithm works: convert the stringified arrays into an array of base13, where each element represents 1 unique character, convert this array to base10 number, convert this number to base 64 string.

but the problem is when converting the base13 array to base10 number, it makes large numbers like 5.304781188371057e+86 which cant be held in js.

I am open to alternative solutions of course, but please do not suggest something like creating a database of URLs as it won't work as I have up to 51!*51! unique URLs, better to just make a compact encodable and decodable query string and decode it as soon as the website is accessed.

//convert stringified array to array of base13(each element = each digit of base13 number)
function stringToArray(string)
{
    let charSet = "[],1234567890";
    let array = [];
    for(let i = 0; i < string.length; i++)
    {
        array.push(charSet.indexOf(string[i]));
    }
    return array;
}

//convert base13 array to one large decimal number
function arrayToDecimal(array, base)
{
    var decimal = 0;
    for(let i = 0; i < array.length; i++)
    {
        decimal += array[i] * Math.pow(base, i)
    }
    return decimal;
}

//convert decimal number back to array
function decimalToArray(decimal, base)
{
    var quotient = decimal;
    var remainder = [];
    while(quotient > base)
    {
        remainder.push(quotient % base)
        quotient = Math.floor(quotient / base);
    }
    remainder.push(quotient % base)
    return remainder;
}

const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// binary to string lookup table
const b2s = alphabet.split('');

// string to binary lookup table
// 123 == 'z'.charCodeAt(0) + 1
const s2b = new Array(123);
for(let i = 0; i < alphabet.length; i++)
{
    s2b[alphabet.charCodeAt(i)] = i;
}

// number to base64
const ntob = (number) =>
{
    if(number < 0) return `-${ntob(-number)}`;

    let lo = number >>> 0;
    let hi = (number / 4294967296) >>> 0;

    let right = '';
    while(hi > 0)
    {
        right = b2s[0x3f & lo] + right;
        lo >>>= 6;
        lo |= (0x3f & hi) << 26;
        hi >>>= 6;
    }

    let left = '';
    do {
        left = b2s[0x3f & lo] + left;
        lo >>>= 6;
    } while(lo > 0);

    return left + right;
};

// base64 to number
const bton = (base64) =>
{
    let number = 0;
    const sign = base64.charAt(0) === '-' ? 1 : 0;

    for(let i = sign; i < base64.length; i++)
    {
        number = number * 64 + s2b[base64.charCodeAt(i)];
    }

    return sign ? -number : number;
};



console.log(decimalToArray(bton(ntob(arrayToDecimal([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 13))), 13)) 
//encoded and decoded, works output:[1,1,1,1,1,1,1,1,1,1,1,1,1]
console.log(arrayToDecimal([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 13)) 
//encoding doesnt work, array to decimal converts to 5.304781188371057e+86```

An interesting problem... The first thing you will need to assess is whether the base conversion compression you're seeking is worthwhile. Ie, how many base 64 characters are required to represent n characters of base 13? This involves solving...

13 ** n = 64 ** x

Solving for x, we get...

 x = n * log(13) / log(64)

Ie, for every n digits of base 13, how many digits of base 64 are required. A sampling of a few values of n returns...

  • n = 6, x = 3.70
  • n = 7, x = 4.31
  • n = 8, x = 4.93
  • n = 9, x = 5.55
  • n = 10, x = 6.17
  • n = 11, x = 6.78
  • n = 12, x = 7.40
  • n = 13, x = 8.01
  • n = 14, x = 8.63
  • n = 15, x = 9.25
  • n = 16, x = 9.86

So how to interpret this? If you have 10 digits of base 13, you're going to need 7 digits (6.17 rounded up) of base 64. So the best ratio is when x is equal to, or just under, a whole number. So, 8 digits of base 13 requires 5 digits of base 64, achieving a best case of 5/8 or 62.5% compression ratio.

Assuming that's good enough to meet your requirement, then the following function converts the "base13" string to base 64.

const base13Chars = "0123456789[],";
const base64Chars = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_';  
// see https://en.wikipedia.org/wiki/Query_string for URL parameter allowable characters.

function base13toBase64(x13) {

    base13 = x13.split("").map( c => base13Chars.indexOf(c) );

    // Make the array an even multiple of 8
    for (i = base13.length; i % 8 !==0; i++) {
        base13[i] = 0;
    }

    x64 = "";
    for (i = 0; i < base13.length; i += 8) {
        // Calculate base13 value of the next 8 characters.
        let n = 0;
        for (j = 0; j < 8; j++) {
            n = n * 13 + base13[i + j];
        }
        // Now calculate the base64 of n.
        for (j = 0; j < 5; j++) {
            x64 = x64 + base64Chars.substr(n % 64,1);
            n = Math.floor(n / 64);
        }   
    }

    return x64;
}

Running the above...

 base13toBase64( "[[1,2,9,3],[1,0,2],[39,4]]" ) returns "ilYKerYlgEJ4PxAAjaJi"

Note that the original value is a length of 26 characters, and the base64 value is 20 characters, so the compression ratio is 77%, not quite the optimal 62.5%. This is because of the padding to bring the original array to 32 characters, an even multiple of 8. The longer the string to encode, though, the closer the ratio will be to 62.5%.

Then, on the server side you'll need the constants above plus the following function to "uncompress" the base64 to the base13 stringified URL...

function base64toBase13(x64) {

    base64 = x64.split("").map( c => base64Chars.indexOf(c) );

    x13 = "";
    for (i = 0; i < base64.length; i += 5) {
        // Calculate base64 value of the next 5 characters.
        let n = 0;
        for (j = 5 - 1; 0 <= j; j--) {
            n = n * 64 + base64[i + j];
        }
        // Now calculate the base13 of n.
        let x = "";
        for (j = 0; j < 8; j++) {
            x = base13Chars.substr(n % 13,1) + x;
            n = Math.floor(n / 13);
        }
        x13 = x13 + x;
    }

    // Removed the trailing 0's as a result of the buffering in
    // base13toBase64 to make the array an even multiple of 8.
    while (x13.substr(-1,1) === "0") {
        x13 = x13.substr(0, x13.length - 1);
    }

    return x13;
}

Running the above...

 base64toBase13 ( "ilYKerYlgEJ4PxAAjaJi" ) returns "[[1,2,9,3],[1,0,2],[39,4]]"

Hope this helps...

The best compression is when you can leave stuff out.

assuming your data structure is Array<Array<int>> given by the one sample, we can leave out pretty much everything that doesn't contribute to the data itself.

I'm not compressing the string, but the data itself with 1 b64Character / 5 bits needed to represent a number. as for the structure we only store the number of sub-arrays and their respective lengths; so more or less an additional character per Array in your data.

boils down to:

 function encode(data) { const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'; let str = ""; function encode(nr, hasMoreDigits) { if (nr > 31) { // I need more bits/characters to encode this number. //encode the more significant bits with the 0b100000 flag encode(nr >>> 5, 32); } // 0b011111 payload | 0b100000 flag const index = nr & 31 | hasMoreDigits; str += alphabet[index]; } encode(data.length); data.forEach(arr => { encode(arr.length); arr.forEach(v => encode(v >>> 0 /* int32 -> uint32 */)); }); return str; } function decode(str) { const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'; let i = 0; function parse() { let nr = 0, hasMoreDigits; do { const index = alphabet.indexOf(str.charAt(i++)); nr = nr << 5 | index & 31; // 0b011111 payload hasMoreDigits = index & 32; // 0b100000 flag } while (hasMoreDigits); return nr; // int32 due to the bit operations above } let data = Array(parse()); for (let j = 0; j < data.length; ++j) { let arr = data[j] = Array(parse()); for (let k = 0; k < arr.length; ++k) { arr[k] = parse(); } } return data; } let data = [ [1, 2, 9, 3], [1, 0, 2], [39, 4] ]; let text = encode(data); let data2 = decode(text); console.log("input:", data); console.log("encoded:", text, "length:", text.length); console.log("output:", data2); console.log("equal:", JSON.stringify(data) === JSON.stringify(data2));
 .as-console-wrapper{top:0;max-height:100%!important}

The encoding of the numbers. Ideally you would encode a number as binary with a static size, but this means 32bit/int which would be 6characters/number, so multibytes.

We split the number into chunks of 'n' bits, ignore the leading zeroes and encode the rest. ideally we can encode small number with very few characters, downside: we loose 1bit/chunk if n is too small and the average numbers are big. It's a tradeoff; that's why I left this configurable.

the current format is 6bits/number. 1 for the Structure, 5 bits as payload. In the format (1.....)*0.....

I would suggest you directly encode the Base13 string into Base64. Although that might not result in better compression than your solution, it removes the heavy multiplications you are performing. Moreover how do you guarantee that no collisions happen when converting through arrayToDecimal ?

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM