简体   繁体   中英

How to encode a long in Base64 in Python?

In Java, I can encode a BigInteger as:

java.math.BigInteger bi = new java.math.BigInteger("65537L");
String encoded = Base64.encodeBytes(bi.toByteArray(), Base64.ENCODE|Base64.DONT_GUNZIP);

// result: 65537L encodes as "AQAB" in Base64

byte[] decoded = Base64.decode(encoded, Base64.DECODE|Base64.DONT_GUNZIP);
java.math.BigInteger back = new java.math.BigInteger(decoded);

In C#:

System.Numerics.BigInteger bi = new System.Numerics.BigInteger("65537L");
string encoded = Convert.ToBase64(bi);
byte[] decoded = Convert.FromBase64String(encoded);
System.Numerics.BigInteger back = new System.Numerics.BigInteger(decoded);

How can I encode long integers in Python as Base64-encoded strings? What I've tried so far produces results different from implementations in other languages (so far I've tried in Java and C#), particularly it produces longer-length Base64-encoded strings.

import struct
encoded = struct.pack('I', (1<<16)+1).encode('base64')[:-1]
# produces a longer string, 'AQABAA==' instead of the expected 'AQAB'

When using this Python code to produce a Base64-encoded string, the resulting decoded integer in Java (for example) produces instead 16777472 in place of the expected 65537 . Firstly, what am I missing?

Secondly, I have to figure out by hand what is the length format to use in struct.pack ; and if I'm trying to encode a long number (greater than (1<<64)-1 ) the 'Q' format specification is too short to hold the representation. Does that mean that I have to do the representation by hand, or is there an undocumented format specifier for the struct.pack function? (I'm not compelled to use struct , but at first glance it seemed to do what I needed.)

Check out this page on converting integer to base64 .

import base64
import struct

def encode(n):
    data = struct.pack('<Q', n).rstrip('\x00')
    if len(data)==0:
        data = '\x00'
    s = base64.urlsafe_b64encode(data).rstrip('=')
    return s

def decode(s):
    data = base64.urlsafe_b64decode(s + '==')
    n = struct.unpack('<Q', data + '\x00'* (8-len(data)) )
    return n[0]

The struct module :

… performs conversions between Python values and C structs represented as Python strings.

Because C doesn't have infinite-length integers, there's no functionality for packing them.

But it's very easy to write yourself. For example:

def pack_bigint(i):
    b = bytearray()
    while i:
        b.append(i & 0xFF)
        i >>= 8
    return b

Or:

def pack_bigint(i):
    bl = (i.bit_length() + 7) // 8
    fmt = '<{}B'.format(bl)
    # ...

And so on.

And of course you'll want an unpack function, like jbatista's from the comments:

def unpack_bigint(b):
    b = bytearray(b) # in case you're passing in a bytes/str
    return sum((1 << (bi*8)) * bb for (bi, bb) in enumerate(b))

This is a bit late, but I figured I'd throw my hat in the ring:

def inttob64(n):                                                              
    """                                                                       
    Given an integer returns the base64 encoded version of it (no trailing ==)
    """
    parts = []                                                                
    while n:                                                                  
        parts.insert(0,n & limit)                                             
        n >>= 32                                                              
    data = struct.pack('>' + 'L'*len(parts),*parts)                           
    s = base64.urlsafe_b64encode(data).rstrip('=')                            
    return s                                                                  

def b64toint(s):                                                              
    """                                                                       
    Given a string with a base64 encoded value, return the integer representation
    of it                                                                     
    """                                                                       
    data = base64.urlsafe_b64decode(s + '==')                                 
    n = 0                                                                     
    while data:                                                               
        n <<= 32                                                              
        (toor,) = struct.unpack('>L',data[:4])                                
        n |= toor & 0xffffffff                                                
        data = data[4:]                                                       
    return n

These functions turn an arbitrary-sized long number to/from a big-endian base64 representation.

Here is something that may help. Instead of using struct.pack() I am building a string of bytes to encode and then calling the BASE64 encode on that. I didn't write the decode, but clearly the decode can recover an identical string of bytes and a loop could recover the original value. I don't know if you need fixed-size integers (like always 128-bit) and I don't know if you need Big Endian so I left the decoder for you.

Also, encode64() and decode64() are from @msc's answer, but modified to work.

import base64
import struct

def encode64(n):
  data = struct.pack('<Q', n).rstrip('\x00')
  if len(data)==0:
    data = '\x00'
  s = base64.urlsafe_b64encode(data).rstrip('=')
  return s

def decode64(s):
  data = base64.urlsafe_b64decode(s + '==')
  n = struct.unpack('<Q', data + '\x00'* (8-len(data)) )
  return n[0]

def encode(n, big_endian=False):
    lst = []
    while True:
        n, lsb = divmod(n, 0x100)
        lst.append(chr(lsb))
        if not n:
            break
    if big_endian:
        # I have not tested Big Endian mode, and it may need to have
        # some initial zero bytes prepended; like, if the integer is
        # supposed to be a 128-bit integer, and you encode a 1, you
        # would need this to have 15 leading zero bytes.
        initial_zero_bytes = '\x00' * 2
        data = initial_zero_bytes + ''.join(reversed(lst))
    else:
        data = ''.join(lst)
    s = base64.urlsafe_b64encode(data).rstrip('=')
    return s

print encode(1234567890098765432112345678900987654321)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM