
Python: Compactly and reversibly encode large integer as base64 or base16 having variable or fixed length

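I want to compactly encode a large unsigned or signed integer with an arbitrary number of bits into a base64, base32, or base16 (hexadecimal) representation. The output will ultimately be used as a string that serves as a filename, but this should be beside the point. I am using the latest Python 3.

This works, but it is far from compact: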

>>> import base64, sys
>>> i: int = 2**62 - 3  # Can be signed or unsigned.
>>> b64: bytes =  base64.b64encode(str(i).encode()) # Not a compact encoding.
>>> len(b64), sys.getsizeof(b64)
(28, 61)

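There is a prior question, now closed, whose answers strictly concern inefficient representations. Note again that we don't want to use any strings or needlessly long sequences of bytes in this exercise. As such, this question is not a duplicate of that one.

This answer is motivated in part by disparate comments by Erik A., such as those on this answer. The integer is first compactly converted to bytes, and the bytes are then encoded in a variable base: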

from typing import Callable, Optional
import base64

class IntBaseEncoder:
    """Reversibly encode an unsigned or signed integer into a customizable encoding of a variable or fixed length."""
    # Ref: https://stackoverflow.com/a/54152763/
    def __init__(self, encoding: str, *, bits: Optional[int] = None, signed: bool = False):
        """
        :param encoding: Name of encoding from base64 module, e.g. b64, urlsafe_b64, b32, b16, etc.
        :param bits: Max bit length of int which is to be encoded. If specified, the encoding is of a fixed length,
        otherwise of a variable length.
        :param signed: If True, integers are considered signed, otherwise unsigned.
        """
        self._decoder: Callable[[bytes], bytes] = getattr(base64, f'{encoding}decode')
        self._encoder: Callable[[bytes], bytes] = getattr(base64, f'{encoding}encode')
        self.signed: bool = signed
        self.bytes_length: Optional[int] = bits and self._bytes_length(2 ** bits - 1)

    def _bytes_length(self, i: int) -> int:
        return (i.bit_length() + 7 + self.signed) // 8

    def encode(self, i: int) -> bytes:
        length = self.bytes_length or self._bytes_length(i)
        i_bytes = i.to_bytes(length, byteorder='big', signed=self.signed)
        return self._encoder(i_bytes)

    def decode(self, b64: bytes) -> int:
        i_bytes = self._decoder(b64)
        return int.from_bytes(i_bytes, byteorder='big', signed=self.signed)

# Tests:
import unittest

class TestIntBaseEncoder(unittest.TestCase):

    ENCODINGS = ('b85', 'b64', 'urlsafe_b64', 'b32', 'b16')

    def test_unsigned_with_variable_length(self):
        for encoding in self.ENCODINGS:
            encoder = IntBaseEncoder(encoding)
            previous_length = 0
            for i in range(1234):
                encoded = encoder.encode(i)
                # The encoded length must never shrink as the value grows.
                self.assertGreaterEqual(len(encoded), previous_length)
                previous_length = len(encoded)
                self.assertEqual(i, encoder.decode(encoded))

    def test_signed_with_variable_length(self):
        for encoding in self.ENCODINGS:
            encoder = IntBaseEncoder(encoding, signed=True)
            # The encoded length is not monotonic across negative and positive
            # values, so only the round trip is checked here.
            for i in range(-1234, 1234):
                encoded = encoder.encode(i)
                self.assertEqual(i, encoder.decode(encoded))

    def test_unsigned_with_fixed_length(self):
        for encoding in self.ENCODINGS:
            for maxint in range(257):
                encoder = IntBaseEncoder(encoding, bits=maxint.bit_length())
                maxlen = len(encoder.encode(maxint))
                for i in range(maxint + 1):
                    encoded = encoder.encode(i)
                    self.assertEqual(len(encoded), maxlen)
                    self.assertEqual(i, encoder.decode(encoded))

    def test_signed_with_fixed_length(self):
        for encoding in self.ENCODINGS:
            for maxint in range(257):
                encoder = IntBaseEncoder(encoding, bits=maxint.bit_length(), signed=True)
                maxlen = len(encoder.encode(maxint))
                for i in range(-maxint, maxint + 1):
                    encoded = encoder.encode(i)
                    self.assertEqual(len(encoded), maxlen)
                    self.assertEqual(i, encoder.decode(encoded))

if __name__ == '__main__':
    unittest.main()

If the output is to be used as a filename, initializing the encoder with the encoding 'urlsafe_b64' or even 'b16' is a safer choice.

Usage examples:
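As a minimal sketch of why the choice of alphabet matters for filenames, the snippet below encodes the same three bytes with the standard and the URL-safe base64 alphabets and with base16. The value `2**24 - 1` is picked here only for illustration, because it makes every base64 digit the last symbol of the alphabet; it is not from the original answer.

```python
import base64

# Three 0xFF bytes: every 6-bit group is 63, the last base64 symbol.
raw = (2**24 - 1).to_bytes(3, byteorder='big')

standard = base64.b64encode(raw)         # uses '+' and '/'
urlsafe = base64.urlsafe_b64encode(raw)  # uses '-' and '_' instead
hexadecimal = base64.b16encode(raw)      # uses only 0-9 and A-F

print(standard)     # b'////'
print(urlsafe)      # b'____'
print(hexadecimal)  # b'FFFFFF'
```

The standard alphabet can emit '/', which is a path separator on POSIX systems, whereas the URL-safe alphabet cannot. Base16 is longer, but it uses a single letter case, which may matter on case-insensitive file systems.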

# Variable length encoding
>>> encoder = IntBaseEncoder('urlsafe_b64')
>>> encoder.encode(12345)
b'MDk='
>>> encoder.decode(_)
12345

# Fixed length encoding
>>> encoder = IntBaseEncoder('b16', bits=32)
>>> encoder.encode(12345)
b'00003039'
>>> encoder.encode(123456789)
b'075BCD15'
>>> encoder.decode(_)
123456789

# Signed
>>> encoder = IntBaseEncoder('b32', signed=True)
>>> encoder.encode(-12345)
b'Z7DQ===='
>>> encoder.decode(_)
-12345
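To make the gain over the string-based attempt in the question concrete, here is a small sketch that applies the same two steps as the class (integer to bytes, then bytes to base64) directly with the standard library and compares the output lengths. The variable names are illustrative only.

```python
import base64

i = 2**62 - 3

# String-based encoding from the question: encodes the 19 decimal digits.
via_str = base64.urlsafe_b64encode(str(i).encode())

# Bytes-based encoding: a 62-bit value fits into 8 bytes.
n_bytes = (i.bit_length() + 7) // 8
via_bytes = base64.urlsafe_b64encode(i.to_bytes(n_bytes, byteorder='big'))

print(len(via_str), len(via_bytes))  # 28 12

# The compact form is still reversible.
decoded = int.from_bytes(base64.urlsafe_b64decode(via_bytes), byteorder='big')
print(decoded == i)  # True
```

For this value the bytes-based route needs 12 characters instead of 28, and one of those 12 is '=' padding.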

The following snippet from this answer should suit your needs, with the advantage of having no dependencies:

def v2r(n, base): # value to representation
    """
    Convert a positive integer to its string representation in a custom base.
    
    :param n: the numeric value to be represented by the custom base
    :param base: the custom base defined as a string of characters, used as symbols of the base
    :returns: the string representation of natural number n in the custom base
    """
    if n == 0: return base[0]
    b = len(base)
    digits = ''
    while n > 0:
        digits = base[n % b] + digits
        n = n // b
    return digits

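It does not directly perform a typical base64 conversion (although it can be used to obtain one), but the result is similar: it returns the representation of a large integer (positive only, but you can easily overcome that limitation) in a numeric base of custom length made up of custom symbols.

A few examples show its simple and versatile usage better than any words: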

# base64 filename-safe characters
# perform a base64 conversion if applied to multiples of 3-bytes chunks
>>> v2r(4276803,'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_')
'QUJD'

# hexadecimal base
>>> v2r(123456789,'0123456789ABCDEF')
'75BCD15'
>>> v2r(255,'0123456789ABCDEF')
'FF'

# custom base of 62 filename-safe characters
>>> v2r(123456789,'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ')
'8m0Kx'

# custom base of 36 filename-safe lowercase characters for case insensitive file systems
>>> v2r(123456789,'0123456789abcdefghijklmnopqrstuvwxyz')
'21i3v9'

# binary conversion
>>> v2r(123456789,'01')
'111010110111100110100010101'
>>> v2r(255,'01')
'11111111'
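The snippet above only covers encoding, while the question asks for a reversible encoding of variable or fixed length. The following is a minimal sketch of how that could be completed. The names `r2v` and `v2r_fixed` are hypothetical and not part of the referenced answer; `v2r` is repeated only so that the block runs on its own.

```python
def v2r(n, base):
    """Copy of the snippet above: non-negative int -> string in a custom base."""
    if n == 0:
        return base[0]
    b = len(base)
    digits = ''
    while n > 0:
        digits = base[n % b] + digits
        n //= b
    return digits

def r2v(digits, base):
    """Hypothetical inverse of v2r: string in a custom base -> non-negative int."""
    b = len(base)
    index = {symbol: value for value, symbol in enumerate(base)}
    n = 0
    for symbol in digits:
        n = n * b + index[symbol]  # raises KeyError for symbols outside the base
    return n

def v2r_fixed(n, base, width):
    """Hypothetical fixed-length variant: left-pad with the zero symbol."""
    digits = v2r(n, base)
    if len(digits) > width:
        raise ValueError('value does not fit in the requested width')
    return digits.rjust(width, base[0])

BASE36 = '0123456789abcdefghijklmnopqrstuvwxyz'
encoded = v2r_fixed(123456789, BASE36, 8)
print(encoded)               # 0021i3v9
print(r2v(encoded, BASE36))  # 123456789
```

Because the padding symbol is the zero digit of the base, `r2v` decodes padded and unpadded strings to the same value, so no separate fixed-length decoder is needed in this sketch.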
