Python: Reversibly encode an alphanumeric string to an integer

Question

I want to convert a string (composed of alphanumeric characters) into an integer, and then convert that integer back into the string:

string --> int --> string

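In other words, I want to represent an alphanumeric string by an integer.

I found a working solution, which I have included in the answers, but I don't think it is the best one, and I'm interested in other ideas and methods.

Please don't mark this as a duplicate just because many similar questions already exist. I specifically want a simple way to convert a string into an integer and back again.

It should work for strings containing alphanumeric characters, i.e. strings made up of digits and letters.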

Answer 1

Here is what I have so far:

string --> bytes

mBytes = m.encode("utf-8")

bytes --> int

mInt = int.from_bytes(mBytes, byteorder="big")

int --> bytes

mBytes = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")

bytes --> string

m = mBytes.decode("utf-8")

Try it out:

m = "test123"
mBytes = m.encode("utf-8")
mInt = int.from_bytes(mBytes, byteorder="big")
mBytes2 = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")
m2 = mBytes2.decode("utf-8")
print(m == m2)
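The four steps above can be folded into a pair of helper functions. This is only a sketch: the names `str_to_int` and `int_to_str` are made up for illustration and do not appear in the original answer.

```python
def str_to_int(text: str) -> int:
    """Encode text as UTF-8 and read the bytes as one big-endian integer."""
    return int.from_bytes(text.encode("utf-8"), byteorder="big")


def int_to_str(number: int) -> str:
    """Invert str_to_int: size the byte string from bit_length, then decode."""
    num_bytes = (number.bit_length() + 7) // 8  # round up to whole bytes
    return number.to_bytes(num_bytes, byteorder="big").decode("utf-8")


# Round-trip a few alphanumeric samples, including the empty string.
for sample in ["test123", "Test123", "a", ""]:
    encoded = str_to_int(sample)
    assert int_to_str(encoded) == sample
    print(repr(sample), "->", encoded)
```

The empty string maps to 0, and 0 maps back to the empty string, because `(0).to_bytes(0, byteorder="big")` returns `b""`.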

Here is a reusable version of the above:

class BytesIntEncoder:

    @staticmethod
    def encode(b: bytes) -> int:
        return int.from_bytes(b, byteorder='big')

    @staticmethod
    def decode(i: int) -> bytes:
        return i.to_bytes(((i.bit_length() + 7) // 8), byteorder='big')

If you are using Python <3.6, remove the optional type annotations.

Test:

>>> s = 'Test123'
>>> b = s.encode()
>>> b
b'Test123'

>>> BytesIntEncoder.encode(b)
23755444588720691
>>> BytesIntEncoder.decode(_)
b'Test123'
>>> _.decode()
'Test123'
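One caveat with the big-endian scheme: leading zero bytes do not survive the round trip, because they add nothing to the integer's value. Alphanumeric strings never start with a zero byte, so the question's use case is unaffected. For arbitrary bytes, one possible workaround is to prepend a marker byte before encoding. The sketch below assumes a hypothetical `SENTINEL` byte, which is not part of the original answer.

```python
def encode_plain(b: bytes) -> int:
    return int.from_bytes(b, byteorder="big")


def decode_plain(i: int) -> bytes:
    return i.to_bytes((i.bit_length() + 7) // 8, byteorder="big")


SENTINEL = b"\x01"  # hypothetical marker byte, an assumption for this sketch


def encode_with_sentinel(b: bytes) -> int:
    # The nonzero sentinel becomes the most significant byte,
    # so any zero bytes that follow it are preserved.
    return int.from_bytes(SENTINEL + b, byteorder="big")


def decode_with_sentinel(i: int) -> bytes:
    raw = i.to_bytes((i.bit_length() + 7) // 8, byteorder="big")
    return raw[len(SENTINEL):]  # drop the sentinel again


data = b"\x00\x00abc"
print(decode_plain(encode_plain(data)))                  # leading zeros are lost
print(decode_with_sentinel(encode_with_sentinel(data)))  # leading zeros are kept
```

The cost is that the integers differ from the plain encoding, so both sides have to agree on the sentinel.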

Answer 2

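Recall that a string can be encoded to bytes, which can then be encoded to an integer. The encodings can then be reversed to get the bytes back, followed by the original string.

This encoder uses binascii to produce the same integer encoding as the one in charel-f's answer. I believe it is identical because I tested it extensively.

Credit: this answer.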

from binascii import hexlify, unhexlify

class BytesIntEncoder:

    @staticmethod
    def encode(b: bytes) -> int:
        return int(hexlify(b), 16) if b != b'' else 0

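    @staticmethod
    def decode(i: int) -> bytes:
        if i == 0:
            return b''
        hex_digits = '%x' % i
        # unhexlify needs an even number of hex digits, so pad with a leading zero if needed.
        if len(hex_digits) % 2:
            hex_digits = '0' + hex_digits
        return unhexlify(hex_digits)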

If you are using Python <3.6, remove the optional type annotations.

Quick test:

>>> s = 'Test123'
>>> b = s.encode()
>>> b
b'Test123'

>>> BytesIntEncoder.encode(b)
23755444588720691
>>> BytesIntEncoder.decode(_)
b'Test123'
>>> _.decode()
'Test123'
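The claim that the binascii route gives the same integers as `int.from_bytes` can be spot-checked with a small script. This is a sketch, not the author's test suite: the function names, the fixed random seed, and the range of lengths are arbitrary choices.

```python
import random
import string
from binascii import hexlify

ALPHANUMERIC = string.ascii_letters + string.digits


def encode_from_bytes(b: bytes) -> int:
    return int.from_bytes(b, byteorder="big")


def encode_hexlify(b: bytes) -> int:
    # Parse the hex digits of the bytes as a base-16 number; empty input maps to 0.
    return int(hexlify(b), 16) if b else 0


rng = random.Random(0)  # fixed seed so the run is repeatable
for length in range(0, 64):
    sample = "".join(rng.choices(ALPHANUMERIC, k=length)).encode()
    assert encode_from_bytes(sample) == encode_hexlify(sample)
print("both encoders agree on all samples")
```

Both functions read the bytes as one big-endian base-256 number, which is why they agree.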

Answer 3

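Assuming the character set is purely alphanumeric, i.e. a-z, A-Z and 0-9, each character needs only 6 bits. Using an 8-bit byte encoding is therefore, in theory, an inefficient use of memory.

This answer converts the input bytes into a sequence of 6-bit integers. It uses bitwise operations to encode these small integers into one large integer. Whether this actually translates into real-world storage efficiency can be measured with sys.getsizeof, and a saving is more likely for larger strings.

This implementation tailors the encoding to the chosen character set. For example, if you only use string.ascii_lowercase (5 bits) rather than string.ascii_uppercase + string.digits (6 bits), the encoding becomes correspondingly more efficient.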
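As a quick check of the bit counts quoted above, the width per character can be computed from the size of the character set. The helper name `bits_per_char` is made up for this sketch. It uses the same formula as the class in this answer, where codes start at 1 and 0 is never assigned to a character.

```python
import string


def bits_per_char(charset: str) -> int:
    # Codes run from 1 to len(charset); 0 is left unused.
    return (len(charset) + 1).bit_length()


print(bits_per_char(string.ascii_lowercase))                   # 26 characters
print(bits_per_char(string.ascii_uppercase + string.digits))   # 36 characters
print(bits_per_char(string.ascii_letters + string.digits))     # 62 characters
```

For a 7-character alphanumeric string this means at most 7 * 6 = 42 bits, compared with up to 56 bits when each character takes a full byte.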

Unit tests are also included.

import string


class BytesIntEncoder:

    def __init__(self, chars: bytes = (string.ascii_letters + string.digits).encode()):
        num_chars = len(chars)
        # Codes 1..num_chars; 0 is never used, so every character contributes nonzero bits.
        translation = bytes(range(1, num_chars + 1))
        self._translation_table = bytes.maketrans(chars, translation)
        self._reverse_translation_table = bytes.maketrans(translation, chars)
        self._num_bits_per_char = (num_chars + 1).bit_length()

    def encode(self, chars: bytes) -> int:
        num_bits_per_char = self._num_bits_per_char
        output, bit_idx = 0, 0
        for chr_idx in chars.translate(self._translation_table):
            output |= (chr_idx << bit_idx)
            bit_idx += num_bits_per_char
        return output

    def decode(self, i: int) -> bytes:
        maxint = (2 ** self._num_bits_per_char) - 1
        output = bytes(((i >> offset) & maxint) for offset in range(0, i.bit_length(), self._num_bits_per_char))
        return output.translate(self._reverse_translation_table)


# Test
import itertools
import random
import unittest


class TestBytesIntEncoder(unittest.TestCase):

    chars = string.ascii_letters + string.digits
    encoder = BytesIntEncoder(chars.encode())

    def _test_encoding(self, b_in: bytes):
        i = self.encoder.encode(b_in)
        self.assertIsInstance(i, int)
        b_out = self.encoder.decode(i)
        self.assertIsInstance(b_out, bytes)
        self.assertEqual(b_in, b_out)
        # print(b_in, i)

    def test_thoroughly_with_small_str(self):
        for s_len in range(4):
            for s in itertools.combinations_with_replacement(self.chars, s_len):
                s = ''.join(s)
                b_in = s.encode()
                self._test_encoding(b_in)

    def test_randomly_with_large_str(self):
        for s_len in range(256):
            num_samples = {s_len <= 16: 2 ** s_len,
                           16 < s_len <= 32: s_len ** 2,
                           s_len > 32: s_len * 2,
                           s_len > 64: s_len,
                           s_len > 128: 2}[True]
            # print(s_len, num_samples)
            for _ in range(num_samples):
                b_in = ''.join(random.choices(self.chars, k=s_len)).encode()
                self._test_encoding(b_in)


if __name__ == '__main__':
    unittest.main()

Usage example:

>>> encoder = BytesIntEncoder()
>>> s = 'Test123'
>>> b = s.encode()
>>> b
b'Test123'

>>> encoder.encode(b)
3908257788270
>>> encoder.decode(_)
b'Test123'
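To see whether the 6-bit packing pays off in practice, the two integers can be compared with `sys.getsizeof`. The sketch below re-implements the packing as a standalone function. The names `pack` and `as_bytes_int` are hypothetical, and the sizes reported depend on the Python implementation, so none are listed here.

```python
import string
import sys

CHARSET = string.ascii_letters + string.digits
CODES = {ch: idx for idx, ch in enumerate(CHARSET, start=1)}  # 0 stays unused
BITS = (len(CHARSET) + 1).bit_length()  # 6 bits for 62 characters


def pack(text: str) -> int:
    """Pack each character's code into BITS bits, first character in the lowest bits."""
    value = 0
    for position, ch in enumerate(text):
        value |= CODES[ch] << (position * BITS)
    return value


def as_bytes_int(text: str) -> int:
    """The 8-bits-per-character encoding from the earlier answers."""
    return int.from_bytes(text.encode(), byteorder="big")


# Columns: length, packed bit length, plain bit length, packed size, plain size.
for length in (7, 64, 1024):
    text = (CHARSET * 20)[:length]
    packed, plain = pack(text), as_bytes_int(text)
    print(length, packed.bit_length(), plain.bit_length(),
          sys.getsizeof(packed), sys.getsizeof(plain))
```

For short strings the two integers may occupy the same number of bytes in memory, since Python allocates integers in fixed-size chunks. The difference becomes visible as the strings grow.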

Answer 4

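I needed to transfer a dictionary as a number. It may look a bit ugly, but it is efficient in that every character (English letters included) becomes exactly 2 digits, and it can still carry any kind of Unicode character, because json.dumps escapes non-ASCII characters to \uXXXX sequences by default. One limitation: a `~` (code point 126) maps to 100, which does not fit in two digits, so input containing it will not round-trip.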

import json

myDict = {
    "le key": "le Valueue",
    2 : {
        "heya": 1234569,
        "3": 4
    },
    'Α α, Β β, Γ γ' : 'שלום'
}
def convertDictToNum(toBeConverted):
    # Shift each character's code point down by 26 and zero-pad it to two digits.
    return int(''.join(str(ord(c) - 26).zfill(2) for c in json.dumps(toBeConverted)))

def loadDictFromNum(toBeDecoded):
    toBeDecoded = str(toBeDecoded)
    # Read the digits back two at a time and undo the shift.
    return json.loads(''.join(chr(int(toBeDecoded[cut:cut + 2]) + 26) for cut in range(0, len(toBeDecoded), 2)))

numbersDict = convertDictToNum(myDict)
print(numbersDict)
# 9708827506817595083206088....
recoveredDict = loadDictFromNum(numbersDict)
print(recoveredDict)
# {'le key': 'le Valueue', '2': {'heya': 1234569, '3': 4}, 'Α α, Β β, Γ γ': 'שלום'}
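To make the two-digit scheme easier to follow, here is a sketch that applies it to a short JSON text and rejects characters that would not fit. The helper names `text_to_digits` and `digits_to_text` and the explicit range check are additions for illustration, not part of the answer.

```python
OFFSET = 26  # same shift as in the answer above


def text_to_digits(text: str) -> str:
    pairs = []
    for ch in text:
        code = ord(ch) - OFFSET
        if not 0 <= code <= 99:
            raise ValueError(f"{ch!r} does not fit in two digits")
        pairs.append(f"{code:02d}")  # always exactly two digits
    return "".join(pairs)


def digits_to_text(digits: str) -> str:
    # Read two digits at a time and undo the shift.
    return "".join(chr(int(digits[i:i + 2]) + OFFSET) for i in range(0, len(digits), 2))


digits = text_to_digits('{"a": 1}')
print(digits)                  # 9708710832062399
print(digits_to_text(digits))  # {"a": 1}
```

A JSON object always starts with `{`, which maps to 97, so converting the digit string to an `int` does not drop a leading zero. A text whose first character maps below 10, such as `"` (08), would lose that zero.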

Answer 5

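I wanted to do the same thing and came up with my own algorithm. You can decide whether it is worth using. For the same input you will always get the same int. If you provide a string with the same characters in a different order, you will get a different result.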

import hashlib

string="example"
str_as_sha1hash = hashlib.sha1(string.encode()).hexdigest()
result = 0
for idx, char in enumerate(str_as_sha1hash):
    result = result + ord(char) * (idx + 1)
print(result)

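For the word "example" you should get 61071. A different input gives a different number: the second string tried in this answer (rendered here as "示例") gave 55095. If you think you need a stronger hash algorithm than SHA-1, you can replace it with anything available in the hashlib library. Finally, if you need a str rather than an int, you can of course use str(result).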
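The loop above can be wrapped in a function to make its properties easier to check. The name `string_to_number` is hypothetical. Note that this mapping is one-way: it is built on a SHA-1 digest, so the original string cannot be recovered from the number, and the result always falls in a narrow range.

```python
import hashlib


def string_to_number(text: str) -> int:
    # 40 hex characters, each weighted by its 1-based position in the digest.
    digest = hashlib.sha1(text.encode()).hexdigest()
    return sum(ord(ch) * (idx + 1) for idx, ch in enumerate(digest))


print(string_to_number("example"))
print(string_to_number("elpmaxe"))  # same letters in a different order

# Hex digits have code points from 48 ('0') to 102 ('f'), and the position
# weights 1..40 sum to 820, so every result lies in this range.
LOWEST, HIGHEST = 48 * 820, 102 * 820
assert LOWEST <= string_to_number("example") <= HIGHEST
```

Because so few distinct values are possible, different strings will often share the same number. That makes this approach usable as a rough fingerprint, but not as the reversible encoding the question asks for.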

Answer 1: 5, accepted, 2018-11-21 21:28:42
Answer 2: 4, 2019-02-03 07:26:01
Answer 3: 2, 2019-02-03 07:48:51
Answer 4: 1, 2020-02-17 18:46:37
Answer 5: 0, 2021-11-11 17:46:36
