
Is there a faster way of converting a number to a name?

The following code defines a sequence of names that are mapped to numbers. It is designed to take a number and retrieve a specific name. The class operates by ensuring the name exists in its cache, and then returns the name by indexing into its cache. The question is this: how can the name be calculated based on the number without storing a cache?

The name can be thought of as a base 63 number, except for the first digit which is always in base 53.

import itertools
import string

class NumberToName:

    def __generate_name():
        def generate_tail(length):
            if length > 0:
                for char in NumberToName.CHARS:
                    for extension in generate_tail(length - 1):
                        yield char + extension
            else:
                yield ''
        for length in itertools.count():
            for char in NumberToName.FIRST:
                for extension in generate_tail(length):
                    yield char + extension

    FIRST = ''.join(sorted(string.ascii_letters + '_'))
    CHARS = ''.join(sorted(string.digits + FIRST))
    CACHE = []
    NAMES = __generate_name()

    @classmethod
    def convert(cls, number):
        for _ in range(number - len(cls.CACHE) + 1):
            cls.CACHE.append(next(cls.NAMES))
        return cls.CACHE[number]

    def __init__(self, *args, **kwargs):
        raise NotImplementedError()

The following interactive session shows some of the values that are expected to be returned, in order.

>>> NumberToName.convert(0)
'A'
>>> NumberToName.convert(26)
'_'
>>> NumberToName.convert(52)
'z'
>>> NumberToName.convert(53)
'A0'
>>> NumberToName.convert(1692)
'_1'
>>> NumberToName.convert(23893)
'FAQ'

Unfortunately, these numbers need to be mapped to these exact names (to allow a reverse conversion).


Please note: A variable number of bits are received and converted unambiguously into a number. This number should be converted unambiguously to a name in the Python identifier namespace. Eventually, valid Python names will be converted to numbers, and these numbers will be converted to a variable number of bits.


Final solution:

import string

HEAD_CHAR = ''.join(sorted(string.ascii_letters + '_'))
TAIL_CHAR = ''.join(sorted(string.digits + HEAD_CHAR))
HEAD_BASE, TAIL_BASE = len(HEAD_CHAR), len(TAIL_CHAR)

def convert_number_to_name(number):
    if number < HEAD_BASE: return HEAD_CHAR[number]
    q, r = divmod(number - HEAD_BASE, TAIL_BASE)
    return convert_number_to_name(q) + TAIL_CHAR[r]
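Since the question also asks for a reverse conversion, here is a minimal sketch of one possible inverse. It assumes the same alphabets as the solution above; the function name `convert_name_to_number` is illustrative and does not appear in the original post. It simply undoes each `divmod` step: if `number = q * TAIL_BASE + HEAD_BASE + r`, then the prefix encodes `q` and the last character encodes `r`.

```python
import string

HEAD_CHAR = ''.join(sorted(string.ascii_letters + '_'))
TAIL_CHAR = ''.join(sorted(string.digits + HEAD_CHAR))
HEAD_BASE, TAIL_BASE = len(HEAD_CHAR), len(TAIL_CHAR)

def convert_number_to_name(number):
    # Forward conversion, copied from the solution above.
    if number < HEAD_BASE:
        return HEAD_CHAR[number]
    q, r = divmod(number - HEAD_BASE, TAIL_BASE)
    return convert_number_to_name(q) + TAIL_CHAR[r]

def convert_name_to_number(name):
    # Hypothetical inverse: start from the head character, then fold in
    # each tail character, reversing one divmod step per character.
    number = HEAD_CHAR.index(name[0])
    for char in name[1:]:
        number = number * TAIL_BASE + HEAD_BASE + TAIL_CHAR.index(char)
    return number
```

For example, `convert_name_to_number('FAQ')` works through 5, then 5 * 63 + 53 + 10 = 378, then 378 * 63 + 53 + 26 = 23893.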

This is a fun little problem, full of off-by-one errors.

Without loops:

import string

first_digits = sorted(string.ascii_letters + '_')
rest_digits = sorted(string.digits + string.ascii_letters + '_')

def convert(number):
    if number < len(first_digits):
        return first_digits[number]

    current_base = len(rest_digits)
    remain = number - len(first_digits)
    # Floor division keeps the argument an integer on Python 3 as well.
    return convert(remain // current_base) + rest_digits[remain % current_base]

And the tests:

print(convert(0))
print(convert(26))
print(convert(52))
print(convert(53))
print(convert(1692))
print(convert(23893))

Output:

A
_
z
A0
_1
FAQ

What you've got is a corrupted form of bijective numeration (the usual example being spreadsheet column names, which are bijective base-26).

One way to generate bijective numeration:

import string

def bijective(n, digits=string.ascii_uppercase):
    result = []
    while n > 0:
        n, mod = divmod(n - 1, len(digits))
        result += digits[mod]
    return ''.join(reversed(result))

All you need to do is supply a different set of digits for the case where 53 >= n > 0. You will also need to increment n by 1, as properly the bijective 0 is the empty string, not "A":

def name(n, first=sorted(string.ascii_letters + '_'), digits=sorted(string.ascii_letters + '_' + string.digits)):
    result = []
    while n >= len(first):
        n, mod = divmod(n - len(first), len(digits))
        result += digits[mod]
    result += first[n]
    return ''.join(reversed(result))
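One way to gain confidence in a closed-form conversion like this is to compare it against a brute-force enumeration of the names in order. The sketch below assumes the same two alphabets; the generator `enumerate_names` and the module-level names `FIRST` and `DIGITS` are illustrative and not part of the answer above.

```python
import itertools
import string

FIRST = sorted(string.ascii_letters + '_')
DIGITS = sorted(string.ascii_letters + '_' + string.digits)

def name(n, first=FIRST, digits=DIGITS):
    # Same algorithm as above: peel off tail characters, then the head.
    result = []
    while n >= len(first):
        n, mod = divmod(n - len(first), len(digits))
        result.append(digits[mod])
    result.append(first[n])
    return ''.join(reversed(result))

def enumerate_names():
    # Brute force: every name of length 1, then length 2, and so on,
    # with the last character varying fastest.
    for length in itertools.count(1):
        for head in FIRST:
            for tail in itertools.product(DIGITS, repeat=length - 1):
                yield head + ''.join(tail)

# Indices where the formula and the enumeration disagree.
mismatches = [
    i for i, expected in enumerate(itertools.islice(enumerate_names(), 5000))
    if name(i) != expected
]
print(mismatches)  # an empty list means the two agree on this range
```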

Tested for the first 10,000 names:

import string

first_chars = sorted(string.ascii_letters + '_')
later_chars = sorted(list(string.digits) + first_chars)

def f(n):
    # first, determine length by subtracting the number of items of length l
    # also determines the index into the list of names of length l
    ix = n
    l = 1
    while ix >= 53 * (63 ** (l-1)):
        ix -= 53 * (63 ** (l-1))
        l += 1

    # determine first character
    first = first_chars[ix // (63 ** (l-1))]

    # rest of string is just a base 63 number
    s = ''
    rem = ix % (63 ** (l-1))
    for i in range(l-1):
        s = later_chars[rem % 63] + s
        rem //= 63

    return first+s
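The length-finding loop above relies on there being 53 * 63**(l-1) names of length l. Summing that geometric series gives the index of the first name of each length directly: 53 * (63**(l-1) - 1) / 62. A small sketch to check the arithmetic (the helper names `count_of_length` and `first_index_of_length` are illustrative, not from the answer):

```python
def count_of_length(length):
    # 53 choices for the first character, 63 for each later one.
    return 53 * 63 ** (length - 1)

def first_index_of_length(length):
    # Closed form of the sum of count_of_length(k) for k < length.
    # 63**k - 1 is always divisible by 62, so floor division is exact.
    return 53 * (63 ** (length - 1) - 1) // 62

for length in range(1, 5):
    print(length, first_index_of_length(length), count_of_length(length))
```

This agrees with the expected values in the question: index 53 is the first two-character name, 'A0'.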

You can use the code in this answer to the question "Base 62 conversion in Python" (or perhaps one of the other answers).

Using the referenced code, I think the answer to your real question, which was "how can the name be calculated based on the number without storing a cache?", would be to make the name the simple base 62 conversion of the number, possibly with a leading underscore if the first character of the name is a digit (the underscore is simply ignored when converting the name back into a number).

Here's sample code illustrating what I propose:

from base62 import base62_encode, base62_decode

def NumberToName(num):
    ret = base62_encode(num)
    return ('_' + ret) if ret[0] in '0123456789' else ret

def NameToNumber(name):
    # Compare by value; identity comparison with a string literal is unreliable.
    return base62_decode(name if name[0] != '_' else name[1:])

if __name__ == '__main__':
    def test(num):
        name = NumberToName(num)
        num2 = NameToNumber(name)
        print('NumberToName({0:5d}) -> {1!r:>6s}, NameToNumber({2!r:>6s}) -> {3:5d}'
              .format(num, name, name, num2))

    test(26)
    test(52)
    test(53)
    test(1692)
    test(23893)

Output:

NumberToName(   26) ->    'q', NameToNumber(   'q') ->    26
NumberToName(   52) ->    'Q', NameToNumber(   'Q') ->    52
NumberToName(   53) ->    'R', NameToNumber(   'R') ->    53
NumberToName( 1692) ->   'ri', NameToNumber(  'ri') ->  1692
NumberToName(23893) -> '_6dn', NameToNumber('_6dn') -> 23893

If the numbers could be negative, you might have to modify the code from the referenced answer (and there is some discussion there on how to do it).
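The `base62` module imported above is not shown in this answer; it is meant to be saved from the referenced answer. As a stand-in, here is a minimal sketch of what such a module might contain. The alphabet order (digits, then lowercase, then uppercase) is an assumption inferred from the sample output above, and the implementation is illustrative rather than a copy of the referenced code. It handles non-negative integers only.

```python
import string

# Assumed alphabet order, chosen to match the sample output above.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def base62_encode(num, alphabet=ALPHABET):
    # Encode a non-negative integer as a base 62 string.
    if num == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while num:
        num, rem = divmod(num, base)
        digits.append(alphabet[rem])
    return ''.join(reversed(digits))

def base62_decode(text, alphabet=ALPHABET):
    # Decode a base 62 string back into an integer.
    base = len(alphabet)
    num = 0
    for char in text:
        num = num * base + alphabet.index(char)
    return num
```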
