Python - Fastest way to strip the trailing zeros from the bit representation of a number

This is the Python version of the same C++ question.

Given a number, num, what is the fastest way to strip off the trailing zeros from its binary representation?

For example, let num = 232. We have bin(num) equal to 0b11101000 and we would like to strip the trailing zeros, which would produce 0b11101. This can be done via string manipulation, but it'd probably be faster via bit manipulation. So far, I have thought of something using num & -num.

Assuming num != 0, num & -num produces the binary 0b1<trailing zeros>. For example,

num   0b11101000
-num  0b00011000
&         0b1000
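
As a quick check of the identity above, here is a minimal sketch that isolates the lowest set bit. The helper name lowest_set_bit is made up for illustration. Python ints behave like infinite-width two's complement under &, so no explicit bit width is needed:

```python
def lowest_set_bit(num: int) -> int:
    """Return the value of the lowest set bit of a nonzero integer."""
    if num == 0:
        raise ValueError("num must be nonzero")
    # -num keeps the lowest set bit and flips every bit above it,
    # so the AND leaves only that one bit.
    return num & -num


print(bin(lowest_set_bit(232)))  # 0b1000
```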

If we have a dict with powers of two as keys and their exponents as values, we could use it to look up how far to right-shift num in order to strip just the trailing zeros:

#        0b1     0b10     0b100     0b1000
POW2s = {  1: 0,    2: 1,     4: 2,      8: 3, ... }

def stripTrailingZeros(num):
  pow2 = num & -num
  pow_ = POW2s[pow2]  # equivalent to math.log2(pow2), but hopefully faster
  return num >> pow_

The dictionary POW2s trades space for speed; the alternative is to use math.log2(pow2).
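
For reference, a table like this could be built with a dict comprehension instead of being typed out. This is only a sketch: MAX_BITS = 64 is an assumed upper bound on the input size (the question does not state one), and the names POW2S and strip_trailing_zeros are illustrative.

```python
MAX_BITS = 64  # assumed upper bound on input size; not from the original post

# Map each power of two to its exponent: {1: 0, 2: 1, 4: 2, ...}
POW2S = {1 << k: k for k in range(MAX_BITS)}


def strip_trailing_zeros(num: int) -> int:
    """Shift out the trailing zero bits of a nonzero int below 2**MAX_BITS."""
    return num >> POW2S[num & -num]


print(bin(strip_trailing_zeros(232)))  # 0b11101
```

Inputs of zero, or with more than MAX_BITS - 1 trailing zeros, raise KeyError in this sketch.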


Is there a faster way?


Perhaps another useful tidbit is num ^ (num - 1), which produces 0b1!<trailing zeros>, where !<trailing zeros> means take the trailing zeros and flip them into ones. For example,

num    0b11101000
num-1  0b11100111
^          0b1111
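
One way this mask might be put to use, as a sketch rather than a benchmarked recommendation: the mask has one more bit than there are trailing zeros, so its bit_length() gives the count directly. The function name count_trailing_zeros is hypothetical.

```python
def count_trailing_zeros(num: int) -> int:
    """Count trailing zero bits of a positive integer via num ^ (num - 1)."""
    if num <= 0:
        raise ValueError("num must be positive")
    mask = num ^ (num - 1)        # lowest set bit plus all bits below it, as ones
    return mask.bit_length() - 1  # the mask has (trailing zeros + 1) one bits


print(bin(232 ^ 231))             # 0b1111
print(count_trailing_zeros(232))  # 3
```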

Yet another alternative is to use a while loop:

def stripTrailingZeros_iterative(num):
  while num & 0b1 == 0:  # equivalent to `num % 2 == 0`
    num >>= 1
  return num

Ultimately, I need to execute this function on a big list of numbers. Once I do that, I want the maximum. So if I have [64, 38, 22, 20] to begin with, I would have [1, 19, 11, 5] after performing the stripping. Then I would want the maximum of that, which is 19.
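
Put together, the whole task on the example list looks roughly like this. The sketch uses the simple loop for clarity, not speed, and adds a guard for zero, which would otherwise loop forever:

```python
def strip_trailing_zeros(num: int) -> int:
    """Shift right until the lowest bit is set; num must be nonzero."""
    if num == 0:
        raise ValueError("zero has no set bit to stop at")
    while num & 1 == 0:
        num >>= 1
    return num


numbers = [64, 38, 22, 20]
stripped = [strip_trailing_zeros(n) for n in numbers]
print(stripped)       # [1, 19, 11, 5]
print(max(stripped))  # 19
```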

There's really no answer to questions like this without specifying the expected distribution of inputs. For example, if all inputs are in range(256), you can't beat a single indexed lookup into a precomputed list of the 256 possible cases.
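
A minimal sketch of such a precomputed list, assuming every input really is in range(256). The names STRIPPED and strip_byte are illustrative, and mapping 0 to 0 is a choice made here, not something the answer specifies:

```python
# Precompute the stripped value for every byte. Entry 0 is left as 0, since
# zero has no set bit (a choice made for this sketch).
STRIPPED = [0] + [n // (n & -n) for n in range(1, 256)]


def strip_byte(num: int) -> int:
    """Strip trailing zeros from an int in range(256) with a single lookup."""
    return STRIPPED[num]


print(strip_byte(232))  # 29
```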

If inputs can be two bytes, but you don't want to burn the space for 2**16 precomputed results, it's hard to beat (assuming that_table[i] gives the count of trailing zeroes in i):

low = i & 0xff
result = that_table[low] if low else 8 + that_table[i >> 8]

And so on.
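
To make the two-byte lookup concrete, here is one way that_table might be built and used. This is a sketch under the stated assumption 0 < i < 2**16; the wrapper name trailing_zeros_16 is hypothetical.

```python
# Trailing-zero counts for every byte value. Entry 0 is a placeholder that the
# lookup below never reads when i is nonzero.
that_table = [0] + [(n & -n).bit_length() - 1 for n in range(1, 256)]


def trailing_zeros_16(i: int) -> int:
    """Count trailing zeros of i, assuming 0 < i < 2**16."""
    low = i & 0xff
    return that_table[low] if low else 8 + that_table[i >> 8]


i = 0x1d00  # 0b11101 followed by eight zero bits
print(i >> trailing_zeros_16(i))  # 29
```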

You do not want to rely on log2(). The accuracy of that is entirely up to the C library on the platform CPython is compiled for.

What I actually use, in a context where ints can be up to hundreds of millions of bits:

    assert d

    if d & 1 == 0:
        ntz = (d & -d).bit_length() - 1
        d >>= ntz
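
Wrapped as a function, the snippet above might look like the following sketch. The function name is illustrative, and the assert is replaced by an explicit exception:

```python
def strip_trailing_zeros(d: int) -> int:
    """Strip trailing zero bits from a nonzero int of any size."""
    if d == 0:
        raise ValueError("d must be nonzero")
    if d & 1 == 0:  # skip the shift entirely for odd inputs
        ntz = (d & -d).bit_length() - 1  # number of trailing zeros
        d >>= ntz
    return d


big = 29 << 1_000_000  # about a million trailing zero bits
print(strip_trailing_zeros(big))  # 29
```

Everything here is exact integer arithmetic, so it does not depend on floating-point log2() behaviour.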

A while loop would be a disaster in this context, taking time quadratic in the number of bits shifted off. Even one needless shift in that context would be a significant expense, which is why the code above first checks that at least one bit needs to be shifted off. But if ints "are much smaller", that check would probably cost more than it saves. "No answer in the absence of specifying the expected distribution of inputs."

On my computer, a simple integer divide is fastest:

import timeit
timeit.timeit(setup='num=232', stmt='num // (num & -num)')
0.1088077999993402
timeit.timeit(setup='d = { 1: 0, 2 : 1, 4: 2, 8 : 3, 16 : 4, 32 : 5 }; num=232', stmt='num >> d[num & -num]')
0.13014470000052825
timeit.timeit(setup='import math; num=232', stmt='num >> int(math.log2(num & -num))')
0.2980690999993385
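
The division works because num & -num is exactly the power of two that the trailing zeros represent, so dividing by it is the same as shifting them out. A small sketch that checks the two forms agree (function names are illustrative; both assume a nonzero input):

```python
def strip_by_division(num: int) -> int:
    """Divide out the largest power of two that divides a nonzero num."""
    return num // (num & -num)


def strip_by_shift(num: int) -> int:
    """Shift out the trailing zeros of a nonzero num."""
    return num >> ((num & -num).bit_length() - 1)


# The two agree on every tested input, since the divisor is exactly 2**ntz.
assert all(strip_by_division(n) == strip_by_shift(n) for n in range(1, 10_000))
print(strip_by_division(232))  # 29
```

The timings above come from one machine and one small input; which form is faster may differ elsewhere.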

You say you "Ultimately, [..] execute this function on a big list of numbers to get odd numbers and find the maximum of said odd numbers."

So why not simply:

from random import randint


numbers = [randint(0, 10000) for _ in range(5000)]


odd_numbers = [n for n in numbers if n & 1]
max_odd = max(odd_numbers)
print(max_odd)

For what you say you ultimately want, there seems to be little point in performing the "shift right until the result is odd" operation. Unless you want the maximum of the result of that operation performed on all elements, which is not what you stated?
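
The two readings give different answers. A short sketch on the example list from the question shows the gap; default=None is used only so that the empty case does not raise:

```python
numbers = [64, 38, 22, 20]  # the example list from the question

# Interpretation 1: keep only the values that are already odd.
odd_numbers = [n for n in numbers if n & 1]
print(max(odd_numbers, default=None))  # None: the list contains no odd values

# Interpretation 2: strip trailing zeros from every value, then take the max.
stripped = [n // (n & -n) for n in numbers]
print(max(stripped))  # 19
```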

I agree with @TimPeters' answer, but if you put Python through its paces, generate some data sets, and try the various solutions proposed, they keep roughly the same spread relative to one another across integer sizes when using Python ints. So your best option is integer division for numbers up to 32 bits; beyond that, see the chart below:

from pandas import DataFrame
from timeit import timeit
import math
from random import randint


def reduce0(ns):
    return [n // (n & -n)
            for n in ns]


def reduce1(ns, d):
    return [n >> d[n & -n]
            for n in ns]


def reduce2(ns):
    return [n >> int(math.log2(n & -n))
            for n in ns]


def reduce3(ns, t):
    return [n >> t.index(n & -n)
            for n in ns]


def reduce4(ns):
    return [n if n & 1 else n >> ((n & -n).bit_length() - 1)
            for n in ns]


def single5(n):
    while (n & 0xffffffff) == 0:
        n >>= 32
    if (n & 0xffff) == 0:
        n >>= 16
    if (n & 0xff) == 0:
        n >>= 8
    if (n & 0xf) == 0:
        n >>= 4
    if (n & 0x3) == 0:
        n >>= 2
    if (n & 0x1) == 0:
        n >>= 1
    return n


def reduce5(ns):
    return [single5(n)
            for n in ns]


numbers = [randint(1, 2 ** 16 - 1) for _ in range(5000)]
d = {2 ** n: n for n in range(16)}
t = tuple(2 ** n for n in range(16))
assert(reduce0(numbers) == reduce1(numbers, d) == reduce2(numbers) == reduce3(numbers, t) == reduce4(numbers) == reduce5(numbers))

df = DataFrame([{}, {}, {}, {}, {}, {}])
for p in range(1, 16):
    p = 2 ** p
    numbers = [randint(1, 2 ** p - 1) for _ in range(4096)]

    d = {2**n: n for n in range(p)}
    t = tuple(2 ** n for n in range(p))

    df[p] = [
        timeit(lambda: reduce0(numbers), number=100),
        timeit(lambda: reduce1(numbers, d), number=100),
        timeit(lambda: reduce2(numbers), number=100),
        timeit(lambda: reduce3(numbers, t), number=100),
        timeit(lambda: reduce4(numbers), number=100),
        timeit(lambda: reduce5(numbers), number=100)
    ]
    print(f'Complete for {p} bit numbers.')


print(df)
df.to_csv('test_results.csv')

Result (when plotted in Excel): [image: local machine results (updated)]

Note that the plot that was previously here was wrong, though the code and data were not. The code has been updated to include @MarkRansom's solution, since it turns out to be the optimal solution for very large numbers (over 4k-bit numbers).

while (num & 0xffffffff) == 0:
    num >>= 32
if (num & 0xffff) == 0:
    num >>= 16
if (num & 0xff) == 0:
    num >>= 8
if (num & 0xf) == 0:
    num >>= 4
if (num & 0x3) == 0:
    num >>= 2
if (num & 0x1) == 0:
    num >>= 1

The idea here is to perform as few shifts as possible. The initial while loop handles numbers that are over 32 bits long, which I consider unlikely, but it has to be provided for completeness. After that, each statement shifts half as many bits; if you can't shift by 16, then the most you could shift is 15, which is (8+4+2+1). All possible cases are covered by those 5 if statements.
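
To see how few shifts this takes, here is an instrumented sketch of the same steps that also counts the shift operations. The function name and the counter are additions for illustration, as is the guard for zero, on which the while loop would never terminate:

```python
def strip_with_shift_count(num: int) -> tuple:
    """Return (stripped value, number of shift operations) for nonzero num."""
    if num == 0:
        raise ValueError("num must be nonzero")  # the while loop would never end
    shifts = 0
    while (num & 0xffffffff) == 0:
        num >>= 32
        shifts += 1
    # Same masks as 0xffff, 0xff, 0xf, 0x3, 0x1 in the code above.
    for width in (16, 8, 4, 2, 1):
        if num & ((1 << width) - 1) == 0:
            num >>= width
            shifts += 1
    return num, shifts


print(strip_with_shift_count(232))       # (29, 2)
print(strip_with_shift_count(29 << 47))  # (29, 5)
```

The second call strips 47 zero bits (32 + 8 + 4 + 2 + 1) in five shifts, where the bit-at-a-time loop would need 47.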
