简体   繁体   English

有效地计算组合和排列

[英]counting combinations and permutations efficiently

I have some code to count permutations and combinations, and I'm trying to make it work better for large numbers.我有一些代码来计算排列和组合,并且我正在努力让它更好地适用于大量数字。

I've found a better algorithm for permutations that avoids large intermediate results, but I still think I can do better for combinations.我找到了一种更好的排列算法,可以避免大的中间结果,但我仍然认为我可以在组合方面做得更好。

So far, I've put in a special case to reflect the symmetry of nCr, but I'd still like to find a better algorithm that avoids the call to factorial(r), which is an unnecessarily large intermediate result.到目前为止,我已经放入了一个特殊情况来反映 nCr 的对称性,但我仍然想找到一个更好的算法来避免调用 factorial(r),这是一个不必要的大中间结果。 Without this optimization, the last doctest takes too long trying to calculate factorial(99000).如果没有这种优化,最后一个 doctest 会花费很长时间来计算阶乘(99000)。

Can anyone suggest a more efficient way to count combinations?任何人都可以提出一种更有效的计算组合的方法吗?

from math import factorial

def product(iterable):
    prod = 1
    for n in iterable:
        prod *= n
    return prod

def npr(n, r):
    """
    Calculate the number of ordered permutations of r items taken from a
    population of size n.

    >>> npr(3, 2)
    6
    >>> npr(100, 20)
    1303995018204712451095685346159820800000
    """
    assert 0 <= r <= n
    return product(range(n - r + 1, n + 1))

def ncr(n, r):
    """
    Calculate the number of unordered combinations of r items taken from a
    population of size n.

    >>> ncr(3, 2)
    3
    >>> ncr(100, 20)
    535983370403809682970
    >>> ncr(100000, 1000) == ncr(100000, 99000)
    True
    """
    assert 0 <= r <= n
    if r > n // 2:
        r = n - r
    return npr(n, r) // factorial(r)

if n is not far from r then using the recursive definition of combination is probably better, since xC0 == 1 you will only have a few iterations:如果 n 离 r 不远,那么使用组合的递归定义可能会更好,因为 xC0 == 1 你只会有几次迭代:

The relevant recursive definition here is:这里相关的递归定义是:

nCr = (n-1)C(r-1) * n/r nCr = (n-1)C(r-1) * n/r

This can be nicely computed using tail recursion with the following list:这可以使用带有以下列表的尾递归很好地计算:

[(n - r, 0), (n - r + 1, 1), (n - r + 2, 2), ..., (n - 1, r - 1), (n, r)] [(n - r, 0), (n - r + 1, 1), (n - r + 2, 2), ..., (n - 1, r - 1), (n, r)]

which is of course easily generated in Python (we omit the first entry since nC0 = 1) by izip(xrange(n - r + 1, n+1), xrange(1, r+1)) Note that this assumes r <= n you need to check for that and swap them if they are not.这当然很容易在 Python 中通过izip(xrange(n - r + 1, n+1), xrange(1, r+1))在 Python 中生成(我们省略了第一个条目,因为 nC0 = 1)请注意,这假设 r < = n 您需要检查并交换它们,如果不是。 Also to optimize use if r < n/2 then r = n - r.如果 r < n/2 则 r = n - r 也可以优化使用。

Now we simply need to apply the recursion step using tail recursion with reduce.现在我们只需要使用带有reduce的尾递归来应用递归步骤。 We start with 1 since nC0 is 1 and then multiply the current value with the next entry from the list as below.我们从 1 开始,因为 nC0 为 1,然后将当前值与列表中的下一个条目相乘,如下所示。

from itertools import izip

reduce(lambda x, y: x * y[0] / y[1], izip(xrange(n - r + 1, n+1), xrange(1, r+1)), 1)

Two fairly simple suggestions:两个相当简单的建议:

  1. To avoid overflow, do everything in log space.为避免溢出,请在日志空间中执行所有操作。 Use the fact that log(a * b) = log(a) + log(b), and log(a / b) = log(a) - log(b).使用 log(a * b) = log(a) + log(b) 和 log(a / b) = log(a) - log(b) 的事实。 This makes it easy to work with very large factorials: log(n! / m!) = log(n!) - log(m!), etc.这使得处理非常大的阶乘变得容易:log(n! / m!) = log(n!) - log(m!) 等。

  2. Use the gamma function instead of factorial.使用 gamma 函数而不是阶乘。 You can find one in scipy.stats.loggamma .您可以在scipy.stats.loggamma中找到一个。 It's a much more efficient way to calculate log-factorials than direct summation.这是一种比直接求和更有效的计算对数阶乘的方法。 loggamma(n) == log(factorial(n - 1)) , and similarly, gamma(n) == factorial(n - 1) . loggamma(n) == log(factorial(n - 1)) ,类似地, gamma(n) == factorial(n - 1)

There's a function for this in scipy which hasn't been mentioned yet: scipy.special.comb .在 scipy 中有一个尚未提及的功能: scipy.special.comb It seems efficient based on some quick timing results for your doctest (~0.004 seconds for comb(100000, 1000, 1) == comb(100000, 99000, 1) ).根据您的 doctest 的一些快速计时结果,它似乎很有效( comb(100000, 1000, 1) == comb(100000, 99000, 1)约为 0.004 秒)。

[While this specific question seems to be about algorithms the question is there a math ncr function in python is marked as a duplicate of this...] [虽然这个特定问题似乎与算法有关,但问题是 python 中的数学 ncr 函数被标记为与此重复......]

如果您不需要纯 python 解决方案, gmpy2可能会有所帮助( gmpy2.comb非常快)。

For Python until 3.7:对于 3.7 之前的 Python:

def prod(items, start=1):
    for item in items:
        start *= item
    return start


def perm(n, k):
    if not 0 <= k <= n:
        raise ValueError(
            'Values must be non-negative and n >= k in perm(n, k)')
    else:
        return prod(range(n - k + 1, n + 1))


def comb(n, k):
    if not 0 <= k <= n:
        raise ValueError(
            'Values must be non-negative and n >= k in comb(n, k)')
    else:
        k = k if k < n - k else n - k
        return prod(range(n - k + 1, n + 1)) // math.factorial(k)

For Python 3.8+:对于 Python 3.8+:


Interestingly enough, some manual implementation of the combination function may be faster than math.comb() :有趣的是,组合函数的一些手动实现可能比math.comb()更快:

def math_comb(n, k):
    return math.comb(n, k)


def comb_perm(n, k):
    k = k if k < n - k else n - k
    return math.perm(n, k) // math.factorial(k)


def comb(n, k):
    k = k if k < n - k else n - k
    return prod(range(n - k + 1, n + 1)) // math.factorial(k)


def comb_other(n, k):
    k = k if k > n - k else n - k
    return prod(range(n - k + 1, n + 1)) // math.factorial(k)


def comb_reduce(n, k):
    k = k if k < n - k else n - k
    return functools.reduce(
        lambda x, y: x * y[0] // y[1],
        zip(range(n - k + 1, n + 1), range(1, k + 1)),
        1)


def comb_iter(n, k):
    k = k if k < n - k else n - k
    result = 1
    for i in range(1, k + 1):
        result = result * (n - i + 1) // i
    return result


def comb_iterdiv(n, k):
    k = k if k < n - k else n - k
    result = divider = 1
    for i in range(1, k + 1):
        result *= (n - i + 1)
        divider *= i
    return result // divider


def comb_fact(n, k):
    k = k if k < n - k else n - k
    return math.factorial(n) // math.factorial(n - k) // math.factorial(k)

BM

so that actually comb_perm() (implemented with math.perm() and math.factorial() ) is actually faster than math.comb() most of the times for these benchamarks, which show the computation time for fixed n=256 and increasing k (up until k = n // 2 ).所以实际上comb_perm() (用math.perm()math.factorial()实现)在大多数情况下实际上比math.comb()更快,这些基准显示了固定n=256和增加的计算时间k (直到k = n // 2 )。

Note that comb_reduce() , which is quite slow, is essentially the same approach as from @wich's answer , while comb_iter() , also relatively slow, is essentially the same approach as @ZXX's answer .请注意, comb_reduce()非常慢,与@wich 的answer基本相同,而comb_iter()也相对较慢,与@ZXX 的 answer基本相同。

Partial analysis here (without comb_math() and comb_perm() since they are not supported in Python's version of Colab -- 3.7 -- as of last edit). 此处进行部分分析(没有comb_math()comb_perm() ,因为它们在 Python 的 Colab 版本 - 3.7 - 截至上次编辑时不受支持)。

from scipy import misc
misc.comb(n, k)

should allow you to count combinations应该允许您计算组合

More efficient solution for nCr - space wise and precision wise.更有效的 nCr 解决方案 - 空间方面和精度方面。

The intermediary (res) is guaranteed to always be int and never larger than the result.中介 (res) 保证始终是 int 并且永远不会大于结果。 Space complexity is O(1) (no lists, no zips, no stack), time complexity is O(r) - exactly r multiplications and r divisions.空间复杂度为 O(1)(无列表、无拉链、无堆栈),时间复杂度为 O(r) - 恰好是 r 次乘法和 r 次除法。

def ncr(n, r):
    r = min(r, n-r)
    if r == 0: return 1
    res = 1
    for k in range(1,r+1):
        res = res*(n-k+1)/k
    return res

If you are computing N choose K (which is what I think you're doing with ncr), there is a dynamic programming solution that may be a lot faster.如果您正在计算 N 选择 K(我认为您正在使用 ncr 执行此操作),那么有一个动态编程解决方案可能会快得多。 This will avoid factorial, plus you can keep the table if you want for later use.这将避免阶乘,而且如果您想以后使用,您可以保留该表。

Here is a teaching link for it:这是它的教学链接:

http://www.csc.liv.ac.uk/~ped/teachadmin/algor/dyprog.html http://www.csc.liv.ac.uk/~ped/teachadmin/algor/dyprog.html

I am unsure of how to better solve your first problem, though, sorry.不过,我不确定如何更好地解决您的第一个问题,抱歉。

Edit: Here is the mock-up.编辑:这是模型。 There are some pretty hilarious off-by-one errors, so it can certainly stand some more clean up.有一些非常搞笑的错误,所以它当然可以经受更多的清理。

import sys
n = int(sys.argv[1])+2#100
k = int(sys.argv[2])+1#20
table = [[0]*(n+2)]*(n+2)

for i in range(1,n):
    table[i][i] = 1
for i in range(1,n):
    for j in range(1,n-i):
        x = i+j
        if j == 1: table[x][j] = 1
        else: table[x][j] = table[x-1][j-1] + table[x-1][j]

print table[n][k]

If your problem does not require knowing the exact number of permutations or combinations, then you could use Stirling's approximation for the factorial.如果您的问题不需要知道排列或组合的确切数量,那么您可以使用斯特林的阶乘近似值

That would lead to code like this:这将导致如下代码:

import math

def stirling(n):
    # http://en.wikipedia.org/wiki/Stirling%27s_approximation
    return math.sqrt(2*math.pi*n)*(n/math.e)**n

def npr(n,r):
    return (stirling(n)/stirling(n-r) if n>20 else
            math.factorial(n)/math.factorial(n-r))

def ncr(n,r):    
    return (stirling(n)/stirling(r)/stirling(n-r) if n>20 else
            math.factorial(n)/math.factorial(r)/math.factorial(n-r))

print(npr(3,2))
# 6
print(npr(100,20))
# 1.30426670868e+39
print(ncr(3,2))
# 3
print(ncr(100,20))
# 5.38333246453e+20
from numpy import prod

def nCr(n,r):
    numerator = range(n, max(n-r,r),-1)
    denominator = range(1, min(n-r,r) +1,1)
    return int(prod(numerator)/prod(denominator))

You could input two integers and import math library to find the factorial and then apply the nCr formula您可以输入两个整数并导入数学库以找到阶乘,然后应用 nCr 公式

import math
n,r=[int(_)for _ in raw_input().split()]
f=math.factorial
print f(n)/f(r)/f(n-r)

Using xrange() instead of range() will speed things up slightly due to the fact that no intermediate list is created, populated, iterated through, and then destroyed.使用xrange()代替range()会稍微加快速度,因为没有创建、填充、迭代和销毁中间列表。 Also, reduce() with operator.mul .此外, reduce()operator.mul

For N choose K you could use Pascals triangle.对于 N 选择 K,您可以使用帕斯卡三角形。 Basically you would need to keep array of size N around to compute all the N choose K values.基本上,您需要保留大小为 N 的数组来计算所有 N 选择 K 值。 Only additions would be required.只需要添加。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM