
Best way of calculating n choose k?

What is the most efficient method to evaluate the value of "n choose k"? The brute force way, I think, would be to find n! / k! / (n-k)! by calculating each factorial separately.

A better strategy may be to use DP according to this recursive formula: nCk == (n-1)C(k-1) + (n-1)C(k). Is there any better method to evaluate n choose k, in terms of complexity and of avoiding the risk of overflow?

Here is my version, which works purely in integers (the division by k always produces an integer quotient) and is fast at O(k):

function choose(n, k)
    if k == 0 return 1
    return (n * choose(n - 1, k - 1)) / k

I wrote it recursively because it's so simple and pretty, but you could transform it to an iterative solution if you like.
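For anyone who wants the iterative form, here is one way the recursion above could be unrolled in Python. This is only a sketch, not the answer's original code; the name choose_iterative is made up for illustration, and it assumes 0 <= k <= n.

```python
def choose_iterative(n, k):
    """Iterative form of choose(n, k) = n * choose(n - 1, k - 1) / k."""
    result = 1  # base case of the recursion: choose(n - k, 0) == 1
    for i in range(1, k + 1):
        # Before this step result == choose(n - k + i - 1, i - 1);
        # multiplying first keeps the division by i exact.
        result = (n - k + i) * result // i
    return result


print(choose_iterative(9, 4))
print(choose_iterative(52, 5))
```

The loop walks the recursion from its base case upwards, so it needs no call stack and still does k multiplications and k divisions.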

Probably the easiest way to compute binomial coefficients (n choose k) without overflowing is to use Pascal's triangle. No fractions or multiplications are necessary. The kth entry of the nth row of Pascal's triangle gives the value.

Take a look at this page. This is an O(n^2) operation with only addition, which you can solve with dynamic programming. It's going to be lightning fast for any number that can fit in a 64-bit integer.
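A minimal Python sketch of that idea, keeping only a single row of the triangle in memory. The function name choose_pascal is hypothetical, and Python integers never overflow, so the 64-bit remark only matters if you port this to a fixed-width language.

```python
def choose_pascal(n, k):
    """n choose k using only additions, via one rolling row of Pascal's triangle."""
    if k < 0 or k > n:
        return 0
    row = [1] + [0] * k  # row 0 of the triangle, truncated to k + 1 entries
    for i in range(1, n + 1):
        # Go right to left so row[j - 1] still holds the previous row's value.
        for j in range(min(i, k), 0, -1):
            row[j] += row[j - 1]
    return row[k]


print(choose_pascal(10, 3))
```

This does O(n * k) additions and uses O(k) memory, which is enough when only one value is needed rather than the whole table.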

If you're going to calculate many combinations like this, calculating Pascal's triangle is surely the best option. As you already know the recursive formula, I think I can paste some code here:

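MAX_N = 100
MAX_K = 100

# C[i][j] will hold "i choose j"; column 0 is 1 because C(i, 0) == 1.
C = [[1] + [0] * MAX_K for _ in range(MAX_N + 1)]

# Pascal's rule: C(i, j) = C(i-1, j-1) + C(i-1, j).
for i in range(1, MAX_N + 1):
    for j in range(1, MAX_K + 1):
        C[i][j] = C[i - 1][j - 1] + C[i - 1][j]

print(C[10][2])  # 45
print(C[10][8])  # 45
print(C[10][3])  # 120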

If you have a lookup table of factorials, then calculating C(n,k) will be very fast.
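As a rough illustration of the lookup-table idea, here is a Python sketch. The table size MAX_N = 100 and the names FACTORIALS and choose_from_table are assumptions made for this example. Python integers are arbitrary precision; with 64-bit integers the same table is only exact up to 20!.

```python
MAX_N = 100  # hypothetical upper bound on n, chosen for illustration

# FACTORIALS[i] == i!, built once up front.
FACTORIALS = [1] * (MAX_N + 1)
for i in range(1, MAX_N + 1):
    FACTORIALS[i] = FACTORIALS[i - 1] * i


def choose_from_table(n, k):
    """n choose k from the precomputed factorial table (requires n <= MAX_N)."""
    if k < 0 or k > n:
        return 0
    return FACTORIALS[n] // (FACTORIALS[k] * FACTORIALS[n - k])


print(choose_from_table(52, 5))
```

After the table is built, each query costs three lookups, one multiplication and one division.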

The problem with the n!/(k!(n-k)!) approach is not so much the cost as the fact that factorials grow very rapidly, so that even for values of nCk which are well within the range of, say, 64-bit integers, the intermediate calculations are not. If you don't like kainaw's recursive addition approach you could try the multiplicative approach:

nCk == product(i=1..k) (n-(k-i))/i

where product(i=1..k) means the product of all the terms when i takes the values 1,2,...,k.
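To make the overflow point concrete, here is a small Python check. It is only a sketch: the helper fits_in_int64 is a made-up name, and math.comb needs Python 3.8 or later.

```python
from math import comb, factorial

INT64_MAX = 2**63 - 1


def fits_in_int64(value):
    """True if value is representable as a signed 64-bit integer."""
    return -2**63 <= value <= INT64_MAX


# 20! still fits in 64 bits, 21! does not ...
print(fits_in_int64(factorial(20)), fits_in_int64(factorial(21)))
# ... yet 21 choose 10 itself is tiny.
print(comb(21, 10), fits_in_int64(comb(21, 10)))
```

So a factorial-based routine in a fixed-width language already fails at n = 21, long before the answers themselves get large.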

The fastest way is probably to use the formula, and not Pascal's triangle. Let's start by not doing multiplications when we know that we're going to divide by the same number later. If k < n/2, let's have k = n - k. We know that C(n,k) = C(n,n-k). Now:

n! / (k! x (n-k)!) = (product of numbers between (k+1) and n) / (n-k)!

At least with this technique, you're never dividing by a number that you used to multiply before. You have (n-k) multiplications and (n-k) divisions.

I'm thinking about a way to avoid all divisions, by finding GCDs between the numbers that we have to multiply and those we have to divide. I'll try to edit later.
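The GCD idea was never posted, so the following is only a guess at how it might look, not the author's code. The name choose_gcd is hypothetical. It uses the symmetric form, with the smaller of k and n-k as the number of factors, and cancels every denominator against the numerators before anything is multiplied.

```python
from math import gcd


def choose_gcd(n, k):
    """n choose k by cancelling denominators against numerators with gcd."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)  # C(n, k) == C(n, n - k)
    numerators = list(range(n - k + 1, n + 1))  # the k factors n-k+1 .. n
    for d in range(2, k + 1):  # denominators 2 .. k
        for idx, num in enumerate(numerators):
            g = gcd(num, d)
            if g > 1:
                numerators[idx] = num // g
                d //= g
                if d == 1:  # this denominator is fully cancelled
                    break
    result = 1
    for num in numerators:
        result *= num  # no division is left at this point
    return result


print(choose_gcd(64, 33))
```

Because the binomial coefficient is an integer, every denominator cancels down to 1, and the running product never exceeds the final answer. The price is roughly k^2 gcd computations instead of k divisions.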

After hitting a similar problem, I decided to compile the best solutions I have seen and run a simple test on each for a couple of different values of n and k. I started with 10 or so functions and weeded out the ones that were just plain wrong or stopped working at specific values. Of all the solutions, the answer above by user448810 is the cleanest and easiest to implement; I quite like it. Below is the code, including each test I ran, the number of times I ran each function for each test, each function's code, the function's output and the time it took to get that output. I only did 20000 runs, and there were still fluctuations in the times when I re-ran the tests, but you should get a general sense of how well each one worked.

//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    //EXPECTED VALUES
    //x choose x = 1
    //9 choose 4 =126
    //52 choose 5 = 2598960;
    //64 choose 33 = 1.777090076065542336E18;

    //# of runs for each test: 20000
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
//https://stackoverflow.com/a/12983878/4285191
public static double combination(long n, long k) {
    double sum=0;
    for(long i=0;i<k;i++) {
        sum+=Math.Log10(n-i);
        sum-=Math.Log10(i+1);
    }
    return Math.Pow(10, sum);
}
/*
    10 choose 10
    0.9999999999999992
    Elapsed=00:00:00.0000015
    9 choose 4
    126.00000000000001
    Elapsed=00:00:00.0000009
    52 choose 5
    2598959.9999999944
    Elapsed=00:00:00.0000013
    64 choose 33
    1.7770900760655124E+18
    Elapsed=00:00:00.0000058
*/


//........................................................
//https://stackoverflow.com/a/19125294/4285191
public static double BinomCoefficient(long n, long k) {
    if (k > n) return 0;
    if (n == k) return 1; // only one way to choose when n == k
    if (k > n-k) k = n-k; // C(n,k) == C(n,n-k), so it is quicker to iterate over the smaller of the two.
    double c = 1;
    for (long i = 1; i <= k; i++) {
        c *= n--;
        c /= i;
    }
    return c;
}
/*
    10 choose 10
    1
    Elapsed=00:00:00
    9 choose 4
    126
    Elapsed=00:00:00.0000001
    52 choose 5
    2598960
    Elapsed=00:00:00.0000001
    64 choose 33
    1.7770900760655432E+18
    Elapsed=00:00:00.0000006
*/

//........................................................
//https://stackoverflow.com/a/15302448/4285191
public static double choose(long n, long k) {
    if (k == 0) return 1;
    return (n * choose(n-1, k-1)) / k;
}
/*
    10 choose 10
    1
    Elapsed=00:00:00.0000002
    9 choose 4
    126
    Elapsed=00:00:00.0000003
    52 choose 5
    2598960
    Elapsed=00:00:00.0000004
    64 choose 33
    1.777090076065543E+18
    Elapsed=00:00:00.0000008
*/

//........................................................
//My own version which is just a mix of the two best above.
public static double binomialCoeff(int n, int k) {
    if (k > n) return 0;
    if (k > n-k) k = n-k; // C(n,k) == C(n,n-k), so it is quicker to iterate over the smaller of the two.
    double recursion(long n, long k) {
        if (k == 0) return 1; // only one way to choose nothing
        return (n * recursion(n-1, k-1)) / k;
    }
    return recursion(n,k);
}
/*
    10 choose 10
    1
    Elapsed=00:00:00
    9 choose 4
    126
    Elapsed=00:00:00.0000001
    52 choose 5
    2598960
    Elapsed=00:00:00.0000002
    64 choose 33
    1.777090076065543E+18
    Elapsed=00:00:00.0000007
*/

//........................................................
//https://en.wikipedia.org/wiki/Binomial_coefficient
public static double binomial(long n, long k) {
    // take advantage of symmetry
    if (k > n-k)  k = n-k;
    
    long c = 1;
    for (long i = 1; i <= k; i++, n--) {
        // return 0 on potential overflow
        if (c/i >= long.MaxValue/n) return 0;
        // split c * n / i    into (c / i * n) + (c % i * n / i)
        c = (c / i * n) + (c % i * n / i); 
    }
    
    return c;
}

/*
    10 choose 10
    1
    Elapsed=00:00:00.0000006
    9 choose 4
    126
    Elapsed=00:00:00.0000002
    52 choose 5
    2598960
    Elapsed=00:00:00.0000003
    64 choose 33
    1.7770900760655424E+18
    Elapsed=00:00:00.0000029
*/

Using recursive functions to solve this problem is not optimal in terms of space complexity, because the recursive calls build up a call stack. I think it is possible to compute n choose k, with k <= n, in linear time and constant space.

For 0 < k <= n, the maximum of n, k and n-k is n, so the idea is to compute only n! and to pick up the values of k! and (n-k)! in the same loop. Thus the final time complexity is O(n).

Such a function could look like this:

public static long combinationsCount(int n, int k) {
    //this will hold the result for n!
    long a = 1;
    //this will hold the result for k!
    long b = 1;
    //this will hold the result for (n-k)!
    long c = 1;        
    for (int i = 1; i <= n; i++) {
        a *= i;
        //if the current value of i is k, then a is equal to k!
        if (i == k) {
            b = a;
        }
        //if the current value of i is n-k, then a is equal to (n-k)!
        if (i == n-k) {
            c = a;
        }
    }
    // n choose k formula; note that a = n! overflows a long once n > 20
    return a/(b*c);
}

Using Pascal's triangle is a fast method for calculating n choose k. You can refer to the answer here for more info.

The fastest method I know of would be to make use of the results from "On the Complexity of Calculating Factorials". Just calculate all 3 factorials, then perform the two division operations, each with complexity M(n log n).

Here is a very simple C++ implementation that minimizes the risk of overflow. You can also replace each long long with double to allow larger numbers at the cost of imprecise results.

long long combinations(int n, int k) {
    long long product = 1;
    for(int i = 1; i <= k; i++)
        product = product*(n - k + i)/i; // Must do mul before div
    return product;
}

The denominator will contain all the numbers from [1.. k], while the numerator will contain all the numbers from [n - k + 1.. n], noting that the first [1.. n - k] factors are cancelled out by the (n - k)! factor in the denominator.

The order in which these operations are performed matters. It can be proven that product*(n - k + i) is divisible by i at every iteration step, so no fractional part is ever lost: after i iterations the numerator is a product of i consecutive integers, and such a product is always divisible by i!.
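A quick way to see why the order matters is to compare both orders side by side. The Python sketch below mirrors the C++ loop; the two function names are made up for this illustration.

```python
def combinations_mul_first(n, k):
    """Multiply, then divide: every division is exact."""
    product = 1
    for i in range(1, k + 1):
        numerator = product * (n - k + i)
        assert numerator % i == 0  # the divisibility claim from the text
        product = numerator // i
    return product


def combinations_div_first(n, k):
    """Divide, then multiply: integer division can truncate and spoil the result."""
    product = 1
    for i in range(1, k + 1):
        product = product // i * (n - k + i)
    return product


print(combinations_mul_first(4, 2))
print(combinations_div_first(4, 2))
```

For 4 choose 2 the first version gives 6, while the second truncates 3 / 2 to 1 and ends up with 4.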

I haven't seen this answer posted before, so here it is in case it helps someone else:

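def choose(n, k):
    if k == 0:
        return 1  # exactly one way to choose nothing
    numerator = n    # builds n * (n-1) * ... * (n-k+1)
    denominator = k  # builds k * (k-1)! == k!
    for i in range(1, k):
        numerator *= (n - i)
        denominator *= i
    return numerator // denominator  # exact: the quotient is always an integer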

"Most efficient" is a poor request. What are you trying to make efficient? The stack? Memory? Speed? Overall, my opinion is that the recursive method is most efficient because it only uses addition (a cheap operation) and the recursion won't be too bad for most cases. The function is:

nchoosek(n, k)
{
    if(k==0) return 1;
    if(n==0) return 0;
    return nchoosek(n-1, k-1)+nchoosek(n-1,k);
}
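One caveat: written this way, the recursion recomputes the same subproblems many times. A possible way to keep it addition-only while computing each (n, k) pair once is to cache the results. The Python sketch below uses functools.lru_cache for that; it is a variation on the function above, not part of the original answer.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def nchoosek(n, k):
    """Addition-only recursion; each (n, k) pair is computed only once."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return nchoosek(n - 1, k - 1) + nchoosek(n - 1, k)


print(nchoosek(52, 5))
```

The cache holds at most about n * k entries, and the recursion depth is at most n, so very large n would still need an iterative table instead.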

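Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.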
