
Most efficient way to calculate lexicographic index

Can anybody find a potentially more efficient algorithm for the following task?

For any given permutation of the integers 0 through 7, return the index of that permutation in lexicographic order (indexed from 0, not 1).

For example,

  • The array 0 1 2 3 4 5 6 7 should return an index of 0.
  • The array 0 1 2 3 4 5 7 6 should return an index of 1.
  • The array 0 1 2 3 4 6 5 7 should return an index of 2.
  • The array 1 0 2 3 4 5 6 7 should return an index of 5040 (that's 7!, or factorial(7)).
  • The array 7 6 5 4 3 2 1 0 should return an index of 40319 (that's 8!-1). This is the maximum possible return value.
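To check expected values like the ones above, one option is a brute-force reference that counts std::next_permutation steps starting from the sorted array. This is a testing sketch only (far too slow for the hot path), and the name lexic_ix_bruteforce is hypothetical. Note that it gives 5040 (that is, 7!) for 1 0 2 3 4 5 6 7, because the 5040 permutations that start with 0 occupy indices 0 through 5039.

```cpp
#include <algorithm>
#include <array>

// Brute-force reference: the lexicographic index of `perm` is the number of
// std::next_permutation steps needed to reach it from the sorted array.
// Meant only for checking faster versions (up to 8! - 1 steps per call).
int lexic_ix_bruteforce(const std::array<int, 8>& perm)
{
    std::array<int, 8> cur = {0, 1, 2, 3, 4, 5, 6, 7};
    int index = 0;
    while (cur != perm) {
        // next_permutation returns false once it wraps around, which can only
        // happen if `perm` is not a permutation of 0..7
        if (!std::next_permutation(cur.begin(), cur.end()))
            return -1;
        ++index;
    }
    return index;
}
```

A faster implementation can then be compared against this reference over all 40320 permutations.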

My current code looks like this:

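// factorial(k) = k!; the real (unrolled) code has these constants inlined
static int factorial(int k){ return k ? k * factorial(k - 1) : 1; }

int lexic_ix(int* A){
    int value = 0;
    for(int i=0 ; i<7 ; i++){       // the last element always contributes 0
        int x = A[i];
        for(int j=0 ; j<i ; j++)
            if(A[j]<A[i]) x--;      // x = number of smaller values not used yet
        value += x*factorial(7-i);  // actual unrolled version doesn't have a function call
    }
    return value;
}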

I'm wondering if there's any way I can reduce the number of operations by removing that inner loop, or if I can reduce conditional branching in any way (other than unrolling; my current code is actually an unrolled version of the above), or if there are any clever bitwise hacks or filthy C tricks that would help.
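One way to drop the inner loop, which is not taken from this thread, is to keep a bitmask of the values already seen and count the smaller ones with a population count. The sketch below assumes the values are 0..7, uses std::bitset<8>::count() as a portable popcount (GCC/Clang's __builtin_popcount or C++20's std::popcount are alternatives), and builds the factorial weights with Horner's rule instead of a table. The name lexic_ix_mask is hypothetical, and this has not been benchmarked against the unrolled version.

```cpp
#include <bitset>

// Sketch: lexicographic index of a permutation of 0..7 with no inner loop.
// Bit v of `seen` is set once value v has appeared, so the number of smaller
// values that are still unused is A[i] minus the popcount of the bits below A[i].
int lexic_ix_mask(const int* A)
{
    unsigned seen = 0;
    int value = 0;
    for (int i = 0; i < 8; i++) {
        unsigned below = seen & ((1u << A[i]) - 1u);  // used values smaller than A[i]
        int digit = A[i] - static_cast<int>(std::bitset<8>(below).count());
        value = value * (8 - i) + digit;              // Horner: builds the k! weights
        seen |= 1u << A[i];
    }
    return value;
}
```

Whether count() compiles down to a single POPCNT instruction depends on the compiler and target flags (for example -mpopcnt with GCC), so it is worth checking the generated assembly before drawing conclusions.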

I already tried replacing

if(A[j]<A[i]) x--;

with

x -= (A[j]<A[i]);

and I also tried

x = A[j]<A[i] ? x-1 : x;

Both replacements actually led to worse performance.

And before anyone says it: YES, this is a huge performance bottleneck (currently about 61% of the program's runtime is spent in this function), and NO, I don't want a table of precomputed values.

Aside from those, any suggestions are welcome.

I don't know if this helps, but here's another solution:

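int lexic_ix(const int* A, int n){ //n = last index = number of digits - 1 (n < 32, see `placed`)
    int value = 0;
    int x = 0;           // invariant: exactly one value in [0, x] has not been placed yet
    unsigned placed = 0; // bit v is set once value v has been placed
    for(int i=0 ; i<n ; i++){
        placed |= 1u << A[i];
        int diff = (A[i] - x); //pb1
        if(diff > 0)
        {
            for(int j=0 ; j<i ; j++)//pb2
            {
                if(A[j]<A[i] && A[j] > x)
                {
                    if(A[j]==x+1)
                    {
                      x++;
                    }
                    diff--;
                }
            }
            value += diff;
        }
        else
        {
          // A[i] was the only unplaced value in [0, x]; move x past any values
          // that were already placed (e.g. the 1 in "1 0 2 ...")
          do x++; while (placed >> x & 1u);
        }
        value *= n - i;
    }
    return value;
}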

I couldn't get rid of the inner loop, so the complexity is O(n²) in the worst case but O(n) in the best case, versus your solution, which is O(n²) in all cases.

Alternatively, you can replace the inner loop with the following to eliminate some worst cases, at the cost of an extra check in the inner loop:

int j=0;
while(diff>1 && j<i)
{
  if(A[j]<A[i] && A[j] > x) // values <= x are already accounted for by x
  {
    if(A[j]==x+1)
    {
      x++;
    }
    diff--;
  }
  j++;
}
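Putting the alternative loop into the full function gives something like the following sketch. It is an assembled version, not code from the original answer, and it assumes two things: the inner test keeps the `A[j] > x` condition from the main version (otherwise values at or below x get subtracted twice), and a bitmask `placed` moves x forward after A[i] == x. The early exit on diff == 1 is valid because diff never drops below the true count, which is at least 1 whenever A[i] > x. The name lexic_ix_alt is hypothetical.

```cpp
// n = last index = number of digits - 1 (n < 32 because of the bitmask)
int lexic_ix_alt(const int* A, int n)
{
    int value = 0;
    int x = 0;           // exactly one value in [0, x] is still unplaced
    unsigned placed = 0; // bit v is set once value v has been placed
    for (int i = 0; i < n; i++) {
        placed |= 1u << A[i];
        int diff = A[i] - x;  // upper bound on the smaller unplaced values
        if (diff > 0) {
            int j = 0;
            // stop as soon as diff reaches 1: the true count is at least 1
            while (diff > 1 && j < i) {
                if (A[j] < A[i] && A[j] > x) {
                    if (A[j] == x + 1)
                        x++;  // x+1 is placed, so x can move past it
                    diff--;
                }
                j++;
            }
            value += diff;
        } else {
            // A[i] was the only unplaced value in [0, x]; skip placed values
            do x++; while (placed >> x & 1u);
        }
        value *= n - i;  // Horner step: builds the factorial weights
    }
    return value;
}
```

For example, lexic_ix_alt on {2, 1, 3, 0} (cbda) with n = 3 should give 15, matching the table below.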

Explanation:

(Or rather "how I ended up with this code". I don't think it is that different from yours, but maybe it will give you ideas. To reduce confusion, I used characters instead of digits, and only four characters.)

abcd 0  = ((0 * 3 + 0) * 2 + 0) * 1 + 0
abdc 1  = ((0 * 3 + 0) * 2 + 1) * 1 + 0
acbd 2  = ((0 * 3 + 1) * 2 + 0) * 1 + 0
acdb 3  = ((0 * 3 + 1) * 2 + 1) * 1 + 0
adbc 4  = ((0 * 3 + 2) * 2 + 0) * 1 + 0
adcb 5  = ((0 * 3 + 2) * 2 + 1) * 1 + 0 //pb1
bacd 6  = ((1 * 3 + 0) * 2 + 0) * 1 + 0
badc 7  = ((1 * 3 + 0) * 2 + 1) * 1 + 0
bcad 8  = ((1 * 3 + 1) * 2 + 0) * 1 + 0 //First reflexion
bcda 9  = ((1 * 3 + 1) * 2 + 1) * 1 + 0
bdac 10 = ((1 * 3 + 2) * 2 + 0) * 1 + 0
bdca 11 = ((1 * 3 + 2) * 2 + 1) * 1 + 0
cabd 12 = ((2 * 3 + 0) * 2 + 0) * 1 + 0
cadb 13 = ((2 * 3 + 0) * 2 + 1) * 1 + 0
cbad 14 = ((2 * 3 + 1) * 2 + 0) * 1 + 0
cbda 15 = ((2 * 3 + 1) * 2 + 1) * 1 + 0 //pb2
cdab 16 = ((2 * 3 + 2) * 2 + 0) * 1 + 0
cdba 17 = ((2 * 3 + 2) * 2 + 1) * 1 + 0
[...]
dcba 23 = ((3 * 3 + 2) * 2 + 1) * 1 + 0
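The nested expressions in the table are Horner's rule applied to the per-position "digits" (the leading numbers, which range over 0..3, 0..2, 0..1 and 0). A small sketch that evaluates them for any length, assuming the digits have already been computed; index_from_digits is a hypothetical helper:

```cpp
#include <vector>

// digits[i] = number of not-yet-placed characters smaller than the one at
// position i, so 0 <= digits[i] <= size - 1 - i.
// For four characters this computes ((d0 * 3 + d1) * 2 + d2) * 1 + d3.
int index_from_digits(const std::vector<int>& digits)
{
    const int n = static_cast<int>(digits.size());
    int value = 0;
    for (int i = 0; i < n; i++)
        value = value * (n - i) + digits[i];  // multiply by (n - i), add next digit
    return value;
}
```

For example, bcad has digits {1, 1, 0, 0}, which evaluates to 8, and dcba has {3, 2, 1, 0}, which evaluates to 23.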

First "reflexion":

An entropy point of view: abcd has the least "entropy". If a character is in a place where it "shouldn't" be, it creates entropy, and the earlier the misplacement, the greater the entropy.

For bcad, for example, the lexicographic index is 8 = ((1 * 3 + 1) * 2 + 0) * 1 + 0 and can be calculated this way:

value = 0;
value += max(b - a, 0); // = 1; (a "should be" in the first place [to create the least possible entropy] but instead it is b)
value *= 3 - 0; //last index - current index
value += max(c - b, 0); // = 1; (b "should be" in the second place but instead it is c)
value *= 3 - 1;
value += max(a - c, 0); // = 0; (a "should have been" put earlier, so it does not create entropy to put it there)
value *= 3 - 2;
value += max(d - d, 0); // = 0;
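The "first reflexion" fits in a short function, which makes its limitation easy to see: it reproduces 8 for bcad, but for adcb it returns 4 instead of the correct 5 (the pb1 case). A sketch, assuming lowercase strings that are permutations of the first n letters; first_reflexion_index is a hypothetical name:

```cpp
#include <algorithm>
#include <string>

// At position i, add max(s[i] - ('a' + i), 0), i.e. how far s[i] is past the
// character that "abcd..." would have there, then multiply by the number of
// positions still to fill (Horner form of the walkthrough above).
int first_reflexion_index(const std::string& s)
{
    const int n = static_cast<int>(s.size());
    int value = 0;
    for (int i = 0; i < n; i++) {
        int digit = std::max(s[i] - ('a' + i), 0);
        value = value * (n - i) + digit;
    }
    return value;
}
```

It also gives 13 for cbda, the wrong value discussed under pb2.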

Note that the last operation always does nothing; that's why the loop condition is "i<n" (n being the last index).

First problem (pb1):

For adcb, for example, the first logic doesn't work (it gives a lexicographic index of ((0 * 3 + 2) * 2 + 0) * 1 = 4) because max(c - d, 0) = 0, yet putting c before b does create entropy. That's why I added x: it represents the first digit/character that hasn't been placed yet. With x, diff cannot be negative. For adcb, the lexicographic index is 5 = ((0 * 3 + 2) * 2 + 1) * 1 + 0 and can be calculated this way:

value = 0; x=0;
diff = a - a; // = 0; (a is in the right place)
diff == 0 => x++; //x=b now and we don't modify value
value *= 3 - 0; //last index - current index
diff = d - b; // = 2; (b "should be" there (it's x) but instead it is d)
diff > 0 => value += diff; //we add diff to value and we don't modify x
value *= 3 - 1;
diff = c - b; // = 1; (b "should be" there but instead it is c) This is where it differs from the first reflexion
diff > 0 => value += diff;
value *= 3 - 2;

Second problem (pb2):

For cbda, for example, the lexicographic index is 15 = ((2 * 3 + 1) * 2 + 1) * 1 + 0, but the first reflexion gives ((2 * 3 + 0) * 2 + 1) * 1 + 0 = 13, and the solution to pb1 gives ((2 * 3 + 1) * 2 + 3) * 1 + 0 = 17. The solution to pb1 doesn't work because the last two characters to place are d and a, so d - a "means" 1 instead of 3. I had to count the characters already placed that are smaller than the current character but greater than x, which is why I had to add an inner loop.

Putting it all together:

I then realised that pb1 was just a particular case of pb2, and that if you remove x and simply take diff = A[i], you end up with the loop (not unrolled) version of your solution, with the factorial computed incrementally and my diff corresponding to your x.

So, basically, my "contribution" (I think) is the extra variable x, which lets you skip the inner loop when diff is 0 (or, with the alternative loop, 1), at the cost of checking whether x has to be incremented and incrementing it if so.

I also check inside the inner loop whether x has to be incremented (if(A[j]==x+1)). When a character placed earlier is exactly x+1, x can move past it, which makes diff smaller for later characters, so they are more likely to get a diff of 0 (or 1) and skip the inner loop altogether. You can remove this check without breaking the program.

With the alternative version and the check in the inner loop, that makes 4 different versions. The alternative version with the check is the one that enters the inner loop least often, so in terms of "theoretical complexity" it is the best, but in terms of actual performance/number of operations, I don't know.

Hope all of this helps (the question is rather old, and I didn't read all the answers in detail). If not, I still had fun doing it. Sorry for the long post. Also, I'm new to Stack Overflow (as a member) and not a native speaker, so please be nice, and don't hesitate to let me know if I did something wrong.

Linear traversal of memory that is already in cache really doesn't take much time at all. Don't worry about it. You won't be traversing much distance before factorial() overflows.
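To see how far "before factorial() overflows" actually is, a small sketch can compute the largest n whose n! still fits in an int. Assuming a 32-bit int, it returns 12 (12! = 479001600 fits, while 13! = 6227020800 does not), which is consistent with the 12-element example below. The function name is hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Largest n such that n! <= INT_MAX; the product is kept in 64 bits so the
// check itself cannot overflow.
int largest_factorial_fitting_int()
{
    std::int64_t f = 1;  // f == n! at the top of every iteration
    int n = 0;
    while (f * (n + 1) <= std::numeric_limits<int>::max()) {
        ++n;
        f *= n;
    }
    return n;
}
```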

Move the 8 out into a parameter.

#include <iostream>

int factorial ( int input )
{
    return input ? input * factorial (input - 1) : 1;
}

int lexic_ix ( int* arr, int N )
{
    int output = 0;
    int fact = factorial (N);
    for ( int i = 0; i < N - 1; i++ )
    {
        int order = arr [ i ];
        for ( int j = 0; j < i; j++ )
            order -= arr [ j ] < arr [ i ];
        output += order * (fact /= N - i);
    }
    return output;
}

int main()
{
    int arr [ ] = { 11, 10, 9, 8, 7 , 6 , 5 , 4 , 3 , 2 , 1 , 0 };

    const int length = 12;
    for ( int i = 0; i < length; ++i )
        std::cout << lexic_ix ( arr + i, length - i  ) << std::endl;
}

Say, for an M-digit permutation, from your code you can get the lexicographic serial number (SN) formula, which is something like:

$$SN = A_{M-1}\,(M-1)! + A_{M-2}\,(M-2)! + \cdots + A_0\,0!,$$

where each $A_j$ ranges from 0 to $j$. You can calculate SN starting from $A_0\,0!$, then $A_1\,1!$, ..., then $A_{M-1}\,(M-1)!$, and add these together (assuming your integer type does not overflow), so you do not need to calculate factorials recursively and repeatedly: each factorial is just the previous one times $j$. SN ranges from 0 to $M! - 1$, because $\sum_{k=0}^{M-1} k \cdot k! = M! - 1$.
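The identity used for the range of SN follows from a telescoping sum, since each term $k \cdot k!$ is a difference of consecutive factorials:

```latex
\begin{aligned}
k \cdot k! &= (k+1)\,k! - k! = (k+1)! - k! \\
\sum_{k=0}^{M-1} k \cdot k! &= \sum_{k=0}^{M-1} \bigl((k+1)! - k!\bigr) = M! - 0! = M! - 1
\end{aligned}
```

The largest SN is reached when every $A_j = j$, which gives exactly this sum.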

If you are not calculating factorials recursively, I cannot think of anything that would make a big improvement.

Sorry for posting the code a little late. I did some research and found this: http://swortham.blogspot.com.au/2011/10/how-much-faster-is-multiplication-than.html. According to this author, integer multiplication can be 40 times faster than integer division. For floating-point numbers the difference is less dramatic, but here everything is pure integer.
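The first answer already builds the factorial weights with `value *= n - i`; combining that Horner step with this answer's `order` loop gives a version that needs neither a division nor a coefficient buffer. This is a sketch assuming arr is a permutation of 0..N-1 and that N! fits in a long long; lexic_ix_horner is a hypothetical name and it has not been profiled here.

```cpp
// Horner form: value = ((order0 * (N-1) + order1) * (N-2) + order2) * ...,
// so order_i ends up multiplied by (N-1-i)! using multiplications only.
long long lexic_ix_horner(const int* arr, int N)
{
    long long value = 0;
    for (int i = 0; i < N; i++) {
        int order = arr[i];
        for (int j = 0; j < i; j++)
            order -= arr[j] < arr[i];     // smaller values that were already used
        value = value * (N - i) + order;  // multiply only, never divide
    }
    return value;
}
```

For the sequence {3, 7, 2, 9, 0, 1, 5, 8, 4, 6} used below, the orders are {3, 6, 2, 6, 0, 0, 1, 2, 0, 0} and the result is 1344970.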

#include <iostream>
#include <memory>

int lexic_ix ( int arr[], int N )
{
    // if this function will be called repeatedly, consider passing this buffer in as a parameter
    // (std::make_unique requires C++14)
    std::unique_ptr<int[]> coeff_arr = std::make_unique<int[]>(N);
    for ( int i = 0; i < N - 1; i++ )
    {
        int order = arr [ i ];
        for ( int j = 0; j < i; j++ )
            order -= arr [ j ] < arr [ i ];
        coeff_arr[i] = order; // save this into coeff_arr for later multiplication
    }
    //
    // There are 2 points about the following code:
    // 1). Most modern processors have a built-in hardware multiplier,
    //     and multiplication is much faster than division.
    // 2). Your test only computes the maximum permutation serial number.
    //     If you put in a random sequence, say, with length 10,
    //     {3, 7, 2, 9, 0, 1, 5, 8, 4, 6}, and look at coeff_arr[] in a debugger,
    //     you will see that coeff_arr[] is {3, 6, 2, 6, 0, 0, 1, 2, 0, 0}
    //     (the last number is always zero). Zero coefficients skip their
    //     multiplication, so you have a good chance of saving many multiplications.
    //     I did not do any performance profiling; you could have a go, and I would
    //     much appreciate some feedback about the result.
    //
    long fac = 1;
    long sn = 0;
    for (int i = 1; i < N; ++i) // start from 1, because coeff_arr[N-1] is always 0
    {
        fac *= i;
        if (coeff_arr[N - 1 - i])
            sn += coeff_arr[N - 1 - i] * fac;
    }
    return sn;
}

int main()
{
    int arr [ ] = { 3, 7, 2, 9, 0, 1, 5, 8, 4, 6 }; // try this and check coeff_arr

    const int length = 10;
    std::cout << lexic_ix(arr, length ) << std::endl;
    return 0;
}

The whole profiling code is included below. I only ran the test on Linux; the code was compiled with G++ 8.4 using the '-std=c++11 -O3' compiler options. To be fair, I slightly rewrote your code to pre-calculate N! and pass it into the function, but it seems this does not help much.

The performance profiling results for N = 9 (362,880 permutations) are:

  • Time durations are: 34, 30, 25 milliseconds
  • Time durations are: 34, 30, 25 milliseconds
  • Time durations are: 33, 30, 25 milliseconds

The performance profiling results for N = 10 (3,628,800 permutations) are:

  • Time durations are: 345, 335, 275 milliseconds
  • Time durations are: 348, 334, 275 milliseconds
  • Time durations are: 345, 335, 275 milliseconds

The first number is your original function, the second is the rewritten function that gets N! passed in, and the last number is my version. The permutation generation function is very primitive and runs slowly, but as long as it generates all permutations as the test dataset, that is fine. By the way, these tests were run on a quad-core 3.1 GHz desktop with 4 GB of RAM running Ubuntu 14.04.

EDIT: I forgot that the first timed function may need to grow the lexi_numbers vector, so I added an untimed warm-up pass before timing. After this, the times are 333, 334, 275.

EDIT: Another factor that could influence performance: I use long integers in my code. If I change those two 'long's to 'int', the running times become 334, 333, 264.

#include <iostream>
#include <vector>
#include <chrono>
using namespace std::chrono;

int factorial(int input)
{
    return input ? input * factorial(input - 1) : 1;
}

int lexic_ix(int* arr, int N)
{
    int output = 0;
    int fact = factorial(N);
    for (int i = 0; i < N - 1; i++)
    {
        int order = arr[i];
        for (int j = 0; j < i; j++)
            order -= arr[j] < arr[i];
        output += order * (fact /= N - i);
    }
    return output;
}

int lexic_ix1(int* arr, int N, int N_fac)
{
    int output = 0;
    int fact = N_fac;
    for (int i = 0; i < N - 1; i++)
    {
        int order = arr[i];
        for (int j = 0; j < i; j++)
            order -= arr[j] < arr[i];
        output += order * (fact /= N - i);
    }
    return output;
}

int lexic_ix2( int arr[], int N , int coeff_arr[])
{
    for ( int i = 0; i < N - 1; i++ )
    {
        int order = arr [ i ];
        for ( int j = 0; j < i; j++ )
            order -= arr [ j ] < arr [ i ];
        coeff_arr[i] = order;
    }
    long fac = 1;
    long sn = 0;
    for (int i = 1; i < N; ++i)
    {
        fac *= i;
        if (coeff_arr[N - 1 - i])
            sn += coeff_arr[N - 1 - i] * fac;
    }
    return sn;
}

std::vector<std::vector<int>> gen_permutation(const std::vector<int>& permu_base)
{
    if (permu_base.size() == 1)
        return std::vector<std::vector<int>>(1, std::vector<int>(1, permu_base[0]));

    std::vector<std::vector<int>> results;
    for (int i = 0; i < permu_base.size(); ++i)
    {
        int cur_int = permu_base[i];
        std::vector<int> cur_subseq = permu_base;
        cur_subseq.erase(cur_subseq.begin() + i);
        std::vector<std::vector<int>> temp = gen_permutation(cur_subseq);
        for (auto x : temp)
        {
            x.insert(x.begin(), cur_int);
            results.push_back(x);
        }
    }
    return results;
}

int main()
{
    #define N 10
    std::vector<int> arr;
    int buff_arr[N];
    const int length = N;
    int N_fac = factorial(N);
    for(int i=0; i<N; ++i)
        arr.push_back(N-i-1); // for N=10, arr is {9, 8, 7, 6, 5, 4, 3, 2, 1, 0}
    std::vector<std::vector<int>> all_permus = gen_permutation(arr);

    std::vector<int> lexi_numbers;
    // This call is not timed, only to expand the lexi_numbers vector 
    for (auto x : all_permus)
        lexi_numbers.push_back(lexic_ix2(&x[0], length, buff_arr));

    lexi_numbers.clear();
    auto t0 = high_resolution_clock::now();
    for (auto x : all_permus)
        lexi_numbers.push_back(lexic_ix(&x[0], length));
    auto t1 = high_resolution_clock::now();
    lexi_numbers.clear();
    auto t2 = high_resolution_clock::now();
    for (auto x : all_permus)
        lexi_numbers.push_back(lexic_ix1(&x[0], length, N_fac));
    auto t3 = high_resolution_clock::now();
    lexi_numbers.clear();
    auto t4 = high_resolution_clock::now();
    for (auto x : all_permus)
        lexi_numbers.push_back(lexic_ix2(&x[0], length, buff_arr));
    auto t5 = high_resolution_clock::now();

    std::cout << std::endl << "Time durations are: "
              << duration_cast<milliseconds>(t1 - t0).count() << ", "
              << duration_cast<milliseconds>(t3 - t2).count() << ", "
              << duration_cast<milliseconds>(t5 - t4).count() << " milliseconds" << std::endl;
    return 0;
}

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM