C#/XNA - Multiplication faster than Division?

I saw a tweet recently that confused me (this was posted by an XNA coder, in the context of writing an XNA game):

Microoptimization tip of the day: when possible, use multiplication instead of division in high frequency areas. It's a few cycles faster.

I was quite surprised, because I always thought compilers were pretty smart (for example, using bit-shifting), and I recently read a post by Shawn Hargreaves saying much the same thing. I wondered how much truth there was in this, since there are lots of calculations in my game.

I inquired, hoping for a sample, but the original poster was unable to give one. He did, however, say this:

Not necessarily when it's something like "center = width / 2". And I've already determined "yes, it's worth it". :)

So, I'm curious...

Can anyone give an example of some code where you can change a division to a multiplication and get a performance gain, where the C# compiler wasn't able to do the same thing itself?

Most compilers can do a reasonable job of optimizing when you give them a chance. For example, if you're dividing by a constant, chances are pretty good that the compiler can and will optimize that, so it's done about as quickly as anything you could reasonably substitute for it.

When, however, you have two values that aren't known ahead of time, and you need to divide one by the other to get the answer, there is little the compiler can do. If there were much the compiler could do with it, it would; and for that matter, if there were much room to optimize it, the CPU would do so itself, so the compiler wouldn't have to.

Edit: Your best bet for an example like that (one that's reasonably realistic) would probably be something like:

double scale_factor = get_input();

for (std::size_t i = 0; i < values.size(); i++)
    values[i] /= scale_factor;

This is relatively easy to convert to something like:

scale_factor = 1.0 / scale_factor;

for (std::size_t i = 0; i < values.size(); i++)
    values[i] *= scale_factor;

I can't really guarantee much one way or the other about a particular compiler doing that. It's basically a combination of strength reduction and loop hoisting. There are certainly optimizers that know how to do both, but what I've seen of the C# compiler suggests that it may not (though I never tested anything exactly like this, and the testing I did was a few versions back).
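To make the two variants easier to compare side by side, here is a minimal self-contained sketch. The function names are made up for illustration. One caveat: under strict IEEE floating-point rules, x / s and x * (1.0 / s) can differ in the last bit unless s is a power of two, which is one reason a compiler may decline to make this substitution on its own.

```cpp
#include <cstddef>
#include <vector>

// One floating-point division per element.
void scale_by_division(std::vector<double>& values, double scale_factor) {
    for (std::size_t i = 0; i < values.size(); ++i)
        values[i] /= scale_factor;
}

// One division up front, then one multiplication per element.
// Results may differ from the version above in the last bit
// when scale_factor is not a power of two.
void scale_by_reciprocal(std::vector<double>& values, double scale_factor) {
    const double inverse = 1.0 / scale_factor;
    for (std::size_t i = 0; i < values.size(); ++i)
        values[i] *= inverse;
}
```

With a power-of-two factor such as 4.0 the reciprocal is exact, so both functions produce identical results; with a factor such as 3.0 they should agree only to within rounding error.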

Although the compiler can optimize out divisions and multiplications by powers of 2, other numbers can be difficult or impossible to optimize. Try optimizing a division by 17 and you'll see why. This is of course assuming the compiler doesn't know ahead of time that you are dividing by 17 (that is, the divisor is a run-time variable, not a constant).
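For contrast, when the 17 is a compile-time constant, the usual trick is to multiply by a precomputed "magic" reciprocal and shift. The sketch below does that by hand for unsigned 32-bit values. The function name is made up, and whether a given C# compiler or JIT version emits this itself is something I have not checked.

```cpp
#include <cstdint>

// Unsigned 32-bit division by the constant 17 without a divide instruction.
// 0xF0F0F0F1 is ceil(2^36 / 17). Because 17 * 0xF0F0F0F1 == 2^36 + 1, the
// rounding error is too small to change the quotient for any 32-bit n,
// and the 64-bit product cannot overflow.
std::uint32_t divide_by_17(std::uint32_t n) {
    const std::uint64_t product = static_cast<std::uint64_t>(n) * 0xF0F0F0F1ull;
    return static_cast<std::uint32_t>(product >> 36);
}
```

The magic constant only exists because the divisor is fixed; with a run-time divisor it has to be computed first, which is where the difficulty described above comes from.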

Bit late but never mind.

The answer to your question is yes.

Have a look at my article here, http://www.codeproject.com/KB/cs/UniqueStringList2.aspx , which uses information based on the article mentioned in the first comment to your question.

I have a QuickDivideInfo struct which stores the magic number and the shift for a given divisor, thus allowing division and modulo to be calculated using faster multiplication. I pre-computed (and tested!) QuickDivideInfos for a list of Golden Prime Numbers. For x64 at least, the .Divide method on QuickDivideInfo is inlined and is 3x quicker than using the divide operator (on an i5); it works for all numerators except int.MinValue and cannot overflow, since the multiplication is stored in 64 bits before shifting. (I've not tried on x86, but if it doesn't inline for some reason then the neatness of the Divide method would be lost and you would have to inline it manually.)

So the above will work in all scenarios (except int.MinValue) if you can precalculate. If you trust the code that generates the magic number/shift, then you can deal with any divisor at runtime.
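As a rough illustration of the idea (this is not the code from the article, and the names QuickDivide, make_quick_divide and quick_divide are made up), a magic number and shift can be generated at runtime like this. To keep the arithmetic simple the sketch assumes non-negative numerators and divisors below 2^31, which is narrower than what the article's struct handles.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the QuickDivideInfo idea: a precomputed
// multiplier and shift for one divisor.
struct QuickDivide {
    std::uint64_t magic;  // floor(2^shift / divisor) + 1
    unsigned shift;       // 32 + ceil(log2(divisor))
};

// Precompute the multiplier and shift for a divisor in [1, 2^31).
QuickDivide make_quick_divide(std::uint32_t divisor) {
    assert(divisor >= 1 && divisor < 0x80000000u);
    unsigned log2_ceil = 0;
    while ((std::uint64_t{1} << log2_ceil) < divisor) ++log2_ceil;
    const unsigned shift = 32 + log2_ceil;  // at most 63
    const std::uint64_t magic = (std::uint64_t{1} << shift) / divisor + 1;
    return QuickDivide{magic, shift};
}

// Computes numerator / divisor for numerators in [0, 2^31).
// magic <= 2^33 and numerator < 2^31, so the product fits in 64 bits.
std::uint32_t quick_divide(const QuickDivide& q, std::uint32_t numerator) {
    assert(numerator < 0x80000000u);
    return static_cast<std::uint32_t>((q.magic * numerator) >> q.shift);
}
```

The precomputation itself uses a real division, so this only pays off when the same divisor is reused many times. A remainder can then be obtained as numerator minus quotient times divisor.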

Division by other well-known small divisors, with a very limited range of numerators, could be written inline and may well be faster if it doesn't need an intermediate long.

Division by a power of two: I would expect the compiler to deal with this (as in your width / 2 example) since the divisor is a constant. If it doesn't, then changing it to width >> 1 should be fine.
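One thing to watch when swapping / 2 for >> 1 by hand: the two only agree for non-negative values. The sketch below (function names made up) shows the difference. It relies on right-shifting a negative int being an arithmetic shift, which C++ guarantees from C++20 on and mainstream compilers did before that; C# also uses an arithmetic shift for int.

```cpp
// Halve by division: truncates toward zero, so -7 / 2 is -3.
int half_by_division(int width) {
    return width / 2;
}

// Halve by shifting: rounds toward negative infinity, so -7 >> 1 is -4.
int half_by_shift(int width) {
    return width >> 1;
}
```

For something like a screen width, which is never negative, the two are interchangeable.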

To give some numbers, this page of Pentium instruction timings

http://cs.smith.edu/dftwiki/index.php/CSC231_Pentium_Instructions_and_Flags

lists the following cycle counts, and they aren't good for division:

  • IMUL 10 or 11
  • FMUL 3+1
  • IDIV 46 (32-bit operand)
  • FDIV 39

We are talking about BIG differences.

while (start <= end)
{
    int mid = (start + end) / 2;
    if (mid * mid == A)
        return mid;
    if (mid * mid < A)
    {
        start = mid + 1;
        ans = mid;
    }
    else
        end = mid - 1;
}

If I do it this way, the result is TIME LIMIT EXCEEDED when computing the square root of 2147483647.

But if I do it the following way, it passes, so it seems that division is handled faster than multiplication here.

while (start <= end)
{
    int mid = (start + end) / 2;
    if (mid == A / mid)
        return mid;
    if (mid < A / mid)
    {
        start = mid + 1;
        ans = mid;
    }
    else
        end = mid - 1;
}
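A possible alternative explanation, offered as an assumption because the post does not show the variable types: if mid and A are 32-bit ints, mid * mid overflows for large values of mid, so the comparison is unreliable and the search may never converge, whereas A / mid cannot overflow. The sketch below (the function name and the surrounding scaffolding are mine) keeps the multiplication but widens it to 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Integer square root (rounded down) of a non-negative int by binary search.
int floor_sqrt(int a) {
    assert(a >= 0);
    if (a < 2) return a;                  // 0 and 1 are their own roots
    int start = 1, end = a, ans = 0;
    while (start <= end) {
        int mid = start + (end - start) / 2;  // avoids overflow in start + end
        // Square in 64 bits so large values of mid cannot overflow.
        std::int64_t square = static_cast<std::int64_t>(mid) * mid;
        if (square == a) return mid;
        if (square < a) {
            start = mid + 1;
            ans = mid;
        } else {
            end = mid - 1;
        }
    }
    return ans;
}
```

With the widened product, floor_sqrt(2147483647) finishes in about 31 iterations, so the multiplication itself does not appear to be the bottleneck.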
