Why does this "optimization" slow down my program?

I'm writing a graphics engine as an assignment for university and recently tried to optimize part of my code. However, the optimization seems to slow it down instead.

This specific part of the code processes 2D Lindenmayer systems and turns them into a list of "line2D" objects that another part of the program can process into an image.

In doing so, it uses sin and cos to calculate the coordinates of the next point. Because sin and cos are floating-point operations, I figured they would be time-intensive, especially in more complex Lindenmayer systems. So I created a class "cossinlist" which imports the values of cos and sin from a .txt file for every integer angle between 0 and 359 degrees (converted to radians) into two maps called "coslist" and "sinlist", with the angle as the key. That way I'd only have to do the actual flops when dealing with an angle that has a fractional part.

Then I decided to measure the execution time with the optimization and without it (by commenting it out) on a relatively intensive system: with it, the engine generated the image in 33.4016 seconds, and without it, it took only 25.3686 seconds. This is a substantial difference, but not in the expected direction. I ran more tests and all of them gave similar proportions, so now I'm wondering: what causes this difference?

The function:

img::EasyImage LSystem2D(const unsigned int size, const ini::DoubleTuple & backgroundcolor, LParser::LSystem2D & System, const ini::DoubleTuple & color)
{
    CosSinList cossinlist;
    std::string string;
    Lines2D Lines;
    double origin = 0;
    Point2D currentpos(origin, origin);
    Point2D newpos(origin, origin);
    std::stack<Point2D> savedpositions;
    double currentangle = System.get_starting_angle();
    std::stack<double> savedangles;
    const img::Color linecolor(color.at(0)*255,color.at(1)*255,color.at(2)*255);
    const img::Color BGcolor(backgroundcolor.at(0)*255,backgroundcolor.at(1)*255,backgroundcolor.at(2)*255);
    string = ReplaceLsystem(System, (System.get_initiator()), (System.get_nr_iterations()));
    bool optimizedangle = false;
    if(System.get_angle() == rint(System.get_angle()) && (System.get_starting_angle() == rint(System.get_starting_angle())))
    {
        optimizedangle = true;
    }
    for(char& c : string)
    {
        if(currentangle > 359){currentangle -= 360;}
        if(currentangle < -359){currentangle += 360;}
        if(System.get_alphabet().count(c) != 0)
        {
            /*if(optimizedangle == true)
            {
                if(currentangle >= 0)
                {
                    newpos.X = currentpos.X+(cossinlist.coslist[currentangle]);
                    newpos.Y = currentpos.Y+(cossinlist.sinlist[currentangle]);
                }
                else
                {
                    newpos.X = currentpos.X+(cossinlist.coslist[360+currentangle]);
                    newpos.Y = currentpos.Y+(cossinlist.sinlist[360+currentangle]);
                }
            }
            else
            {*/
                newpos.X = currentpos.X+cos(currentangle*PI/180);
                newpos.Y = currentpos.Y+sin(currentangle*PI/180);
            //}
            if(System.draw(c))
            {
                Lines.push_back(Line2D(currentpos,newpos,linecolor));
                currentpos = newpos;
            }
            else
            {
                currentpos = newpos;
            }

        }
        else if(c=='-')
        {
            currentangle -= System.get_angle();
        }
        else if(c=='+')
        {
            currentangle += System.get_angle();
        }
        else if(c=='[')
        {
            savedpositions.push(currentpos);
            savedangles.push(currentangle);
        }
        else if(c==']')
        {
            currentpos = savedpositions.top();
            savedpositions.pop();
            currentangle = savedangles.top();
            savedangles.pop();

        }
    }
    return Drawlines2D(Lines, size, BGcolor);
}

The CosSinList class:

#include <fstream>
#include <iostream>
#include <map>
#include "CosSinList.h"
using namespace std;

CosSinList::CosSinList()
{
    string line;
    std::fstream cosstream("coslist.txt", std::ios_base::in);
    double a;
    double i = 0;
    while (cosstream >> a)
    {
        coslist[i] = a;
        i += 1;
    }
    std::fstream sinstream("sinlist.txt", std::ios_base::in);
    i = 0;
    while (sinstream >> a)
    {
        sinlist[i] = a;
        i += 1;
    }
};

CosSinList::~CosSinList(){};

The "optimization" is commented out here in the same way I commented it out during the speed test: only the actual use of the object is commented out (the CosSinList is still being constructed, and the boolean that checks whether it can be used is still being set).

(I'm assuming coslist and sinlist are ordinary arrays or similar.)

Some things:

  • You really should turn optimizations on

With optimizations off, you're measuring stuff that doesn't matter. The performance of unoptimized code correlates poorly with its performance once optimizations are turned on.

  • optimizedangle should be a compile-time constant.

The optimizer is likely to be able to simplify the code if it knows that optimizedangle doesn't change throughout the run of this program. With this particular code snippet it can probably figure that out, but you shouldn't rely on it if you don't have to. In general, it's very easy to accidentally write code where you think it's obvious that a variable remains constant, but the compiler is smarter than you and realizes you've opened a loophole that could potentially allow the variable to change, so it has to emit a slower loop to account for that.
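One way to make the flag a true compile-time constant is to turn it into a template parameter and pick the instantiation once, outside the loop. This is only a rough sketch, not the asker's code: `Table`, `sumX` and `sumXDispatch` are made-up names, the table is assumed to be a flat array rather than whatever `CosSinList` really holds, and `if constexpr` assumes C++17.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

constexpr double kPi = 3.14159265358979323846;

// Hypothetical flat table of cos/sin for whole degrees 0..359.
struct Table {
    std::array<double, 360> cosv{};
    std::array<double, 360> sinv{};
    Table() {
        for (int d = 0; d < 360; ++d) {
            cosv[static_cast<std::size_t>(d)] = std::cos(d * kPi / 180.0);
            sinv[static_cast<std::size_t>(d)] = std::sin(d * kPi / 180.0);
        }
    }
};

// UseTable is fixed at compile time, so each instantiation
// contains exactly one of the two code paths.
template <bool UseTable>
double sumX(const Table& t, const int* angles, std::size_t n) {
    double x = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        if constexpr (UseTable) {
            // Angles are assumed to be already in [0, 359].
            x += t.cosv[static_cast<std::size_t>(angles[i])];
        } else {
            x += std::cos(angles[i] * kPi / 180.0);
        }
    }
    return x;
}

// The runtime decision is made once, before the hot loop starts.
double sumXDispatch(bool useTable, const Table& t,
                    const int* angles, std::size_t n) {
    return useTable ? sumX<true>(t, angles, n) : sumX<false>(t, angles, n);
}
```

Whether this actually beats a plain runtime `bool` depends on the compiler and flags, so it is worth measuring both with optimizations on.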

  • Branching can be bad

Branching in an inner loop, especially unpredictable branching, can kill performance. Try to write your loop so there aren't any branches; e.g. ensure that currentangle is always positive, or make the lookup table 720 entries long so you can always just index 360 + currentangle.
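The 720-entry idea could look roughly like the following. `WideTable`, `cosDeg` and `sinDeg` are hypothetical names, and the sketch assumes the angle has already been reduced to a whole number of degrees in [-360, 359], which is what the two wrap-around checks at the top of the loop give you for integer angles.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

constexpr double kPi = 3.14159265358979323846;

// Hypothetical 720-entry table: slot i holds the value for (i - 360) degrees,
// so a negative angle needs no sign test, only a fixed offset.
struct WideTable {
    std::array<double, 720> cosv{};
    std::array<double, 720> sinv{};
    WideTable() {
        for (int i = 0; i < 720; ++i) {
            const double rad = (i - 360) * kPi / 180.0;
            cosv[static_cast<std::size_t>(i)] = std::cos(rad);
            sinv[static_cast<std::size_t>(i)] = std::sin(rad);
        }
    }
    // Precondition (assumed, not checked): -360 <= a <= 359.
    double cosDeg(int a) const { return cosv[static_cast<std::size_t>(a + 360)]; }
    double sinDeg(int a) const { return sinv[static_cast<std::size_t>(a + 360)]; }
};
```

The trade-off is that the table is twice as large, so it removes a branch at the cost of a bigger memory footprint.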

  • Floating point <--> integer conversions can be slow

I tend to avoid these, and consequently I've never been good at predicting when they really are an issue, but it's possible that this is what's really killing you.
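If the conversions do turn out to matter, one option is to convert once up front and keep the heading as an `int` for the whole walk. The sketch below is an assumption-laden illustration: `toWholeDegrees`, `wrapDegrees` and `IntHeading` are invented names, and the range guard of one million degrees is an arbitrary choice to keep the cast well defined.

```cpp
#include <cmath>

// Hypothetical helper: converts a double to int once, before the loop,
// and reports whether the value was a whole number of degrees.
inline bool toWholeDegrees(double angle, int& out) {
    if (angle != std::rint(angle) || std::fabs(angle) > 1e6) {
        return false;  // fractional or out of the (arbitrary) safe range
    }
    out = static_cast<int>(angle);
    return true;
}

// Wraps an integer heading into [0, 359] using integer arithmetic only.
inline int wrapDegrees(int a) {
    a %= 360;                     // result lies in (-360, 360)
    return a < 0 ? a + 360 : a;   // shift negatives into range
}

// Heading kept as an integer, so table lookups need no double-to-int cast.
struct IntHeading {
    int degrees;  // current heading, always in [0, 359]
    int step;     // turn angle in whole degrees
    void plus()  { degrees = wrapDegrees(degrees + step); }
    void minus() { degrees = wrapDegrees(degrees - step); }
};
```

With this, `degrees` can index a 360-entry table directly, and the fallback to `cos`/`sin` is only needed when `toWholeDegrees` returns false.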

  • Your table is consuming cache

You don't post your data structure, but I'm imagining around 6k bytes. That's a nontrivial percentage of your L1 cache. It's nonobvious to me whether that's an important effect in this loop.
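The "around 6k" estimate comes from assuming two flat tables of 360 doubles each, which can be checked with a couple of lines. `FlatTables` and `kTableBytes` are hypothetical names for that assumed layout; the question text describes the real containers as maps, and a node-based `std::map` would take more memory per entry than this.

```cpp
#include <array>
#include <cstddef>

// Assumed layout: two flat arrays, one double per whole degree.
struct FlatTables {
    std::array<double, 360> cosv;
    std::array<double, 360> sinv;
};

// Raw payload: 2 tables * 360 entries * sizeof(double).
// With the common 8-byte double this is 5760 bytes.
constexpr std::size_t kTableBytes = 2 * 360 * sizeof(double);

static_assert(sizeof(FlatTables) >= kTableBytes,
              "the struct holds at least the raw payload");
```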
