Slower loops in a function when using floor and std::set

I'm writing a class on Windows using Visual Studio. One of its public functions has a big for loop that looks like this:

void brain_network_opencl::block_filter_fcd_all(int m)
{
    const int m_block_len = m * block_len;
    clock_t start, end;
    for (int j = 0; j < shift_2d_gpu[1]; j++) // local work size/number of rows per block
    {
        start = clock(); // time each block (row) separately
        for (int i = 0; i < masksize; i++) // number of extracted voxels
        {
            if (j + m_block_len != i)
            {
                //if (floor(dst_ptr_gpu[i + j * masksize] * power_up) > threadhold_fcd)
                if ((int)(dst_ptr_gpu[i + j * masksize] * power_up) > threadhold_fcd)
                {
                    org_row = mask_ind[j + m_block_len];
                    org_col = mask_ind[i];

                    nodes.insert(org_row);
                    conns.insert(make_pair(org_row, org_col));
                }
            }
        }
        end = clock();
        cout << end - start << "ms" << " for block" << j << endl;
    }
}

where nodes is std::set<int>, conns is std::multimap<int, int>, and mask_ind is std::vector<int>. They are declared as private member variables, as are masksize and shift_2d_gpu.

Most of the time is spent in floor and .insert.

The problem is that the same code (with all the variables) placed in a main function takes only 1/5 to 1 times as long as it does when called from here. And if I replace (int) with floor in both the function and main(), the cost goes up much more in this function.

What causes this problem, and do I have to write it all inside main()? By the way, does it have something to do with overloads? floor shows +3 overloads and .insert shows +5 overloads.

Update

I copied the code of this function into the main function of another new console project. It's still much slower than my first version (whose code is also in main)! Now I'm confused... Are there any settings that make floor and .insert faster?

Update 2014/03/31

It's because of the setting in Project Properties->Configuration Properties->C/C++->General->Debug Information Format. This value is set to Program Database for Edit And Continue (/ZI) by default, and according to MSDN it is incompatible with a lot of optimizations. If this value is set to Program Database (/Zi), floor no longer costs 10 times as much as (int).

(I looked at the disassembly and found that the length of the code (call floor -> jmp floor -> different code) differs when the setting is altered. That is why floor and .insert took much more time than they should.)
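To check the effect of the setting in a given build, a small timing harness along the following lines could be compiled once with /ZI and once with /Zi. This is only a sketch: the helper names (count_above_floor, count_above_cast, time_ms) and the float element type are assumptions for illustration and are not part of the original class.

```cpp
#include <chrono>
#include <cmath>
#include <vector>

// Hypothetical helpers for illustration; not from the original code.

// Counts elements whose scaled value, rounded down with std::floor, exceeds threshold.
inline long long count_above_floor(const std::vector<float>& v, float scale, int threshold)
{
    long long n = 0;
    for (float x : v)
        if (std::floor(x * scale) > threshold)
            ++n;
    return n;
}

// Same test using a truncating cast; equal to floor only for non-negative values.
inline long long count_above_cast(const std::vector<float>& v, float scale, int threshold)
{
    long long n = 0;
    for (float x : v)
        if (static_cast<int>(x * scale) > threshold)
            ++n;
    return n;
}

// Runs f once, stores its return value in result, and returns the elapsed milliseconds.
template <class F>
double time_ms(F&& f, long long& result)
{
    auto t0 = std::chrono::steady_clock::now();
    result = f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Calling time_ms with a lambda that wraps each counter on the same large vector gives two timings to compare per build configuration. Storing the count in result keeps the optimizer from discarding the loop.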

As Gassa has pointed out, to optimize the tight loop use a custom floor function.
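One common shape for such a custom floor is sketched below. It is not taken from Gassa's comment; fast_floor is a hypothetical name, and the sketch assumes the value fits in an int.

```cpp
// Hypothetical helper: floor for values that fit in an int (assumption).
// A cast truncates toward zero, so negative non-integers need a correction of -1.
inline int fast_floor(double x)
{
    int i = static_cast<int>(x);   // truncates toward zero
    return i - (x < i ? 1 : 0);    // step down when truncation rounded up
}
```

For non-negative inputs the correction never fires, so the plain (int) cast already used in the question gives the same result as floor there.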

set<int> isn't cache friendly, but to replace it with a cache-friendly structure you might need to alter the algorithm. Still, unordered_set<int>, with a decent amount of space reserved for it, should be a bit better, having fewer cache misses per insert than a binary tree.
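A minimal sketch of the unordered_set<int> variant follows. The function name collect_nodes and the expected_count parameter are assumptions for illustration; a sensible reservation size depends on how many distinct node ids the data actually produces.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: gathers distinct ids into a pre-sized hash set.
// expected_count is a caller-supplied guess, not a value from the original code.
inline std::unordered_set<int> collect_nodes(const std::vector<int>& ids,
                                             std::size_t expected_count)
{
    std::unordered_set<int> nodes;
    nodes.reserve(expected_count);  // allocate buckets up front to avoid rehashing
    for (int id : ids)
        nodes.insert(id);           // duplicates are ignored, as with std::set
    return nodes;
}
```

Unlike std::set, the hash set does not keep its elements sorted, so this only fits if the later code does not rely on ordered iteration.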

P.S. Non-virtual overloads in C++ are resolved at compile time and have no effect on performance.
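As a small illustration, with a hypothetical overload set named describe, the compiler picks the overload from the static type of the argument, so no lookup happens at run time:

```cpp
#include <string>

// Hypothetical overload set, for illustration only.
// Each call site is bound to exactly one of these during compilation.
inline std::string describe(int)    { return "int"; }
inline std::string describe(double) { return "double"; }
inline std::string describe(float)  { return "float"; }
```

A call such as describe(1.0f) compiles to a direct call of the float version. The "+3 overloads" hint in the IDE only lists the candidates the compiler chooses among.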
