How can I make my dynamic array or vector operate at a similar speed to a standard array? C++

I'm still quite inexperienced in C++, and I'm trying to write some code to add numbers precisely. This is a DLL plugin for some finite difference software, and the code is called several million times during a run. I want to write a function where any number of arguments can be passed in and the sum will be returned. My code looks like:

#include <cstdarg>
#include <vector>

double SumFunction(int numArgs, ...){ // this allows me to pass any number
                                      // of arguments to my function.
va_list args;
va_start(args,numArgs); //necessary prerequisites for using cstdarg

// The three options below are alternatives I am comparing; in real code
// only one would run, because each loop consumes the variadic arguments.

// Option 1: oversized static array
double myarray[10];
for (int i = 0; i < numArgs; i++) {
    myarray[i] = va_arg(args,double);
}       // I imagine this is sloppy code; however I cannot create
        // myarray[numArgs] because numArgs is not a const int.
sum(myarray); // The actual method of addition is not relevant here, but
              // for more complicated methods, I need to put the summation
              // terms in a list.

// Option 2: instead, place all values in a vector
std::vector<double> vec(numArgs);
for (int i = 0; i < numArgs; i++) {
    vec.at(i) = va_arg(args,double);
}
sum(vec); // This would be passed by reference, of course. The function sum
          // doesn't actually exist; it would all be contained within the
          // current function. This method is twice as slow as placing
          // all the values in the static array.

// Option 3: dynamically allocated array
double *dynArray;
dynArray = new double[numArgs];
for (int i = 0; i < numArgs; i++) {
    dynArray[i] = va_arg(args,double);
}
sum(dynArray); // Again half the speed of using a standard array, and
               // it gets worse with every extra dynamic array!

delete[] dynArray;
va_end(args);
}

So the problem I have is that using an oversized static array is sloppy programming, but using either a vector or a dynamic array slows the program down considerably. So I really don't know what to do. Can anyone help, please?

When using a std::vector, the optimizer must consider that the vector's storage may be relocated, and this introduces an extra indirection.

In other words, the code for

v[index] += value;

where v is, for example, a std::vector<int>, is conceptually expanded to (here _begin stands for the implementation's internal pointer to the first element):

int *p = v._begin + index;
*p += value;

i.e., from the vector you first need to load the field _begin (which holds the address where the content starts in memory), then apply the index, and then dereference to read the value and modify it.

If the code performing the computation on the elements of the vector in a loop calls any unknown non-inlined code, the optimizer is forced to assume that this unknown code may mutate the _begin field of the vector, and this requires doing the two-step indirection for each element.

(NOTE: the fact that the vector is passed by const std::vector<T>& reference is totally irrelevant: a const reference doesn't mean that the vector is const; it simply limits which operations are permitted through that reference. External code could hold a non-const reference to the same vector, and constness can also be legally cast away when the vector itself was not declared const. Constness of references is basically ignored by the optimizer.)
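To make that note concrete, here is a small sketch in which a function receiving a const std::vector<double>& sees the vector change underneath it, because other code mutates the same object through a non-const pointer. The names g_alias, unknown_code and growth_during_call are hypothetical, chosen for illustration:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical global: some other part of the program keeps a
// non-const pointer to the same vector the caller passes in.
std::vector<double>* g_alias = nullptr;

// Stands in for "unknown, non-inlined code" the optimizer cannot see into.
void unknown_code() {
    if (g_alias != nullptr) {
        g_alias->push_back(42.0);  // legal mutation; may also reallocate the storage
    }
}

// v is a const reference, yet its size (and possibly its data pointer)
// can change during this call, so the compiler must re-read them afterwards.
std::size_t growth_during_call(const std::vector<double>& v) {
    std::size_t before = v.size();
    unknown_code();
    return v.size() - before;
}
```

With std::vector<double> data(3, 1.0); g_alias = &data;, the call growth_during_call(data) reports that the "const" vector grew by one element.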

One way to remove this extra lookup (if you know that the vector is not being resized during the computation) is to cache this address in a local variable and use that instead of the vector's operator[] to access the elements:

int *p = &v[0];   // assumes v is non-empty and is not resized inside the loop
for (int i=0,n=v.size(); i<n; i++) {
    // use p[i] instead of v[i]
}

This will generate code that is almost as efficient as a static array because, given that the address of p is never published (it is not taken or passed anywhere), nothing in the body of the loop can change it, and the value of p can be assumed constant. That cannot be done for v._begin, as the optimizer cannot know whether someone else knows the address of _begin.

I'm saying "almost" because a static array only requires indexing, while using a dynamically allocated area requires "base + index" access; most CPUs, however, provide this kind of memory access at no extra cost. Moreover, if you're processing elements in sequence, the indexed addressing becomes just a sequential memory access, but only if you can assume the start address is constant (i.e., not in the case of std::vector<T>::operator[]).
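A self-contained sketch of the caching technique follows. The function observe() is hypothetical and stands in for the "unknown non-inlined code". Because it is defined in the same file here so that the example compiles and runs, a real compiler may well inline it, so any speed difference you measure will depend on your compiler, your flags, and where the called code actually lives:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical opaque call. In real code it would live in another
// translation unit, so the optimizer could not prove it leaves v alone.
double g_observed = 0.0;
void observe(double x) { g_observed += x; }

// Indexes through the vector each time: if observe() is opaque, the compiler
// must assume it may have changed v's internal pointers and reload them.
void scale_indexed(std::vector<double>& v, double factor) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] *= factor;
        observe(v[i]);
    }
}

// Caches the base pointer and the size in locals whose addresses never
// escape, so they can stay in registers for the whole loop.
// Assumes v is not resized inside the loop (observe() must not touch it).
void scale_cached(std::vector<double>& v, double factor) {
    double* p = v.data();            // same as &v[0], but also fine for an empty v (C++11)
    const std::size_t n = v.size();
    for (std::size_t i = 0; i < n; ++i) {
        p[i] *= factor;
        observe(p[i]);
    }
}
```

Both functions compute the same result; only the assumptions the optimizer is allowed to make differ.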

One way to speed the code up (at the cost of making it more complicated) is to reuse a dynamic array or vector between calls; then you avoid incurring the overhead of memory allocation and deallocation each time you call the function.

For example, declare these variables outside your function, either as global variables or as member variables inside some class. I'll just make them globals for ease of explanation:

double* sumArray = NULL;
int sumArraySize = 0;

In your SumFunction, check whether the array exists, allocate it if it doesn't, and reallocate it if it is too small:

double SumFunction(int numArgs, ...){ // this allows me to pass any number
                                      // of arguments to my function.
    va_list args;
    va_start(args,numArgs); //necessary prerequisites for using cstdarg

    // if the array has already been allocated, check if it is large enough and delete it if not:
    if((sumArray != NULL) && (numArgs > sumArraySize))
    {
        delete[] sumArray;
        sumArray = NULL;
    }

    // allocate the array, but only if necessary:
    if(sumArray == NULL)
    {
        sumArray = new double[numArgs];
        sumArraySize = numArgs;
    }

    double *vec = sumArray;   // set to your array, reusable between calls
    for (int i = 0; i < numArgs; i++) {
        vec[i] = va_arg(args,double);
    }
    double result = sum(vec, numArgs); // sum() is the placeholder from the question;
                                       // you will need to pass the array size

    va_end(args);

    // note: no array deallocation
    return result;
}

The catch is that you need to remember to deallocate the array at some point by calling a function similar to this one (like I said, you pay for speed with extra complexity):

void freeSumArray()
{
    if(sumArray != NULL)
    {
        delete[] sumArray;
        sumArray = NULL;
        sumArraySize = 0;
    }
}

You can take a similar (and simpler/cleaner) approach with a vector: allocate it the first time if it doesn't already exist, or call resize() on it with numArgs if it does.
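A minimal sketch of that vector variant, assuming the summation can be a plain loop and that every argument is passed as a double. The buffer name g_sumBuffer and the helper sum_terms() (standing in for the question's placeholder sum) are hypothetical:

```cpp
#include <cstdarg>
#include <cstddef>
#include <vector>

// Reused between calls: memory is allocated only when a call needs more
// room than any previous call. A shared static buffer like this is not
// safe if SumFunction can be called from several threads at once.
static std::vector<double> g_sumBuffer;

// Hypothetical stand-in for the question's sum(): a plain left-to-right sum.
static double sum_terms(const double* terms, int count) {
    double total = 0.0;
    for (int i = 0; i < count; ++i) total += terms[i];
    return total;
}

// Callers must pass doubles (1.0, not 1); va_arg(args, double) requires it.
double SumFunction(int numArgs, ...) {
    if (numArgs <= 0) return 0.0;
    if (g_sumBuffer.size() < static_cast<std::size_t>(numArgs)) {
        g_sumBuffer.resize(static_cast<std::size_t>(numArgs));  // grows, never shrinks
    }
    double* terms = g_sumBuffer.data();  // cached base pointer for the fill loop

    va_list args;
    va_start(args, numArgs);
    for (int i = 0; i < numArgs; ++i) {
        terms[i] = va_arg(args, double);
    }
    va_end(args);
    return sum_terms(terms, numArgs);
}
```

Because the buffer only grows, the allocation cost is paid at most a few times per run, and no separate free function is needed: the vector releases its memory when the program exits. If the host software calls the plugin from several threads, declaring the buffer thread_local (C++11) would be one way to give each thread its own copy.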

Assuming that the "max storage ever needed" is on the order of 10-50 elements, I'd say using a local array is perfectly fine.

Using vector<T> will use (at least) 3 * sizeof(T*) to track the contents of the vector. So if we compare that to an array double arr[10];, the array takes up 7 more elements of the same size on the stack (or 8.5 in a 32-bit build). But you also need a call to new, which takes a size argument. That takes up AT LEAST one, and more likely 2-3, elements of stack space, and the implementation of new is quite possibly not straightforward, so further calls are needed, which take up further stack space.
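The stack arithmetic above can be checked with a short compile-time sketch. It assumes sizeof(double) == 8 and a vector represented as three pointers (true of the common standard library implementations, but not required by the standard); the helper name is hypothetical:

```cpp
#include <cstddef>

// Extra stack space used by a local double[10] compared with the three
// pointers a typical std::vector object holds (begin, end, end of capacity),
// expressed in units of sizeof(double). The pointer size is a parameter so
// both 64-bit (8-byte) and 32-bit (4-byte) builds can be checked.
constexpr double extra_doubles_on_stack(std::size_t pointer_size) {
    return (10.0 * sizeof(double) - 3.0 * pointer_size) / sizeof(double);
}

// 64-bit: (80 - 24) / 8 = 7 more doubles for the array.
static_assert(extra_doubles_on_stack(8) == 7.0, "64-bit build");
// 32-bit: (80 - 12) / 8 = 8.5 more doubles for the array.
static_assert(extra_doubles_on_stack(4) == 8.5, "32-bit build");
```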

If you "don't know" the number of elements and need to cope with quite large numbers of elements, use a hybrid solution: keep a small stack-based local array, and if numArgs > small_size, use a vector instead and pass vec.data() to the function sum.
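A minimal sketch of that hybrid approach, assuming every variadic argument is passed as a double. The threshold kSmallSize = 16 and the helper sum_terms() (standing in for the question's placeholder sum) are hypothetical choices for illustration:

```cpp
#include <cstdarg>
#include <cstddef>
#include <vector>

// Calls with at most this many terms never touch the heap.
// 16 is an arbitrary illustrative threshold.
constexpr int kSmallSize = 16;

// Hypothetical stand-in for the question's sum(): a plain left-to-right sum.
double sum_terms(const double* terms, int count) {
    double total = 0.0;
    for (int i = 0; i < count; ++i) total += terms[i];
    return total;
}

double SumFunction(int numArgs, ...) {
    if (numArgs <= 0) return 0.0;

    double local[kSmallSize];      // stack storage for the common, small case
    std::vector<double> heap;      // only sized when numArgs > kSmallSize
    double* terms = local;
    if (numArgs > kSmallSize) {
        heap.resize(static_cast<std::size_t>(numArgs));
        terms = heap.data();       // i.e. hand vec.data() to the summation
    }

    va_list args;
    va_start(args, numArgs);
    for (int i = 0; i < numArgs; ++i) {
        terms[i] = va_arg(args, double);   // every argument must be a double
    }
    va_end(args);
    return sum_terms(terms, numArgs);
}
```

The vector is constructed empty on every call; in the common standard library implementations a default-constructed std::vector does not allocate, so the heap is only touched on the large-argument path.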
