
Function optimized for compile-time constant

Suppose that I have a vector length calculation function with an additional inc parameter, which gives the distance between neighboring elements. A simple implementation would be:

float calcLength(const float *v, int size, int inc) {
    float l = 0;

    for (int i=0; i<size*inc; i += inc) {
        l += v[i]*v[i];
    }
    return sqrt(l);
}

Now, calcLength can be called with two kinds of inc arguments: one whose value is known at compile time, and one whose value is not. I'd like to have an optimized version of calcLength for common compile-time values of inc (like 1).

So, I'd have something like this:

template <int C>
struct Constant {
    static constexpr int value() {
        return C;
    }
};

struct Var {
    int v;

    constexpr Var(int p_v) : v(p_v) { }

    constexpr int value() const {
        return v;
    }
};

template <typename INC>
float calcLength(const float *v, int size, INC inc) {
    float l = 0;

    for (int i=0; i<size*inc.value(); i += inc.value()) {
        l += v[i]*v[i];
    }
    return sqrt(l);
}

This can be used as follows:

calcLength(v, size, Constant<1>()); // inc is a compile-time constant 1 here, calcLength can be vectorized

or

int inc = <some_value>;
calcLength(v, size, Var(inc)); // inc is not a compile-time constant here, so the compiler has fewer optimization opportunities

My question is: would it somehow be possible to keep the original interface and have Constant / Var put in automatically, depending on whether inc is a compile-time constant or not?

calcLength(v, size, 1); // this should end up calcLength(v, size, Constant<1>());
calcLength(v, size, inc); // this should end up calcLength(v, size, Var(inc));

Note: this is a simple example. In my actual problem, I have several functions like calcLength, and they are large, so I don't want the compiler to inline them.

Note 2: I'm open to different approaches as well. Basically, I'd like a solution that fulfills these requirements:

  • the algorithm is specified once (most likely in a template function)
  • if I specify 1 as inc, a special function is instantiated, and the code most likely gets vectorized
  • if inc is not a compile-time constant, a general function is called
  • otherwise (a compile-time constant other than 1): it doesn't matter which function is called

If the goal here is simply to optimize, rather than to enable use in a compile-time context, you can give the compiler hints about your intent:

static float calcLength_inner(const float *v, int size, int inc) {
    float l = 0;

    for (int i=0; i<size*inc; i += inc) {
        l += v[i]*v[i];
    }
    return sqrt(l);
}

float calcLength(const float *v, int size, int inc) {
    if (inc == 1) {
        return calcLength_inner(v, size, inc);  // compiler knows inc == 1 here, and will optimize
    }
    else {
        return calcLength_inner(v, size, inc);
    }
}

On godbolt, you can see that calcLength_inner has been instantiated twice, once with and once without constant propagation.

This is a C trick (and is used extensively inside numpy), but you can write a simple wrapper to make it easier to use in C++:

#include <cmath>    // sqrt
#include <utility>  // std::forward

// give the compiler a hint that it can optimize `f` with knowledge of `cond`
template<typename Func>
auto optimize_for(bool cond, Func&& f) {
    if (cond) {
        return std::forward<Func>(f)();
    }
    else {
        return std::forward<Func>(f)();
    }
}

float calcLength(const float *v, int size, int inc) {
    return optimize_for(inc == 1, [&]{
        float l = 0;
        for (int i=0; i<size*inc; i += inc) {
            l += v[i]*v[i];
        }
        return sqrt(l);
    });
}

C++ does not provide a way to detect whether a supplied function argument is a constant expression, so you cannot automatically differentiate between literals and runtime values.

If the parameter must be a function parameter, and you're not willing to change the way the function is called in the two cases, then the only lever you have here is the type of the parameter: your suggestions of Constant<1>() vs Var(inc) are pretty good in that regard.
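As a rough sketch of using the parameter type as that lever, the two cases can be told apart by overloading on std::integral_constant versus a plain int, with both overloads forwarding to one shared template. The name calcLengthImpl, the choice of std::integral_constant as the tag type, and the assert precondition are assumptions made for illustration; they are not part of the question's code.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Stride known at compile time; mirrors the question's Constant<C>.
template <int C>
struct Constant {
    static constexpr int value() { return C; }
};

// Stride known only at run time; mirrors the question's Var.
struct Var {
    int v;
    constexpr explicit Var(int p_v) : v(p_v) {}
    constexpr int value() const { return v; }
};

// The algorithm is written once; Inc is either Constant<C> or Var.
// calcLengthImpl is a hypothetical helper name.
template <typename Inc>
float calcLengthImpl(const float* v, int size, Inc inc) {
    assert(size >= 0 && inc.value() > 0);  // assumed precondition
    float l = 0;
    for (int i = 0; i < size * inc.value(); i += inc.value()) {
        l += v[i] * v[i];
    }
    return std::sqrt(l);
}

// Selected when the caller passes std::integral_constant<int, C>:
// the stride is part of the argument's type.
template <int C>
float calcLength(const float* v, int size, std::integral_constant<int, C>) {
    return calcLengthImpl(v, size, Constant<C>{});
}

// Selected for a plain int, literal or not: the stride is a runtime value.
inline float calcLength(const float* v, int size, int inc) {
    return calcLengthImpl(v, size, Var(inc));
}
```

With this in place, calcLength(v, size, std::integral_constant<int, 1>{}) picks the compile-time path, while calcLength(v, size, 1) and calcLength(v, size, inc) both pick the runtime path, which is exactly the limitation described above: a bare literal cannot be distinguished from a variable.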

Option 1: Trust your compiler (aka do nothing)

Compilers can do what you want without you lifting a finger (well, you need to enable optimized builds, but that goes without saying).

Compilers can create what are called "function clones", which do what you want. A function clone is a copy of a function specialized through constant propagation, that is, the code generated for a function when it is called with constant arguments. I found little documentation about this feature, so it's up to you whether you want to rely on it.

The compiler can also inline this function altogether, potentially making your problem a non-problem (you can help it by defining the function inline in a header, using LTO, and/or using compiler-specific attributes like __attribute__((always_inline))).
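One way the attribute route might look, given that the question asks for large functions not to be inlined at every call site: the shared body is force-inlined into two thin entry points, and those entry points are themselves marked noinline. This is only a sketch for GCC and Clang (the __attribute__ syntax is not portable); the names calcLengthBody, calcLengthUnit and calcLengthStrided and the assert precondition are hypothetical, and whether the unit-stride version actually gets vectorized still depends on the compiler and flags.

```cpp
#include <cassert>
#include <cmath>

// Shared body, written once. always_inline asks GCC/Clang to inline it into
// each caller, so a caller that passes a constant stride exposes that
// constant to the optimizer.
__attribute__((always_inline)) inline float
calcLengthBody(const float* v, int size, int inc) {
    float l = 0;
    for (int i = 0; i < size * inc; i += inc) {
        l += v[i] * v[i];
    }
    return std::sqrt(l);
}

// Entry point for the unit-stride case: inc is the literal 1 inside the
// inlined body. noinline keeps this function out of its own call sites.
__attribute__((noinline)) float calcLengthUnit(const float* v, int size) {
    return calcLengthBody(v, size, 1);
}

// Entry point for an arbitrary runtime stride.
__attribute__((noinline)) float calcLengthStrided(const float* v, int size,
                                                  int inc) {
    assert(size >= 0 && inc > 0);  // assumed precondition
    return calcLengthBody(v, size, inc);
}
```

The caller still has to pick the entry point explicitly, so this does not restore the original single-function interface; it only keeps the algorithm in one place.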

Now, I am not preaching that you should simply let the compiler do its job. Although compiler optimizations are amazing these days and the rule of thumb is to trust the optimizer, there are situations where you need to intervene manually. I am just saying to be aware of this option and to take it into consideration. Oh, and as always when it comes to performance: measure, measure, measure. Don't rely on your "I feel I need to optimize here" gut.

Option 2: Two overloads

float calcLength(const float *v, int size, int inc) {
    float l = 0;

    for (int i=0; i<size*inc; i += inc) {
        l += v[i]*v[i];
    }
    return sqrt(l);
}

template <int Inc>
float calcLength(const float *v, int size) {
    float l = 0;

    for (int i=0; i<size*Inc; i += Inc) {
        l += v[i]*v[i];
    }
    return sqrt(l);
}

The disadvantage here is code duplication, of course. Also, a little care needs to be taken at the call site:

calcLength(v, size, inc); // ok
calcLength<1>(v, size);   // ok
calcLength(v, size, 1);   // nope
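The duplication in Option 2 could be reduced by moving the loop into one template that takes both a compile-time and a runtime stride and uses a sentinel to choose between them. The sketch below assumes a hypothetical sentinel kDynamic and helper calcLengthImpl, neither of which appears in the original answer. It also lets the runtime overload branch to the specialized instantiation when inc happens to be 1, at the cost of one comparison per call.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sentinel meaning "stride is only known at run time".
constexpr int kDynamic = 0;

// Single copy of the algorithm. When Inc != kDynamic the stride is a
// compile-time constant and runtimeInc is ignored.
template <int Inc>
float calcLengthImpl(const float* v, int size, int runtimeInc) {
    const int inc = (Inc == kDynamic) ? runtimeInc : Inc;
    float l = 0;
    for (int i = 0; i < size * inc; i += inc) {
        l += v[i] * v[i];
    }
    return std::sqrt(l);
}

// Compile-time stride: calcLength<1>(v, size).
template <int Inc>
float calcLength(const float* v, int size) {
    static_assert(Inc > 0, "stride must be positive");
    return calcLengthImpl<Inc>(v, size, Inc);
}

// Runtime stride: calcLength(v, size, inc).
inline float calcLength(const float* v, int size, int inc) {
    assert(size >= 0 && inc > 0);  // assumed precondition
    if (inc == 1) {
        // Reuse the specialized instantiation for the common case.
        return calcLengthImpl<1>(v, size, 1);
    }
    return calcLengthImpl<kDynamic>(v, size, inc);
}
```

With the extra branch, calcLength(v, size, 1) still compiles to a call of the runtime overload, but it ends up in the unit-stride instantiation anyway. Whether that branch is worth it is something to measure.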

Option 3: Your version

Your version is ok.
