Caching for complex comparison functions with STL algorithms?

I have a function f(T) -> int that takes a long time to compute, and a vector v of T. The task is to find the minimum element after applying f.

Generally, I would use

std::min_element(begin(v), end(v), [&f](auto a, auto b){ return f(a) < f(b); });

Do compilers try to store the computed values (if that makes sense) or do I have to do that by hand? In the second case: Is there a good solution using STL algorithms to do that by hand?

EDIT: Note that the implementation cannot perform this optimization on its own, as it only receives the comparison function and not its structure (in this case, a unary function applied to both arguments).

This is also a more general question, because this situation arises all the time when using STL algorithms. If I measure one example, I only get assurance for that one specific example. I am interested in whether compilers try to fix this in general (with reasonable optimizations enabled). And if not, can you fix it without rewriting the algorithm?

EDIT 2: I think that all questions are well answered except for the replacement method. It should have the same running time as a (not implemented) overload

std::min_element(begin(v), end(v), f);

that stores the value of the last accessed element. Furthermore I would like a solution that is applicable to all algorithms where this optimization can be made.

With C++20 we get the possibility to use projections, but as far as I can see, the suggested implementation at https://en.cppreference.com/w/cpp/algorithm/ranges/min_element is not optimized for caching (I wonder why; it would not make anything slower, right?).
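For reference, the projection-based call would look roughly like this (a minimal sketch; the stand-in function f and the values are illustrative, not from the question):

#include <algorithm>
#include <iostream>
#include <vector>

// Stand-in for the expensive f(T) -> int from the question (illustrative only).
int f(int x) { return (x * 37) % 101; }

int main() {
    std::vector<int> v = {3, 1, 4, 1, 5};
    // C++20: f is passed as a projection; the default comparator (ranges::less)
    // then compares the projected values f(a) < f(b).
    auto it = std::ranges::min_element(v, {}, f);
    std::cout << "min by f: " << *it << '\n';
}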

Do compilers try to store the computed values (if that makes sense) or do I have to do that by hand?

In general this is called memoization and in general the compiler can't do it for you. In specific cases, inlining might allow the optimizer to do something clever.

You can write an automatic memoizing function wrapper if you're expecting to do this a lot - it just needs to keep some (maybe bounded) amount of storage tracking the outputs for previously-encountered inputs. Then you could write something like

auto mf = memoize(f);
std::min_element(begin(v), end(v), [&mf](auto a, auto b){ return mf(a) < mf(b); });

(note that you're still making an explicit decision about how long this particular memo lasts, assuming its cached values are lost when mf goes out of scope).
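A minimal sketch of what such a memoize wrapper could look like (the name memoize, the int -> int signature and the unbounded std::map cache are assumptions made for illustration):

#include <algorithm>
#include <iostream>
#include <map>
#include <vector>

// Minimal memoize sketch for an int -> int function: results are cached in an
// unbounded std::map keyed by the argument. A real version would have to decide
// on key requirements, storage bounds and exception behaviour (see below).
template <class Func>
auto memoize(Func f) {
    return [f, cache = std::map<int, int>{}](int x) mutable {
        auto it = cache.find(x);
        if (it == cache.end())
            it = cache.emplace(x, f(x)).first;   // compute once, remember the result
        return it->second;
    };
}

int main() {
    std::vector<int> v = {3, 1, 4, 1, 5};
    auto expensive = [](int x) { return (x * 37) % 101; };  // stand-in for f
    auto mf = memoize(expensive);
    auto it = std::min_element(begin(v), end(v),
                               [&mf](int a, int b) { return mf(a) < mf(b); });
    std::cout << "min by f: " << *it << '\n';
}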

The language won't do it for you because there are lots of implementation tradeoffs (how much storage is reasonable? what happens if the return value's copy constructor throws an exception when copying it into your memo storage?) that don't have any obvious good default.

In order to check that the compiler was not able to cache the calculated values, I implemented and benchmarked three possibilities:

  • The direct min implementation
  • Min calculation after first computing the array of f(.) values
  • A specific min_element function that takes a unary function as an argument

Output:

min = 9193
distance = 120
822913 micro-s

New version:
min = 9193
distance = 120
425393 micro-s

With rewriting of min_element function:
min = 9193
distance = 120
416941 micro-s

The second version brings a real speed improvement.

The third version brings only a small speed improvement, but has the advantage of avoiding any increase in memory usage.

#include <iostream>
#include <vector>
#include <algorithm>
#include <chrono>

const int param = 10000;
const int N = 10000;

// Artificially slow function: repeated modular squaring
int slowF (int x) {
    int y = x;
    for (int i = 0; i < param; ++i) {
        y = y*y % N;
    }
    return y+2;
}


// Variant of std::min_element that takes a unary function f instead of a
// comparator and evaluates f exactly once per element.
template<typename T, class ForwardIt, class Funct>
ForwardIt min_element_fct(ForwardIt first, ForwardIt last, Funct f)
{
    if (first == last) return last;

    ForwardIt smallest = first;
    T val_smallest = f(*first);
    ++first;
    for (; first != last; ++first) {
        T val;
        if ((val = f(*first)) < val_smallest) {
            smallest = first;
            val_smallest = val;
        }
    }
    return smallest;
}

int main() {
    std::vector<int> v(N);
    v[0] = N/2 - 7;
    for (int i = 1; i < N; ++i) v[i] = (13*v[i-1] + 27) % N;
    
    auto comp = [] (int a, int b) {return slowF(a) < slowF(b);};
    auto t1 = std::chrono::high_resolution_clock::now();
    auto adr_min = std::min_element (v.begin(), v.end(), comp);
    auto t2 = std::chrono::high_resolution_clock::now();
    std::cout << "min = " << *adr_min << std::endl;
    std::cout << "distance = " << std::distance (v.begin(), adr_min) << std::endl;

    auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();
    std::cout << duration << " micro-s" << std::endl;
    std::cout << "\nNew version: \n";
        
    v[0] = N/2 - 7;
    for (int i = 1; i < N; ++i) v[i] = (13*v[i-1] + 27) % N;
    
    t1 = std::chrono::high_resolution_clock::now();
    std::vector<int> val(N);
    for (int i = 0; i < N; ++i) val[i] = slowF(v[i]);
    
    auto adr_min2 = std::min_element (val.begin(), val.end());
    adr_min = v.begin() + std::distance (val.begin(), adr_min2);
    t2 = std::chrono::high_resolution_clock::now();
    std::cout << "min = " << *adr_min << std::endl;
    std::cout << "distance = " << std::distance (val.begin(), adr_min2) << std::endl;

    duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();
    std::cout << duration << " micro-s" << std::endl;   
    
    std::cout << "\nWith rewriting of min_element function: \n";
    
    v[0] = N/2 - 7;
    for (int i = 1; i < N; ++i) v[i] = (13*v[i-1] + 27) % N;
    
    t1 = std::chrono::high_resolution_clock::now();
    
    adr_min = min_element_fct<int> (v.begin(), v.end(), slowF);
    t2 = std::chrono::high_resolution_clock::now();
    std::cout << "min = " << *adr_min << std::endl;
    std::cout << "distance = " << std::distance (v.begin(), adr_min) << std::endl;

    duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();
    std::cout << duration << " micro-s" << std::endl;   

    return 0;
}

The implementation of min_element can't cache f(a), as it works on a predicate. But looking at the larger implementation (compiler plus its standard library): because you pass a lambda, the compiler will likely be able to inline the lambda body into min_element. As a result, the optimizer does get a chance to look at that f(a) inside a loop.

A good optimizer might just cache f(*returnvalue) for simple iterator types such as pointers and vector iterators. That would not be specially programmed for min_element; it would be a result of a generic optimization that tries to reuse subexpressions inside loops.

A replacement method is not that easy to do in general without adding some runtime overhead that you would not have with single hand-crafted solutions for min_element, max_element etc. The problem is that for memoizing you always have to know which value you should memoize. "The last accessed" does not really make sense in this context since you will always access two elements for a comparison.

So if your memoization only works on the projection (agnostic with regard to how it is used in a comparison), it would need two slots for cached return values, two slots for saving the arguments that produced those return values, and an indication of which slot should be overwritten next.
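A rough sketch of such a comparison-agnostic two-slot cache (purely illustrative; the stand-in function, the fixed types and the round-robin eviction are assumptions):

#include <algorithm>
#include <functional>
#include <iostream>
#include <optional>
#include <utility>
#include <vector>

// Illustrative two-slot projection cache: remembers the last two (argument, result)
// pairs and overwrites them in round-robin fashion. As noted below, two slots are
// not guaranteed to be enough for every evaluation order.
template <class Func, class Arg, class Result>
struct TwoSlotCache {
    Func f;
    std::optional<std::pair<Arg, Result>> slots[2];
    int next = 0;

    Result operator()(const Arg& a) {
        for (const auto& slot : slots)
            if (slot && slot->first == a)
                return slot->second;                  // cache hit
        Result r = std::invoke(f, a);
        slots[next] = std::pair<Arg, Result>(a, r);   // overwrite a slot, round-robin
        next = 1 - next;
        return r;
    }
};

int expensive(int x) { return (x * 37) % 101; }       // stand-in for a slow f

int main() {
    std::vector<int> v = {3, 1, 4, 1, 5};
    TwoSlotCache<int (*)(int), int, int> cache{&expensive};
    auto it = std::min_element(v.begin(), v.end(),
                               [&cache](int a, int b) { return cache(a) < cache(b); });
    std::cout << "min by f: " << *it << '\n';
}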

And even that may not be enough. I am not sure what sequencing guarantees apply to the operands of <, but I think there aren't any for the arguments of a call to a comp(_,_) function. So comparing a, b, c can use comp(f(a), f(b)) and comp(f(c), f(a)) and may execute that in the order f(a), f(b), f(c), f(a). Here, memoizing with two slots would not be enough.

Having established that we cannot really be agnostic with regard to the comparison function, how about we try it with access to it? Well, then we have another problem. Take this simple implementation of a memoizer. std::min_element always packs the current minimum in the right-hand side of the comparison and it stays there if the result is false (standard quote).

#include <type_traits>
#include <optional>
#include <utility>
#include <functional>   // std::invoke, std::negate, std::less
#include <algorithm>
#include <iostream>

template<class ArgType, class Func, class Comp>
struct Memoizer {
    Memoizer(Func func, Comp comp) : m_func(std::move(func)), m_comp(std::move(comp)) {}

    [[nodiscard]] constexpr bool operator()(const ArgType &lhs, const ArgType &rhs) {
        if (!m_cache) {
            m_cache = std::invoke(m_func, rhs);
        }

        auto temp = std::invoke(m_func, lhs);

        if (m_comp(temp, *m_cache)) {
            m_cache = std::move(temp);
            return true;
        }
        return false;
    }

    using result_t = std::invoke_result_t<Func, ArgType>;
    Func m_func;
    Comp m_comp;

  private:
    std::optional<result_t> m_cache;
};

template<class ArgType, class Func, class Comp>
auto make_memoizer(Func&& f, Comp&& c) -> Memoizer<ArgType, Func, Comp> {
    return {std::forward<Func>(f), std::forward<Comp>(c)};
}

int main() {
    int arr[] = {2,3,1,5,4};

    auto memo = make_memoizer<int>(std::negate{}, std::less{});

    auto min = std::min_element(std::begin(arr), std::end(arr), memo);
    auto max = std::max_element(std::begin(arr), std::end(arr), memo);

    std::cout << "Min: " << -*min << " (Should be: " << -*std::max_element(std::begin(arr), std::end(arr)) << ")\n";
    std::cout << "Max: " << -*max << " (Should be: " << -*std::min_element(std::begin(arr), std::end(arr)) << ")\n";
}

As you can see, max_element does it the other way round, and hence we obtain a false result. Now you could additionally template the whole thing on which side to memoize for which algorithm, but that is just a nicely written invitation for bugs, because you will use the wrong variant for the wrong algorithm. Furthermore, that just won't work for minmax_element. And at that point, writing separate implementations for min_element and max_element is just easier and safer.

There is still the option of writing an iterator adaptor that calls the function on the first *iter and memoizes it for further *iter calls. But it does not seem that the standard gives any guarantees about how the iterators are used, so I am not exactly sure whether we can guarantee that the number of function calls is as small as it could be.

This is a very bare-bones implementation of such a memoizing iterator. You should probably use Boost or a similar library to write iterator adaptors, because otherwise they are a pain in the neck with everything they have to forward.

#include <type_traits>
#include <optional>
#include <utility>
#include <functional>   // std::negate
#include <algorithm>
#include <iostream>
#include <iterator>

template<class Func, class Iter>
struct MemoIter {
    using result_t = std::invoke_result_t<Func, decltype(*std::declval<Iter>())>;

    using iterator_category = typename std::iterator_traits<Iter>::iterator_category;

    auto operator*() {
        if(!m_cache) {
            m_cache = m_func(*m_iter);
        }
        return *m_cache;
    }

    auto operator++() {
        invalidate_cache();
        ++m_iter;
        return *this;
    }

    auto operator++(int) {
        auto ret = MemoIter(m_iter, invalidate_cache(), m_func);
        ++m_iter;
        return ret;
    }

    MemoIter(Iter iter, Func func) : m_iter(iter), m_func(func) {}
    explicit MemoIter(Iter iter) : MemoIter(iter, {}) {}

    friend bool operator==(const MemoIter& lhs, const MemoIter& rhs) {
        return lhs.m_iter == rhs.m_iter;
    }


    friend bool operator!=(const MemoIter& lhs, const MemoIter& rhs) {
        return lhs.m_iter != rhs.m_iter;
    }


private:
    auto invalidate_cache() {
        auto ret = std::move(m_cache);
        m_cache.reset();
        return ret;
    }
    MemoIter(Iter iter, std::optional<result_t>&& cache, Func func) : m_iter(iter), m_cache(std::move(cache)), m_func(func) {}
    Iter m_iter;
    std::optional<result_t> m_cache;
    Func m_func;

};

int main() {
    int arr[] = {2,3,1,5,4};
    auto begin = MemoIter(std::begin(arr), std::negate<>{});
    auto end = MemoIter(std::end(arr), std::negate<>{});

    auto min = std::min_element(begin, end);
    auto max = std::max_element(begin, end);

    std::cout << "Min: " << *min << " (Should be: " << -*std::max_element(std::begin(arr), std::end(arr)) << ")\n";
    std::cout << "Max: " << *max << " (Should be: " << -*std::min_element(std::begin(arr), std::end(arr)) << ")\n";
}

ADDENDUM: Concerning C++20: the standard explicitly specifies that the number of projection operations is exactly twice the number of comparisons, so not only the example implementation but also the standard rules out caching here. (Of course, if the compiler could prove that the projection has no side effects, it could cache due to the as-if rule, but that is a big if.) As to the why: as you can see, memoization is not that easy. You have still further problems if the iterator returns some kind of proxy that may be invalidated between comparisons. So just calling the projection more often is more general, easier, and also more predictable (you know the number of calls to your projection to be 2 * (range.size() - 1); if you memoize, it can depend on the order of the elements in your range and whatnot).
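A small sketch that makes that call count visible (assuming a conforming C++20 standard library; the projection and values are illustrative):

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v = {3, 1, 4, 1, 5};
    int calls = 0;
    // Count how often ranges::min_element invokes the projection.
    auto proj = [&calls](int x) { ++calls; return -x; };
    auto it = std::ranges::min_element(v, {}, proj);
    std::cout << "min element: " << *it << '\n';
    std::cout << "projection calls: " << calls
              << ", 2 * (N - 1) = " << 2 * (v.size() - 1) << '\n';
}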
