How would I go about optimizing this code?

I am writing a function to find the average of an array whose values are mostly so large that adding them all at once would overflow.

It works by creating a subarray (b in my code) that is half the size (ar_size in my code) of the input array (a in my code), and then placing the average of each non-overlapping pair of input values, a[i+0] and a[i+1], into b[j].

Once it has iterated through the entire input array, it calls itself again with the subarray and its size, until the size equals 2, and then ends the recursion by returning the average of the two remaining values.

Please pardon the reuse of j.

Also, the size of the array is always some power of two.

uint64_t* array_average(uint64_t* a, const int ar_size)
{
    uint64_t* b = new uint64_t[ar_size / 2];

    uint64_t* j = new uint64_t;

    if (ar_size == 2)
    {
     *j = (a[0] / 2) + (a[1] / 2) + ((a[0] % 2 + a[1] % 2) / 2);

     return j;
    }

    for (int i = 0; i < ar_size; i += 2)
    {
        b[*j] = (a[i + 0] / 2) + (a[i + 1] / 2) + ((a[i + 0] % 2 + a[i + 1] % 2) / 2);

        ++*j;
    }
    delete j;
    return array_average(b, ar_size / 2);
}

Also, does anyone have a better way to average when working with numbers that would cause an overflow?

Here is a revised version:

uint64_t* tools::array_average(uint64_t* a, const int ar_size)
{
    uint64_t* b = new uint64_t[ar_size];
    uint64_t* c = new uint64_t[ar_size / 2];

    int j;
    j = 0;

    for (int i = 0; i < ar_size; ++i)
    {
        b[i] = a[i];
    }

    if (runs > 0) //This is so i do not delete the original input array I.E not done with it
    {
        delete[] a;
    }

    if (ar_size == 2)
    {
        uint64_t* y = new uint64_t;

        runs = 0;

        *y = (b[0] / 2) + (b[1] / 2) + ((b[0] % 2 + b[1] % 2) / 2); 

        delete[] b;

        return y;
    }

    for (int i = 0; i < ar_size; i += 2)
    {
        c[j] = (b[i + 0] / 2) + (b[i + 1] / 2) + ((b[i + 0] % 2 + b[i + 1] % 2) / 2);

        ++j;
    }

    delete[] b;

    ++runs;

    return array_average(c, ar_size / 2);
}

First of all, be aware that your average is not the actual average, because you throw away the halves. The result of your algorithm on an array that alternates between 0 and 1 would be 0, as 0/2 + 1/2 + (0%2 + 1%2)/2 = 0. I wanted to start with that, because it is a serious weakness of your algorithm.
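To make the lost halves concrete, here is a minimal sketch of the pairwise scheme. The names pair_average and pairwise_average are hypothetical (they are not from the question), and the sketch assumes the size is a non-zero power of two.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// The pairwise formula from the question: safe against overflow, but rounds down.
std::uint64_t pair_average(std::uint64_t a, std::uint64_t b)
{
    return a / 2 + b / 2 + (a % 2 + b % 2) / 2;
}

// Repeatedly halves the data by averaging neighbouring cells.
// Assumption: data.size() is a power of two and not zero.
std::uint64_t pairwise_average(std::vector<std::uint64_t> data)
{
    while (data.size() > 1)
    {
        for (std::size_t i = 0; i < data.size() / 2; ++i)
        {
            data[i] = pair_average(data[2 * i], data[2 * i + 1]);
        }
        data.resize(data.size() / 2);
    }
    return data[0];
}
```

With this sketch, pairwise_average({0, 1, 0, 1}) gives 0, and pairwise_average({1, 2, 2, 3}) gives 1 even though the exact mean of those four values is 2, because a half is dropped at each level.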

Also note that if the original size is not a power of 2, some data will get a higher weight. For example, if three values a, b and c are reduced as ((a + b)/2 + c)/2, then c counts twice as much as a or b.

Aside from that, consider this algorithm: Copy the data. Until the data has only one entry left, put the average of cells 0 and 1 in cell 0, that of cells 2 and 3 in cell 1, that of cells 4 and 5 in cell 2, and so on. Shrink the data after each such step.

As code:

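uint64_t average(std::vector<uint64_t> data)
{
    if(data.empty()) return 0; //an empty input would otherwise loop forever
    while(data.size() != 1)
    {
        const size_t half = data.size()/2;
        for(size_t i=0; i<half; i++)
        {
            //the last term is 1 only when both values are odd
            data[i] = data[2*i]/2 + data[2*i+1]/2 + (data[2*i]%2 + data[2*i+1]%2)/2;
        }
        if(data.size()%2 == 1) data[half] = data.back(); //carry the unpaired last element forward
        data.resize(half + data.size()%2); //last part is required if the size is not an even number
    }
    return data[0];
}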

Using a proper container here also gets rid of your memory leak, by the way.

Note that this code still has the weakness I talked about. You could extend it by collecting the halves: if the modular part is 1, you increase a variable, and when the variable reaches two, you add one to some cell.

Edit: If the input HAS to be a raw array (because you receive it from some external source, for example), use this:

uint64_t average(uint64_t* array, const int array_size)
{
    std::vector<uint64_t> data(array, array + array_size);

    // (rest of the code is identical)
}

Edit: the code above with collecting halves:

inline uint64_t average(const uint64_t& a, const uint64_t& b, uint8_t& left_halves)
{
    uint64_t value = a/2 + b/2 + (a%2 + b%2)/2;
    if((a%2 + b%2)%2 == 1)
    {
        left_halves += 1;
    }
    if(left_halves == 2)
    {
        value += 1;
        left_halves = 0;
    }
    return value;
}

uint64_t average(std::vector<uint64_t> data)
{
    if(data.size() == 0) return 0;

    uint8_t left_halves = 0;
    while(data.size() != 1)
    {
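        const size_t half = data.size()/2;
        for(size_t i=0; i<half; i++)
        {
            data[i] = average(data[2*i], data[2*i+1], left_halves);
        }
        if(data.size()%2 == 1) data[half] = data.back(); //carry the unpaired last element forward
        data.resize(half + data.size()%2); //last part is required if the size is not an even number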
    }
    return data[0];
}

This still has the weakness of increased cell weight if the size is not a power of two.

You might use:

constexpr bool is_power_of_2(uint64_t n)
{
    return n && !(n & (n - 1));
}

uint64_t array_average(std::vector<uint64_t> v)
{
    if (!is_power_of_2(v.size())) {
        throw std::runtime_error("invalid size");
    }
    uint64_t remainder = 0;
    while (v.size() != 1) {
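        for (size_t i = 0; i != v.size(); i += 2) {
            // collect the halves lost by the two integer divisions
            remainder += (v[i] % 2 + v[i + 1] % 2);
            v[i / 2] = v[i] / 2 + v[i + 1] / 2;
            // unsigned negation: true when adding remainder / 2 cannot overflow
            if (remainder >= 2 && v[i / 2] < -(remainder / 2)) {
                v[i / 2] += remainder / 2;
                remainder %= 2;
            }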
        }
        v.resize(v.size() / 2);
    }
    return v[0] + remainder / 2;
}

There really shouldn't be much to convert, as the STL already has containers, functions and algorithms that will do this for you. Without writing any function, consider this short program:

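#include <cstdint>   // uint64_t
#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <iostream>
#include <numeric>
#include <stdexcept> // std::runtime_error
#include <vector>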

int main() {
    try {

        std::vector<uint64_t> values{ 1,2,3,4,5,6,7,8,9,10,11,12 };
        int total = std::accumulate( values.begin(), values.end(), 0 );
        uint64_t average = static_cast<uint64_t>( total ) / values.size();
        std::cout << average << '\n';

    } catch( const std::runtime_error& e ) {
        std::cerr << e.what() << '\n';
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

This is on my machine, Windows 7 Ultimate 64-bit, running Visual Studio 2017 CE with the language version set to the most recent (C++17 or greater). This does give me a compiler warning, C4244, generated due to the conversion and possible loss of data. However, there are no compiler errors, and it does run and give the expected result. The output here is 6, as expected, since integer division truncates. If I change these lines of code above to this:

double total = std::accumulate( values.begin(), values.end(),
                            static_cast<double>( 0 ) );
double average = total / values.size();

This fixes the compiler warning above by adding the static_cast, and sure enough it prints out 6.5, which is the actual value.
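The reason the cast matters is that std::accumulate sums in the type of its initial value. Here is a minimal sketch of that point; the helper names sum_u64 and mean_double are hypothetical, and mean_double assumes a non-empty vector.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Sums in uint64_t because the initial value is a uint64_t.
std::uint64_t sum_u64(const std::vector<std::uint64_t>& values)
{
    return std::accumulate(values.begin(), values.end(), std::uint64_t{0});
}

// Sums in double because the initial value is a double, so the
// division below is a floating-point division.
// Assumption: values is not empty.
double mean_double(const std::vector<std::uint64_t>& values)
{
    return std::accumulate(values.begin(), values.end(), 0.0) / values.size();
}
```

For the twelve values above, sum_u64 gives 78 and mean_double gives 6.5. Passing a plain 0 as the initial value would instead make the running total an int, which is where the conversion warning comes from.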

This is all fine and good since the vector is already initialized with values; however, this may not always be the case, so let's move this into a function that will take an arbitrary array. It would look something like this:

uint64_t array_average( std::vector<uint64_t>& values ) {
    // Prevent Division by 0 and early return 
    // as to not call `std::accumulate`
    if ( !values.empty() ) {
        // check if only 1 entry if so just return it
        if ( values.size() == 1 ) {
            return values[0];
        } else { // otherwise do the calculation.
            return std::accumulate( values.begin(), values.end(),
                                    static_cast<uint64_t>( 0 ) ) / values.size();
        } 
    } 
    // Empty Container 
    throw std::runtime_error( "Can not take average of an empty container" );
}

This function is nice and all, but we can do better by making it a little more generic so that it works with any arithmetic type!

template<typename T>
T array_average( std::vector<T>& values ) {
    if( std::is_arithmetic<T>::value ) {
        if( !values.empty() ) {
            if( values.size() == 1 ) {
                return values[0];
            } else { 
                return std::accumulate( values.begin(), values.end(), static_cast<T>( 0 ) ) / values.size();
            }
        } else {
            throw std::runtime_error( "Can not take average of an empty container" ); 
        }
    } else {
        throw std::runtime_error( "T is not of an arithmetic type" );
    }
}

At first glance this looks okay. It will compile and run if you use it with types that are arithmetic. However, if we use it with a type that isn't, it will fail to compile. For example:

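#include <cstdint>   // uint64_t
#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <iostream>
#include <numeric>
#include <stdexcept> // std::runtime_error
#include <string>
#include <type_traits>
#include <vector>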

class Fruit {
protected:
     std::string name_;
public:
    std::string operator()() const {
        return name_;
    }
    std::string name() const { return name_; }

    Fruit operator+( const Fruit& other ) {
        this->name_ += " " + other.name();
        return *this;
    }
};

class Apple : public Fruit {
public:
    Apple() { this->name_ = "Apple"; }

};

class Banana : public Fruit {
public:
    Banana() { this->name_ = "Banana"; }
};

class Pear : public Fruit {
public:
    Pear() { this->name_ = "Pear"; }
};

std::ostream& operator<<( std::ostream& os, const Fruit& fruit ) {
    os << fruit.name() << " ";
    return os;
}

template<typename T>
T array_average( std::vector<T>& values ); // Using the definition above

int main() {
    try {
        std::vector<uint64_t> values { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
        std::vector<double> values2 { 2.0, 3.5, 4.5, 6.7, 8.9 };
        std::vector<Fruit> fruits { Apple(), Banana(), Pear() };

        std::cout << array_average( values ) << '\n';  // compiles runs and prints 6
        std::cout << array_average( values2 ) << '\n'; // compiles runs and prints 5.12
        std::cout << array_average( fruits ) << '\n'; // fails to compile.

    } catch( const std::runtime_error& e ) {
        std::cerr << e.what() << '\n';
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

This fails to compile because the static_cast cannot convert int to T with T = Fruit (MSVC compiler error C2440).

We can fix this by changing a single line of code in our function template, if your compiler supports it:

We can change if( std::is_arithmetic<T>::value ) to if constexpr( std::is_arithmetic<T>::value ), and our function will now look like this:

template<typename T>
T array_average( const std::vector<T>& values ) {
    if constexpr( std::is_arithmetic<T>::value ) {
        if( !values.empty() ) {
            if( values.size() == 1 ) {
                return values[0];
            } else {
                return std::accumulate( values.begin(), values.end(), static_cast<T>( 0 ) ) / values.size();
            }
        } else {
            throw std::runtime_error( "Can not take average of an empty container" );
        }
    } else {
        throw std::runtime_error( "T is not of an arithmetic type" );
    }
}

You can run the same program as above, and it will now compile fully even when you use types that are not arithmetic.

int main() {
    //....
    std::cout << array_average( fruits ) << '\n'; // Now compiles
    //...
}

However, when you run this code it will generate a runtime error. Depending on how your IDE and debugger are set up, you may need to put a breakpoint inside the catch block, at the return EXIT_FAILURE line, to see the message printed to the screen; otherwise the application may just exit without any notification at all.

If you don't want runtime errors, you can produce compile-time errors instead by using static_assert rather than throwing. This can be a handy little function, but it is not without some minor limitations and gotchas. To find out more, you can check the question that I asked when I was writing this implementation, which can be found here; the comments there will give you more insight into some of this function's limitations.
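A minimal sketch of the static_assert variant might look like this. The name array_average_checked is hypothetical, and dividing by static_cast<T>( values.size() ) is my own adjustment so that signed element types are not converted to an unsigned type by the division.

```cpp
#include <numeric>
#include <stdexcept>
#include <type_traits>
#include <vector>

template<typename T>
T array_average_checked( const std::vector<T>& values ) {
    // A non-arithmetic T is now rejected at compile time instead of at run time.
    static_assert( std::is_arithmetic<T>::value, "T is not of an arithmetic type" );

    if( values.empty() ) {
        throw std::runtime_error( "Can not take average of an empty container" );
    }
    // Casting the size to T keeps the division in T, so a negative sum
    // of signed values is not turned into a huge unsigned number.
    return std::accumulate( values.begin(), values.end(), static_cast<T>( 0 ) )
           / static_cast<T>( values.size() );
}
```

With this version, calling array_average_checked on the fruits vector stops the build with the static_assert message, while arithmetic element types behave as before.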

One of the current limitations of this function is this: let's say we have a container that holds a bunch of complex numbers, (3i + 2), (4i - 6), (7i + 3). You can still take the average of these, as that is a valid thing to do, but the above function will not consider them arithmetic in its current state.

To resolve this issue, instead of using std::is_arithmetic<T> you could write your own policy and traits for the types this function should accept. I'll leave that part as an exercise for you.
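As a starting point for that exercise, here is one possible sketch, not the only way to do it. The trait is_averageable and the function average_of are hypothetical names; the trait accepts every arithmetic type and additionally opts in std::complex.

```cpp
#include <complex>
#include <numeric>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical trait: the element types this function is willing to average.
// By default it agrees with std::is_arithmetic.
template<typename T>
struct is_averageable : std::is_arithmetic<T> {};

// std::complex is opted in explicitly.
template<typename U>
struct is_averageable<std::complex<U>> : std::true_type {};

template<typename T>
T average_of( const std::vector<T>& values ) {
    static_assert( is_averageable<T>::value, "T cannot be averaged" );
    if( values.empty() ) {
        throw std::runtime_error( "Can not take average of an empty container" );
    }
    const T sum = std::accumulate( values.begin(), values.end(), T{} );
    // Converting the size to T works for arithmetic types and for std::complex.
    return sum / static_cast<T>( values.size() );
}
```

For the three complex numbers mentioned above, the sum is -1 + 14i, so average_of should give roughly -0.333 + 4.667i. Other types can be allowed by adding further specializations of the trait.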

As you can see, the majority of the work is already done for us by the standard library. We used accumulate, divided by the container's size, and we were done; the rest of the time went into making sure it accepts proper types, whether it should be thread safe and/or exception safe, etc.

Finally, we did not have to worry about cumbersome for loops over arrays or make sure the loops didn't exceed the size of the array. We did not have to call new and worry about when and where to call delete in order to avoid memory leaks. AFAIK std::accumulate does nothing by itself to prevent overflow: it sums in the type of the initial value, so whether it overflows depends on the types in the container and on the static_cast involved. Even with these caveats, in many cases it is still better to use containers than to manage your own raw memory, and to use the algorithms and functions that are designed to work on them. They make things a lot simpler and easier to manage, and even to debug.
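Because the running total in std::accumulate has the type of its initial value, a sum of large uint64_t values can still wrap around. A minimal sketch of one way around that, which is a different technique from the pairwise halving in the question, is to add up quotients and remainders separately. The name mean_floor is hypothetical; the function returns the exact mean rounded down.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Floor of the exact mean, computed without ever forming the full sum.
std::uint64_t mean_floor( const std::vector<std::uint64_t>& values ) {
    if( values.empty() ) {
        throw std::runtime_error( "Can not take average of an empty container" );
    }
    const std::uint64_t n = values.size();
    std::uint64_t quotient = 0;  // running sum of values[i] / n
    std::uint64_t remainder = 0; // running sum of values[i] % n, always kept below n
    for( const std::uint64_t v : values ) {
        quotient += v / n;
        const std::uint64_t r = v % n;
        if( r >= n - remainder ) {
            // remainder + r reached n: move one whole unit into the quotient
            quotient += 1;
            remainder = r - ( n - remainder );
        } else {
            remainder += r;
        }
    }
    return quotient;
}
```

The quotient can never exceed the largest input value, and the remainder stays below the element count, so neither variable overflows. It works for any non-zero size, not only powers of two.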
