
c++, classes, vectors, optimization: multiple independent vectors vs 1 vector of classes

Say I have multiple vectors of various datatypes:

#include <string>
#include <vector>

std::vector<double> someNumbers;
std::vector<int> someMoreNumbers;
std::vector<std::string> someStrings;

int main() {
    for (...) {  // i takes some sequence of valid indices
        someNumbers[i];
        someMoreNumbers[i];
        someStrings[i];
    }
}

Would it be more, less or equally efficient if I were to put all of this data into a class and instead use 1 vector of classes to access them?

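#include <string>
#include <vector>

class vectors {
public:  // members must be public to be read as items[i].aNumber
    double aNumber;
    int anotherNumber;
    std::string aString;
};

std::vector<vectors> items;  // one object per index i

int main() {
    for (...) {  // i takes some sequence of valid indices
        items[i].aNumber;
        items[i].anotherNumber;
        items[i].aString;
    }
}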

Is there some sort of extra overhead that comes with accessing the same data from within a class? Does the overall efficiency depend on the size of my vectors (in my case each vector contains 15,000 items)?

IMHO, the second version would be more efficient because it makes better use of the cache: the three fields for a given index sit next to each other in memory, while in the first version they are spread across three separate vectors.

However, in any case you would have to benchmark the two versions to find out the most efficient one.
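A minimal sketch of such a benchmark, assuming the loop reads all three fields at every index in order. The names Record, sumSeparate, sumCombined and averageMicroseconds are hypothetical helpers invented for this illustration, not anything from the question, and the summed quantity is just a stand-in for real work:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type standing in for the class from the question.
struct Record {
    double aNumber;
    int anotherNumber;
    std::string aString;
};

// Layout 1: three parallel vectors, all read at each index.
double sumSeparate(const std::vector<double>& numbers,
                   const std::vector<int>& moreNumbers,
                   const std::vector<std::string>& strings) {
    assert(numbers.size() == moreNumbers.size());
    assert(numbers.size() == strings.size());
    double total = 0.0;
    for (std::size_t i = 0; i < numbers.size(); ++i) {
        total += numbers[i] + moreNumbers[i] +
                 static_cast<double>(strings[i].size());
    }
    return total;
}

// Layout 2: one vector of records, all fields read for each element.
double sumCombined(const std::vector<Record>& records) {
    double total = 0.0;
    for (const Record& r : records) {
        total += r.aNumber + r.anotherNumber +
                 static_cast<double>(r.aString.size());
    }
    return total;
}

// Calls work() `repeats` times (repeats must be positive) and returns
// the average wall-clock time per call in microseconds.
template <typename Work>
double averageMicroseconds(Work work, int repeats) {
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

To compare the two, one could fill both layouts with the same 15,000 values, pass averageMicroseconds a lambda that stores the result of sumSeparate (or sumCombined) in a volatile variable so the compiler cannot discard the work, and build with optimizations enabled. A sequential loop like this will not capture random access patterns, so the loop body should be adapted to the real workload.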

Does the overall efficiency depend on the size of my vectors (in my case each vector contains 15,000 items)?

Yes, the efficiency difference between the two approaches definitely depends on total size. Most of the difference will come from cache misses. With a much larger amount of data than you describe, cache misses commonly dominate overall performance, so getting that detail right would really matter.

But 15,000 is small, so L2 cache misses (normally the very important ones) aren't that important here. For some random sequences of the index (i in your quoted code), combined with use of all three items for each i, the vector of structs would have fewer L1 cache misses, translating to measurably better performance.

But more likely, you would have an access pattern in which the cache pollution from alignment waste (the padding bytes inside each struct) would cause more cache misses than grouping related elements would save. So at a size like 15,000 I would predict the separate vectors would be marginally faster.
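The trade-off can be put into rough numbers with some cache-line arithmetic. This is only a sketch: the 64-byte line size is an assumption (common, but not universal), and cacheLinesFor and linesSpanned are hypothetical helpers, not library functions:

```cpp
#include <cstddef>

// Assumed cache line size in bytes; check the target CPU.
constexpr std::size_t kCacheLine = 64;

// Number of cache lines needed to hold `bytes` contiguous bytes that
// start on a line boundary (what a full sequential pass would touch).
constexpr std::size_t cacheLinesFor(std::size_t bytes) {
    return (bytes + kCacheLine - 1) / kCacheLine;
}

// Number of cache lines overlapped by an object of `size` bytes
// (size > 0) located `offset` bytes after a line boundary. This models
// one random access to a single element.
constexpr std::size_t linesSpanned(std::size_t offset, std::size_t size) {
    return (offset + size - 1) / kCacheLine - offset / kCacheLine + 1;
}
```

For example, if a struct happened to be 48 bytes where its members add up to 44 (figures that depend on the platform and standard library), a full pass over 15,000 elements would touch cacheLinesFor(15000 * 48) lines instead of roughly cacheLinesFor(15000 * 44). In the other direction, a random access to one struct spans one or two lines, while reading the same index from three separate vectors touches three different lines. Character data that a std::string keeps on the heap is not counted in either case.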

But the real bottom line is that 15,000 is small, so the logical association of elements in a struct gains more in code readability than it could lose in performance, and any such loss would be trivial.

Would it be more, less or equally efficient if I were to put all of this data into a class and instead use 1 vector of classes?

The memory required will most likely be larger if you use a vector of structs, since sizeof(int) + sizeof(double) + sizeof(std::string) will typically be less than sizeof(vectors) once padding for alignment is added.
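This is easy to check on a given compiler. In the sketch below, Record is a hypothetical stand-in with the same members as the class in the question, and memberBytes and paddingBytes are illustrative helpers. The actual values depend on the platform and on the standard library's std::string, so none are hard-coded:

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in with the same members as the question's class.
struct Record {
    double aNumber;
    int anotherNumber;
    std::string aString;
};

// Bytes occupied by the members themselves, ignoring any padding.
constexpr std::size_t memberBytes() {
    return sizeof(double) + sizeof(int) + sizeof(std::string);
}

// Bytes the compiler added to each Record to satisfy alignment.
constexpr std::size_t paddingBytes() {
    return sizeof(Record) - memberBytes();
}

// A struct can never be smaller than the sum of its members, and its
// size is always a multiple of its alignment.
static_assert(sizeof(Record) >= memberBytes(), "members must fit");
static_assert(sizeof(Record) % alignof(Record) == 0, "size follows alignment");
```

Printing sizeof(Record) and paddingBytes() shows the per-element overhead. Multiplying paddingBytes() by 15,000 gives the extra memory the vector of structs would use, not counting any character data the strings allocate on the heap.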

However, other factors must be taken into account when choosing between the two methods. I can think of two: code readability and maintainability, and run-time performance. The code will be easier to read and maintain if you put the data into a struct/class. It's difficult to predict the run-time difference between the two approaches. My suspicion is that it won't be large.

