Map, pair-vector or two vectors…?

Question

I read through some posts and "wikis" but still cannot decide what approach is suitable for my problem.

I am creating a class called Sample which contains a certain number of compounds (let's say each is an instance of another class, Nuclide), each at a certain relative quantity (double).

Thus, something like (pseudo):

class Sample {
    map<Nuclide, double>;
}

If I had the nuclides Ba-133, Co-60 and Cs-137 in the sample, I would have to use exactly those names in code to access those nuclides in the map. However, the only thing I need to do is iterate through the map to perform calculations (which nuclides they are is of no interest), so I will use a for loop. I want to iterate without paying any attention to the key names, so I would need to use an iterator for the map, am I right?

An alternative would be a vector<pair<Nuclide, double> >

class Sample {
    vector<pair<Nuclide, double> >;
}

or simply two independent vectors

class Sample {
    vector<Nuclide>;
    vector<double>;
}

In the last option, the link between a nuclide and its quantity would be "meta-information", given only by the position in the respective vector.

Due to my lack of profound experience, I'd ask kindly for suggestions of what approach to choose. I want to have the iteration through all available compounds to be fast and easy and at the same time keep the logical structure of the corresponding keys and values.

PS: It's possible that the number of compounds in a sample is very low (1 to 5)!

PPS: Could the last option be modified by some const statements to prevent changes and thus keep the correct order?

Answer 1

If iteration needs to be fast, you don't want std::map<...>: iterating it is a tree walk over separately allocated nodes, which gets slow quickly compared to walking contiguous memory. std::map<...> is really only reasonable if you have many mutations to the sequence and you need the sequence ordered by the key. If you have mutations but don't care about the order, std::unordered_map<...> is generally a better alternative. Both kinds of maps assume you are looking things up by key, though, and from your description I don't see that being the case.
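
On the side question of whether iterating a map requires knowing the key names: it does not. A minimal sketch, assuming a stand-in Nuclide with just a name member and a helper called total_quantity (both made up for illustration, not from the question):

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the question's Nuclide class.
struct Nuclide {
    std::string name;
};

// std::map needs a strict weak ordering on its key type.
inline bool operator<(const Nuclide& lhs, const Nuclide& rhs) {
    return lhs.name < rhs.name;
}

// Sums all relative quantities without naming any key.
inline double total_quantity(const std::map<Nuclide, double>& contents) {
    double total = 0.0;
    // Each entry is a std::pair<const Nuclide, double>.
    for (const auto& entry : contents) {
        total += entry.second;
    }
    return total;
}
```

The range-based for loop uses the map's iterators under the hood, so the keys never have to appear in the calling code.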

std::vector<...> is fast to iterate. It isn't ideal for look-ups, though. If you keep it sorted you can use std::lower_bound() to do a std::map<...>-like look-up (i.e., the complexity is also O(log n)), but the effort of keeping it sorted may make that option too expensive. However, it is an ideal container for keeping a bunch of objects together which are iterated.
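
To make the sorted-vector look-up concrete, here is a rough sketch. The Nuclide stand-in, the Entry alias and the helper names insert_sorted and find_quantity are assumptions for illustration only; sorting by name is likewise just one possible choice of key.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Nuclide; the name serves as the sort key.
struct Nuclide {
    std::string name;
};

using Entry = std::pair<Nuclide, double>;

// Inserts while keeping the vector sorted by name:
// O(log n) to find the position, O(n) to shift the elements behind it.
inline void insert_sorted(std::vector<Entry>& entries, Entry entry) {
    auto pos = std::lower_bound(
        entries.begin(), entries.end(), entry.first.name,
        [](const Entry& e, const std::string& name) { return e.first.name < name; });
    entries.insert(pos, std::move(entry));
}

// Binary search by name; returns a pointer to the quantity, or nullptr if absent.
inline const double* find_quantity(const std::vector<Entry>& entries,
                                   const std::string& name) {
    auto pos = std::lower_bound(
        entries.begin(), entries.end(), name,
        [](const Entry& e, const std::string& key) { return e.first.name < key; });
    if (pos != entries.end() && pos->first.name == name) {
        return &pos->second;
    }
    return nullptr;
}
```

With only 1 to 5 elements per sample, a plain linear search would probably do just as well; the sketch only shows what the std::lower_bound() variant looks like.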

Whether you want one std::vector<std::pair<...>> or rather two std::vector<...>s depends on how the elements are accessed: if both parts of an element are bound to be accessed together, you want a std::vector<std::pair<...>>, as that keeps data which is accessed together close in memory. On the other hand, if you normally only access one of the two components, using two separate std::vector<...>s will make the iteration faster, as more elements fit into a cache line, especially if they are reasonably small like doubles.
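
A small sketch of the two access patterns. The half_life_years member and the function names are invented for illustration; the question does not say what the calculations actually need.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in; half_life_years is an assumed member.
struct Nuclide {
    std::string name;
    double half_life_years;
};

// Pair layout: nuclide and quantity are read together in every step,
// so keeping them adjacent in memory suits this loop.
inline double weighted_half_life(const std::vector<std::pair<Nuclide, double>>& entries) {
    double sum = 0.0;
    for (const auto& entry : entries) {
        sum += entry.first.half_life_years * entry.second;
    }
    return sum;
}

// Two-vector layout: only the quantities are read, and they sit
// contiguously without any Nuclide objects in between.
inline double total_quantity(const std::vector<double>& quantities) {
    double total = 0.0;
    for (double q : quantities) {
        total += q;
    }
    return total;
}

// Two-vector layout when both parts are needed: the shared index is the
// only link between a nuclide and its quantity.
inline double weighted_half_life(const std::vector<Nuclide>& nuclides,
                                 const std::vector<double>& quantities) {
    assert(nuclides.size() == quantities.size());
    double sum = 0.0;
    for (std::size_t i = 0; i != nuclides.size(); ++i) {
        sum += nuclides[i].half_life_years * quantities[i];
    }
    return sum;
}
```

Keeping the two vectors private members of Sample and only modifying them together is one way to make sure their order cannot drift apart.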

In any case, I'd recommend not exposing the internal structure to the outside world and instead providing an interface which lets you change the underlying representation later. That is, to achieve maximum flexibility you don't want to bake the representation into all your code. For example, if you use accessor function objects (property maps in terms of BGL, or projections in terms of Eric Niebler's Range Proposal) to access the elements based on an iterator, rather than accessing the elements directly, you can change the internal layout without having to touch any of the algorithms (you'll need to recompile the code, though):

// version using std::vector<std::pair<Nuclide, double> >
// - it would just use std::vector<std::pair<Nuclide, double> >::iterator as iterator
// - Sample::key is assumed to name the element type, std::pair<Nuclide, double>
auto nuclide_projection = [](Sample::key& key) -> Nuclide& {
    return key.first;
};
auto value_projection = [](Sample::key& key) -> double {
    return key.second;
};

// version using two std::vectors:
// - it would use an iterator interface to an integer, yielding a std::size_t for *it
// - the projectors hold references, so they cannot be default-constructed constexpr
//   objects; they must be bound to existing vectors (sample_nuclides and
//   sample_values are assumed names for those vectors)
struct nuclide_projector {
    std::vector<Nuclide>& nuclides;
    auto operator()(std::size_t index) const -> Nuclide& { return nuclides[index]; }
};
nuclide_projector nuclide_projection{sample_nuclides};
struct value_projector {
    std::vector<double>& values;
    auto operator()(std::size_t index) const -> double& { return values[index]; }
};
value_projector value_projection{sample_values};

With one pair of these in place, an algorithm that simply runs over the elements and prints them could look like this:

template <typename Iterator>
void print(std::ostream& out, Iterator begin, Iterator end) {
    for (; begin != end; ++begin) {
        out << "nuclide=" << nuclide_projection(*begin) << ' '
            << "value=" << value_projection(*begin) << '\n';
    }
}

The two representations are entirely different, yet the algorithm accessing them is independent of either. This way it is also easy to try different representations: only the representation and the glue to the algorithms accessing it need to be changed.
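
For completeness, here is a self-contained sketch that runs one algorithm over both layouts. It departs from the snippets above in one respect: the projections are passed in as parameters instead of being global objects, so both layouts can coexist in one program. The names print_entries, index_iterator, print_pairs and print_columns, as well as the Nuclide stand-in and its operator<<, are assumptions made for this example.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Nuclide, printable via operator<<.
struct Nuclide {
    std::string name;
};
inline std::ostream& operator<<(std::ostream& out, const Nuclide& n) {
    return out << n.name;
}

// Minimal iterator over indices: *it yields a std::size_t.
struct index_iterator {
    std::size_t index;
    std::size_t operator*() const { return index; }
    index_iterator& operator++() { ++index; return *this; }
    bool operator!=(const index_iterator& other) const { return index != other.index; }
};

// The algorithm only knows the iterator and the two projections.
template <typename Iterator, typename NuclideProjection, typename ValueProjection>
void print_entries(std::ostream& out, Iterator begin, Iterator end,
                   NuclideProjection nuclide_projection,
                   ValueProjection value_projection) {
    for (; begin != end; ++begin) {
        out << "nuclide=" << nuclide_projection(*begin) << ' '
            << "value=" << value_projection(*begin) << '\n';
    }
}

// Layout 1: one vector of pairs, iterated with the vector's own iterators.
inline std::string print_pairs(const std::vector<std::pair<Nuclide, double>>& entries) {
    std::ostringstream out;
    print_entries(out, entries.begin(), entries.end(),
                  [](const std::pair<Nuclide, double>& e) -> const Nuclide& { return e.first; },
                  [](const std::pair<Nuclide, double>& e) { return e.second; });
    return out.str();
}

// Layout 2: two parallel vectors, iterated by index.
inline std::string print_columns(const std::vector<Nuclide>& nuclides,
                                 const std::vector<double>& values) {
    std::ostringstream out;
    print_entries(out, index_iterator{0}, index_iterator{nuclides.size()},
                  [&nuclides](std::size_t i) -> const Nuclide& { return nuclides[i]; },
                  [&values](std::size_t i) { return values[i]; });
    return out.str();
}
```

Both wrappers feed the same print_entries template; only the iterator type and the two lambdas differ between the layouts.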

1 answer

solution1
1 ACCEPTED 2014-12-31 11:26:20
