std::multiset vs std::vector to read and write sorted strings to a file

I have a file, say somefile.txt, that contains single-word names in sorted order.

I want to add a new name and write the file back, still in sorted order.

Which of the following approaches is preferable, and why?

Using a std::multiset

std::multiset<std::string> s;

std::copy(std::istream_iterator<std::string>(fin), // fin is a std::fstream
          std::istream_iterator<std::string>(),
          std::inserter(s, s.begin()));

s.insert("new_name");

//Write s to the file

OR

Using a std::vector

std::vector<std::string> v;

std::copy(std::istream_iterator<std::string>(fin),
          std::istream_iterator<std::string>(),
          std::back_inserter(v));

v.push_back("new_name");

std::sort(v.begin(),v.end());

//Write v to the file.

Inserting into a multiset is slower than appending to a vector, but the multiset keeps its elements sorted at all times. The multiset is also likely to use more memory than the vector, since every node holds pointers for the internal tree structure. This is not always the case, though, because the vector may hold some unused reserved capacity.

If the data needs to grow incrementally while always being ready for immediate in-order access, the multiset wins.

If you collect the data all at once and do not need ordered access until the end, it is probably simpler to push it onto a vector and then sort once. The real criterion is how dynamic the stored data is.

Both options are basically equivalent.

In a performance-critical scenario the vector approach will be faster, but here performance is largely constrained by the disk, so which container you choose won't matter much.

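// Single pass: copy the sorted input to the output, writing new_name
// just before the first name that sorts after it.
std::string new_name = "new_name";
bool inserted = false;
std::string current;
while (std::cin >> current) {
    if (!inserted && new_name < current) {
        std::cout << new_name << '\n';
        inserted = true;
    }
    std::cout << current << '\n';
}
// new_name sorts after every existing name (or the input was empty)
if (!inserted) {
    std::cout << new_name << '\n';
}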

Vectors are faster according to this benchmark ( http://fallabs.com/blog/promenade.cgi?id=34 ). I would suggest that you test it yourself, though: performance often depends on the platform and, especially in this case, on the dataset.

From his testing, the author concluded that a vector works best for simple elements. For complex elements (more than 4 strings, for instance), a multiset is faster.

Also, since a vector is one large contiguous array, if you are adding lots of data it may be worth looking into another type of container (a linked list, for instance, or a specialized Boost container; see "Is there a sorted_vector class, which supports insert() etc.?").
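To run such a comparison on your own platform, a rough timing harness along the following lines may help. It is only a sketch: the helper names (make_names, time_ms, time_vector, time_multiset) and the choice of 8-letter random names are assumptions made for illustration, and a single run like this is noisy, so repeat it and build with optimizations before drawing conclusions.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: n pseudo-random lowercase names of length 8.
std::vector<std::string> make_names(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> letter('a', 'z');
    std::vector<std::string> names(n, std::string(8, 'a'));
    for (auto& name : names)
        for (auto& ch : name) ch = static_cast<char>(letter(gen));
    return names;
}

// Run f once and return the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Copy into a vector, then sort once.
double time_vector(const std::vector<std::string>& names) {
    return time_ms([&] {
        std::vector<std::string> v(names.begin(), names.end());
        std::sort(v.begin(), v.end());
        volatile std::size_t sink = v.size();  // keep the work observable
        (void)sink;
    });
}

// Insert every name into a multiset, which sorts as it goes.
double time_multiset(const std::vector<std::string>& names) {
    return time_ms([&] {
        std::multiset<std::string> s(names.begin(), names.end());
        volatile std::size_t sink = s.size();  // keep the work observable
        (void)sink;
    });
}
```

Calling time_vector(make_names(100000, 42u)) and time_multiset(make_names(100000, 42u)) and printing both numbers gives a first impression; using names read from your real file instead of random ones would be more representative.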
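If you stay with a vector and the file is already sorted, a full std::sort after each addition is not strictly necessary. The following sketch inserts one name at its sorted position using std::upper_bound; the function name insert_sorted is hypothetical, and the approach assumes the vector is sorted beforehand.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: insert `name` into an already sorted vector so that it
// stays sorted. The binary search is O(log n); the insert itself is O(n)
// because later elements have to be shifted.
void insert_sorted(std::vector<std::string>& v, const std::string& name) {
    assert(std::is_sorted(v.begin(), v.end()));  // precondition, debug builds only
    v.insert(std::upper_bound(v.begin(), v.end(), name), name);
}
```

For a single new name this is cheaper than re-sorting, while for many insertions the shifting cost adds up, which is where a tree-based container or a sort-once strategy becomes more attractive.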
