
STL Map with a Vector for the Key

I'm working with some binary data that I have stored in arbitrarily long arrays of unsigned ints. I've found that I have some duplication of data, and am looking to ignore duplicates in the short term and remove whatever bug is causing them in the long term.

I'm looking at inserting each dataset into a map before storing it, but only if it was not found in the map to start with. My initial thought was to have a map of strings: use memcpy as a hammer to force the ints into a character array, copy that into a string, and store the string. This failed because a good deal of my data contains multiple zero bytes (NUL characters) at the front of the relevant data, so a majority of very real data got thrown out.
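For reference, a std::string can hold zero bytes as long as the length is passed explicitly instead of relying on a NUL terminator. The following is only a minimal sketch of that idea; `pack_to_string` is a hypothetical helper name, not something from the code described above, and it assumes the data is available as a std::vector<unsigned int>.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: packs the raw bytes of `data` into a std::string.
// Passing the byte count explicitly keeps embedded and leading zero bytes;
// building the string from a bare char* would stop at the first zero byte.
std::string pack_to_string(const std::vector<unsigned int>& data) {
    if (data.empty()) {
        return std::string();
    }
    const char* bytes = reinterpret_cast<const char*>(&data[0]);
    return std::string(bytes, data.size() * sizeof(unsigned int));
}
```

Two datasets that differ only after a run of leading zeros then produce different strings of the full length.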

My next attempt is planned to be std::map<std::vector<unsigned char>, int>, but I'm realizing that I don't know if the map insert function will work.

Is this doable, even if ill advised, or is there a better way to approach this problem?

Edit

So it's been remarked that I didn't make clear what I'm doing, so here's a hopefully better description.

I'm working on generating a minimum spanning tree, given that I have a number of trees containing the actual end nodes I'm working with. The goal is to come up with the selection of trees that has the shortest length and that covers all of the end nodes, where the chosen trees share at most one node with each other and are all connected. I'm basing my approach off of a binary decision tree, but making a few changes to hopefully allow for greater parallelism.

Rather than taking the binary tree approach I've opted to make a bit vector out of unsigned integers for each dataset, where a 1 in a bit position indicates the inclusion of the corresponding tree.

For example if just tree 0 were included in a 5 tree dataset I would start with

00001

From here I can generate:

00011

00101

01001

10001

Each of these can then be processed in parallel, since none of them depends on the others. I do this for all of the single trees (00010, 00100, etc.) and, although I haven't taken the time to formally prove it, I should be able to generate every value in the range (0, 2^n) once and only once.

I started to notice that many datasets were taking far longer to complete than I thought they should, so I enabled debugging output to look at all of the generated results. A quick Perl script later, it was confirmed that multiple processes were generating the same output. Since then I've been trying to work out where the duplicates are coming from, with very little success. I'm hoping this will work well enough to let me verify the generated results without the sometimes three-day wait on computations.

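You will not have problems with that, as std::vector provides the ==, < and > operators. std::map only needs operator< (through its default std::less comparator), which compares vectors lexicographically: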

http://en.cppreference.com/w/cpp/container/vector/operator_cmp

The requirements for being a key in std::map are satisfied by std::vector, so yes, you can do that. Sounds like a good temporary solution (easy to code, minimum of hassle) -- but you know what they say: "there is nothing more permanent than the temporary".
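To make the key semantics concrete, here is a small sketch of how std::map decides that two vector keys are duplicates: it never calls ==, it only checks that neither key orders before the other. `Key` and `same_key` are illustrative names, not part of the standard library.

```cpp
#include <vector>

typedef std::vector<unsigned char> Key;

// std::map treats two keys as duplicates when neither orders before the other.
// operator< on vectors is lexicographic, so leading zero bytes are significant.
bool same_key(const Key& a, const Key& b) {
    return !(a < b) && !(b < a);
}
```

With this ordering, {0, 0, 1} and {0, 0, 2} are distinct keys, and {0, 1} is not the same key as {1}.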

That should work: as Renan Greinert points out, vector<> meets the requirements to be used as a map key.

You also say:

I'm looking at inserting each dataset into a map before storing it, but only if it was not found in the map to start with.

That's usually not what you want to do, as it would involve doing a find() on the map and then, if the key is not found, an insert(). Together, those two operations search the map twice. It is better to just try to insert the item into the map. If the key is already there, insert() leaves the map unchanged and reports that nothing was inserted. So your code would look like this:

#include <vector>
#include <map>
#include <utility>

// typedefs help a lot to shorten the verbose C++ code
typedef std::map<std::vector<unsigned char>, int> MyMapType;

MyMapType myMap;

std::vector<unsigned char> v = ...; // initialize this somehow
std::pair<MyMapType::iterator, bool> result = myMap.insert(std::make_pair(v, 42));
if (result.second)
{
   // the insertion worked and result.first points to the newly 
   // inserted pair
}
else
{
   // the insertion failed and result.first points to the pair that
   // was already in the map
}
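Since the data in the question already lives in arrays of unsigned ints, the same insert-and-check pattern can be applied without any byte copying by keying the map on std::vector<unsigned int>. This is only a sketch under that assumption; `Dataset`, `SeenMap` and `is_new_dataset` are hypothetical names, and the mapped int is unused here.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

typedef std::vector<unsigned int> Dataset;
typedef std::map<Dataset, int> SeenMap;

// Returns true if the `len` words at `data` were not in `seen` yet
// (and records them), false if an equal dataset was already stored.
bool is_new_dataset(SeenMap& seen, const unsigned int* data, std::size_t len) {
    Dataset key(data, data + len);
    return seen.insert(std::make_pair(key, 0)).second;
}
```

A caller would store or process a dataset only when `is_new_dataset` returns true.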

Why do you need a std::map for that? Maybe I'm missing something, but what about using a std::vector together with the find algorithm, as explained here?

This means that you append your unsigned ints to the vector and later search for them, e.g.

#include <algorithm>
#include <vector>

std::vector<unsigned int> collector; // vector that is substituting your std::map
for(unsigned int i=0; i<myInts.size(); ++i) {  // myInts are the long ints you have
    // std::find returns collector.end() when the value is not stored yet
    if(std::find(collector.begin(), collector.end(), myInts.at(i))==collector.end()) {
         collector.push_back(myInts.at(i));
    }
}
