Using Map vs Vector in a class - speed

Hi, I have two versions of a class I've written: one uses a map, the other uses two vectors:

class NucleotideSequence{
private:
    std::string Name;
    std::vector<int> BasePos;
    std::vector<char> BaseChar;
public:
    NucleotideSequence(std::string name, std::vector<int> &bp, std::vector<char> &bases);
    std::string getName();
    char getBase(int pos); // get a base by its position in the char array.
    char getAbBase(int abPos); // get a base by its actual bp position.
};


class NucleotideSequence2{
private:
    std::string Name;
    std::map<int, char> Sequence;
public:
    NucleotideSequence2(std::string &name, std::map<int, char> &seq) throw(FormatError);
    std::string getName();
};

I then defined the constructors for them:

NucleotideSequence::NucleotideSequence(std::string name, std::vector<int> &bp, std::vector<char> &bases)
:Name(name), BasePos(bp), BaseChar(bases)
{
    for (std::vector<char>::iterator i = BaseChar.begin(); i != BaseChar.end(); i++) {
        switch (*i) {
            case 'A': case 'T': case 'C': case 'G': case '-': case 'N':
                break;
            case 'a':
                *i = 'A';
                break;
            case 't':
                *i = 'T';
                break;
            case 'c':
                *i = 'C';
                break;
            case 'g':
                *i = 'G';
                break;
            case 'n':
                *i = 'N';
                break;
            default:
                throw FormatError();
                break;
        }
    }
}

NucleotideSequence2::NucleotideSequence2(std::string &name, std::map<int, char> &seq) throw(FormatError)
: Name(name), Sequence(seq)
{
    for (std::map<int, char>::iterator i = Sequence.begin(); i != Sequence.end(); i++) {

        switch (i->second) {
            case 'A': case 'T': case 'C': case 'G': case '-': case 'N':
                break;
            case 'a':
                i->second = 'A';
                break;
            case 't':
                i->second = 'T';
                break;
            case 'c':
                i->second = 'C';
                break;
            case 'g':
                i->second = 'G';
                break;
            case 'n':
                i->second = 'N';
                break;
            default:
                throw FormatError();
                break;
        }
    }
}

These two constructors are called in two different functions:

NucleotideSequence Sequence_stream::get()
{
    if (FileStream.is_open() == false)
        throw StreamClosed(); // Make sure the stream is indeed open else throw an exception.
    if (FileStream.eof())
        throw FileEnd();
    char currentchar;
    int basepos = 0;
    std::string name;
    std::vector<char> sequence;
    std::vector<int> postn;
    currentchar = FileStream.get();
    if (FileStream.eof())
        throw FileEnd();
    if (currentchar != '>')
        throw FormatError();
    currentchar = FileStream.get();
    while(currentchar != '\n' && false == FileStream.eof())
    {
        name.append(1, currentchar);
        currentchar = FileStream.get();
    } // done getting names, now let's get the sequence.
    currentchar = FileStream.get();
    while(currentchar != '>' && false == FileStream.eof())
    {
        if(currentchar != '\n' && currentchar != ' '){
            basepos++;
            sequence.push_back(currentchar);
            postn.push_back(basepos);
        }
        currentchar = FileStream.get();
    }
    if(currentchar == '>')
    {
        FileStream.unget();
    }
    return NucleotideSequence(name, postn, sequence);
}


NucleotideSequence2 Sequence_stream::get2()
{
    if (FileStream.is_open() == false)
        throw StreamClosed(); // Make sure the stream is indeed open else throw an exception.
    if (FileStream.eof())
        throw FileEnd();
    char currentchar;
    int basepos = 0;
    std::string name;
    std::map<int, char> sequence;
    currentchar = FileStream.get();
    if (FileStream.eof())
        throw FileEnd();
    if (currentchar != '>')
        throw FormatError();
    currentchar = FileStream.get();
    while(currentchar != '\n' && false == FileStream.eof())
    {
        name.append(1, currentchar);
        currentchar = FileStream.get();
    } // done getting names, now let's get the sequence.
    currentchar = FileStream.get();
    while(currentchar != '>' && false == FileStream.eof())
    {
        if(currentchar != '\n' && currentchar != ' '){
            basepos++;
            sequence[basepos] = currentchar;
        }
        currentchar = FileStream.get();
    }
    if(currentchar == '>')
    {
        FileStream.unget();
    }
    return NucleotideSequence2(name, sequence);
}

These two functions are then called from another function, which catches the exceptions (in case you are wondering about the uncaught throws).

The difference between the two classes is that one contains two vectors, whereas the other holds the same information in a map.

My question is: the first class and the 'get' that builds it work very quickly, almost instantly, whereas 'get2', which builds the second class (the one with the map), is noticeably slower: just over 5 seconds.

Why is constructing the class with the map slower than constructing the one with the two vectors? As you can see, I've kept the constructors and the two get functions almost identical, except that one adds elements to vectors and the other adds key-value pairs to the map. So my suspicion is that repeatedly pushing back to a vector is faster and more efficient than repeatedly adding key-value pairs, i.e. mymap['newkey'] = 'newvalue';.

How can I speed up the map version?

Thanks, Ben.

A vector performs a single allocation (if you tell it the required capacity in advance), or at most a small number of allocations as it grows. A map performs a separate dynamic allocation for every element.
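To make the allocation difference concrete, here is a minimal sketch, assuming C++11 and assuming the bases are already available as a string. The helper names fillVector and fillMap are made up for illustration and are not part of the question's code. The vector reserves its capacity once. The map still allocates one node per base; passing end() as an insertion hint only avoids the tree search, because the positions arrive in increasing order.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: copy the bases into a vector using one up-front allocation.
std::vector<char> fillVector(const std::string &bases)
{
    std::vector<char> out;
    out.reserve(bases.size());   // single allocation for the whole buffer
    for (char c : bases)
        out.push_back(c);        // capacity is already reserved, so no reallocation
    assert(out.size() == bases.size());
    return out;
}

// Hypothetical helper: build a 1-based position -> base map.
// Each insertion still allocates its own node; the end() hint only skips
// the search for the insertion point, since keys arrive in sorted order.
std::map<int, char> fillMap(const std::string &bases)
{
    std::map<int, char> out;
    int pos = 0;
    for (char c : bases)
        out.insert(out.end(), std::make_pair(++pos, c));
    assert(out.size() == bases.size());
    return out;
}
```

Calling fillVector("ACGT") and fillMap("ACGT") gives containers with the same four bases; how much the hint helps in practice is something to measure on your own data.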

You may like to experiment with a sorted vector of pairs, or perhaps a "flat map" (in Boost), or a btree-map (there's one in Google Code), and compare their performance. Memory locality can make a dramatic difference, and if you don't need the strong iterator validity guarantees of a std::map you may well find a data structure that performs better.
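A sorted vector of pairs could look roughly like the sketch below. It assumes C++11 and assumes positions are appended in strictly increasing order, as they are when reading a sequence front to back. FlatSequence, appendBase, findBase and positionLess are illustrative names, not from the question or from any library. Lookup is a binary search with std::lower_bound over contiguous memory.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical alias: (position, base) pairs kept sorted by position.
typedef std::vector<std::pair<int, char> > FlatSequence;

// Orders a stored entry against a bare position, as std::lower_bound expects.
static bool positionLess(const std::pair<int, char> &entry, int pos)
{
    return entry.first < pos;
}

// Append a base; the caller must supply strictly increasing positions
// so the vector stays sorted without any extra work.
void appendBase(FlatSequence &seq, int pos, char base)
{
    assert(seq.empty() || seq.back().first < pos);
    seq.push_back(std::make_pair(pos, base));
}

// Binary search for a position; returns a pointer to the base,
// or nullptr when the position is not present.
const char *findBase(const FlatSequence &seq, int pos)
{
    FlatSequence::const_iterator it =
        std::lower_bound(seq.begin(), seq.end(), pos, positionLess);
    if (it != seq.end() && it->first == pos)
        return &it->second;
    return nullptr;
}
```

The trade-off is that inserting in the middle is linear and invalidates iterators and pointers, so this fits best when the sequence is built once and then only read.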

How can I speed up the map version?

Try unordered_map instead of a regular map.
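As a rough sketch of that swap, assuming C++11 (std::unordered_map is not available before it) and assuming the bases are already in a string: fillUnordered is a hypothetical helper name. Reserving up front sizes the bucket array once so the table does not rehash while it grows. Each element is still allocated separately and the positions are no longer kept in order, so it is worth timing against the vector version.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical helper: build a 1-based position -> base hash map.
std::unordered_map<int, char> fillUnordered(const std::string &bases)
{
    std::unordered_map<int, char> out;
    out.reserve(bases.size());   // size the bucket array once, avoiding rehashes
    int pos = 0;
    for (char c : bases)
        out[++pos] = c;          // average constant-time insertion
    assert(out.size() == bases.size());
    return out;
}
```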
