简体   繁体   中英

Avoiding key construction for std::map::find()

Suppose I've got a std::map<std::string, int> . std::string can be compared to C strings (const char*) without std::string temporaries. However, map::find() appears to force me to construct a temporary std::string , which probably requires a memory allocation. How do I avoid this? Conceptually it's easy, but the STL appears to prevent this.

#include <map>

int main()
{
    std::map<std::string, int> m;
    m.find("Olaf");
}

Your concern is real, and there's no good workaround for C++11.

C++14 fixes this issue by adding a templated overload of std::map::find — the relevant proposal is N3657 . In C++14, your program would look like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <map>
#include <algorithm>

class std_string {
    char *m_s;
public:
    std_string() { m_s = nullptr; }
    std_string(const char* s) { puts("Oops! A new std_string was constructed!"); m_s = strdup(s); }
    ~std_string() { free(m_s); }
    std_string(std_string&& ss) = delete;
    std_string(const std_string& ss) = delete;
    std_string& operator=(std_string&& ss) = delete;
    std_string& operator=(const std_string& ss) = delete;

    bool operator< (const char* s) const { return strcmp(m_s, s) < 0; }
    bool operator< (const std_string& ss) const { return strcmp(m_s, ss.m_s) < 0; }
    friend bool operator< (const char* s, const std_string& ss) { return strcmp(s, ss.m_s) < 0; }
};

int main()
{
    {
        puts("The C++11 way makes a copy...");
        std::map<std_string, int> m;
        auto it = m.find("Olaf");
    }
    {
        puts("The C++14 way doesn't...");
        std::map<std_string, int, std::less<>> m;
        auto it = m.find("Olaf");
    }
}

( std::less<> is the generalized "less-than" comparator, equivalent to operator< . C++03 and C++11 have a broken-by-design version of this comparator that forces both arguments to be the same type. C++14 finally does it right.)

Unfortunately the Committee seems to have decided that people should go through all their C++11 code and update every container to use std::less<> as the comparator — it doesn't just happen by default. There's no good reason for this decision; it's just The Way It Is. (See my comments above about broken-by-design. C++ has a bad habit of introducing broken versions of things before introducing the "real" versions several years later.)

For C++11, std::map::find has only one overload (the one that takes const Key& ), so any workaround will necessarily involve changing the Key type to be less expensive — we can't just fiddle with the comparator, because by the time execution reaches the comparator, we've already promoted find 's argument to the Key type.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <map>
#include <algorithm>

class std_string {
    char *m_s;
public:
    std_string() : m_s(nullptr) { }
    std_string(const char* s) : m_s(strdup(s)) { puts("Oops! A new std_string was constructed!"); }
    ~std_string() { free(m_s); }
    std_string(std_string&& ss) : m_s(nullptr) { std::swap(m_s, ss.m_s); }
    std_string(const std_string& ss) : m_s(strdup(ss.data())) { puts("Oops! A new std_string was constructed!"); }
    std_string& operator=(std_string&& ss) = delete;
    std_string& operator=(const std_string& ss) = delete;

    const char* data() const { return m_s; }

    bool operator< (const char* s) const { return strcmp(m_s, s) < 0; }
    bool operator< (const std_string& ss) const { return strcmp(m_s, ss.m_s) < 0; }
    friend bool operator< (const char* s, const std_string& ss) { return strcmp(s, ss.m_s) < 0; }
};

struct string_or_ptr {
    union {
        const char* ptr;
        alignas(std_string) unsigned char str[sizeof (std_string)];
    } m_u;
    bool m_deep;

    char const* & ptr() { return m_u.ptr; }
    std_string& str() { return *reinterpret_cast<std_string*>(m_u.str); }
    char const* const & ptr() const { return m_u.ptr; }
    std_string const& str() const { return *reinterpret_cast<const std_string*>(m_u.str); }

    string_or_ptr() : m_deep(false) { ptr() = ""; }
    string_or_ptr(const char* s) : m_deep(false) { ptr() = s; }
    string_or_ptr(std_string&& s) : m_deep(true) { new ((void*)&str()) std_string(std::move(s)); }
    string_or_ptr(const std_string& s) : m_deep(true) { new ((void*)&str()) std_string(s); }
    ~string_or_ptr() { if (m_deep) str().~std_string(); }
    std_string& operator=(std_string&& ss) = delete;
    std_string& operator=(const std_string& ss) = delete;


    operator const char*() const { return m_deep ? str().data() : ptr(); }

    bool operator< (const char* s) const { return strcmp((const char*)*this, s) < 0; }
    bool operator< (const std_string& ss) const { return (const char*)*this < ss; }
    bool operator< (const string_or_ptr& sp) const { return strcmp((const char*)*this, (const char*)sp) < 0; }
    friend bool operator< (const char* s, const string_or_ptr& sp) { return strcmp(s, (const char*)sp) < 0; }
    friend bool operator< (const std_string& ss, const string_or_ptr& sp) { return ss < (const char*)sp; }
};

int main()
{
    {
        puts("The C++11 way...");
        std::map<std_string, int> m;
        auto it = m.find("Olaf");
    }
    {
        puts("The C++11 way with a custom string-or-pointer Key type...");
        std::map<string_or_ptr, int> m;
        auto it = m.find("Olaf");
    }
}

There isn't actually a way to force find to use a comparison operator different from the one used to create the map . If you could pass a different one into find , how could it guarantee that both comparisons would provide the same ordering?

Instead, just think about the cases:

1) You're passing around char* in your program. In this case, just don't do that. Use std::string instead, creating it one time if needed, as close to the origination as possible. Then no conversion is needed.

2) You're trying to find string literals. In this case, why is the key a string ? Make the key be a nicely named enumerated type instead:

enum names { OLAF };
map<names, int> m;
m.find(OLAF);

3) You want to find both strings and C-string literals. In this case, I would create a global lookup table of strings indexed by an enumeration but built once at the start of main. Then you would call something like m.find(global_strings[OLAF]);

EDIT: You seem very focused/concerned about the performance implications of string here. Have you profiled your application and found that string 's allocations are a significant portion of your app's time? I would of course believe this on embedded systems/devices.

Additionally, you've tagged your question C++ but you seem to be outright refusing to use C++'s built in string feature which does far more than "cost you performance". It provides a variety of helpful functions/methods/operators but most importantly it manages the memory for you so you don't spend days or weeks hunting down those truly insidious bugs that no doubt will crop up.

If you're reading variable length data from the network I can't quite grasp the performance difference between char* buffer = new char[needed_size]; and something like std::string s; s.resize(needed_size); std::string s; s.resize(needed_size); other than that using string provides some safety and memory management for you.

If the construction of a string from a literal is really a measured performance bottleneck for you, you can use your own class instead of a std::string that holds either a string or a pointer to a literal. The downside is some extra complication, plus adding the size of a pointer to the elements you're inserting into the container. Note that the value is immutable as required by map , so it's safe to store the results of c_str .

class mystring
{
    std::string  str;
    const char * value;
public:
    mystring() : value(NULL)
    {
    }
    void setString(const std::string & s)
    {
        assert(value == NULL);
        str = s;
        value = str.c_str();
    }
    void setLiteral(const char * s)
    {
        assert(value == NULL);
        value = s;
    }
    bool operator<(const mystring & rhs)
    {
        return strcmp(literal, rhs.literal) < 0;
    }
};

std::map<mystring, int> m;
mystring text;
text.setString(some_key);
m.insert(std::make_pair(text, some_data));
// ...
mystring key;
key.setLiteral("Olaf");
m[key] = new_value;

There is no way to specifically define a comparator for map::find() function. Instead I would advice you to use create your comparator (myOwnCmp) and delcare a std::map<char*, int, myOwnCmp> for your program. For very large programs or when the number of testcases are very high, it will be comparatively faster than std::map<string, int> because the time spend to create a string by calling the string constructor and later calling its destructor consumes significant amount of time. Using const char* as the key will only involve pointer comparison.

The only thing you need to take care of is to make a separate local copy of the char* coming as an argument when you are populating the map by overriding the insert function or by creating your own add function, because there are chances that the pointer can later be modified or deleted. So you want to make sure that you have kept a local copy of the char* before adding it as the key to the map.

The code would go something like this:-

 struct myOwnComp {
        bool operator()(const char* a, const char* b) const {
            return (strcmp(a, b) < 0);
        }
    };

std::map<char*, int, myOwnComp> mymap;

void addToMap(char*& ref, int value)
{
    char *key = new char[strlen(ref) + 1]{};
    std::copy(ref, ref + strlen(ref), key);
    mymap[key] = value;
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM