Reliably using C++ Small String Optimization to fread short std::strings from Files into Memory

I have the following class. It contains a data structure called Index, which is expensive to compute, so I cache the index on disk and read it back in later. The index element id, of template type T, can be any of a variety of primitive data types.

But I would also like to use id with the type std::string. I wrote the serialize/deserialize code for the general case and tested it with ordinary C++ strings: it works as long as they are short enough, apparently because the small string optimization kicks in.

I also wrote a different implementation just for handling longer strings safely. But the safe code is about 10x slower, and I would really like to just read the strings in with fread (a 500 ms read-in is very painful, while 50 ms is perfectly fine).

How can I reliably use my libc++ small string optimization if I know that all identifiers are shorter than the longest possible short string? And how can I reliably tell how long the longest possible small string is?

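#include <cstdint>  // int32_t
#include <cstdio>   // FILE, fopen, fread, fwrite, fclose
#include <string>

template<typename T>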
class Reader {
public:
    struct Index {
        T id;
        size_t length;
        // ... values etc
    };

    Index* index;
    size_t indexTableSize;

    void serialize(const char* fileName) {
        FILE *file = fopen(fileName, "w+b");
        if (file == NULL)
            return;

        fwrite(&indexTableSize, sizeof(size_t), 1, file);
        fwrite(index, sizeof(Index), indexTableSize, file);

        fclose(file);
    }

    void deserialize(const char* fileName) {
        FILE *file = fopen(fileName, "rb");
        if (file == NULL)
            return;

        fread(&indexTableSize, sizeof(size_t), 1, file);
        index = new Index[indexTableSize];
        fread(index, sizeof(Index), indexTableSize, file);

        fclose(file);
    }


};

// works perfectly fine
template class Reader<int32_t>;

// works perfectly fine for strings shorter than 22 bytes
template class Reader<std::string>;

std::string is not trivially copyable, and performing memcpy on a type (which is what fwrite-ing it and fread-ing it back amounts to) is only legal in C++ if the type is trivially copyable. Therefore, what you want to do is not possible directly.

If you want to serialize a string, you must do so manually. You must get the number of characters and write it, then write those characters themselves. To read it back in, you have to read the size of the string, then read that many characters.
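The manual approach just described can be sketched as follows. This is only a minimal illustration: the helper names writeString and readString are hypothetical (they are not part of the Reader class in the question), the length is stored as a 64-bit integer in the machine's native byte order, and the file is assumed to be read back on the same kind of machine that wrote it, as is typical for a cache.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: writes a 64-bit length followed by the raw characters.
bool writeString(std::FILE* file, const std::string& s) {
    const std::uint64_t n = s.size();
    if (std::fwrite(&n, sizeof n, 1, file) != 1) return false;
    return n == 0 || std::fwrite(s.data(), 1, s.size(), file) == s.size();
}

// Hypothetical helper: reads the length, resizes the string,
// then reads the characters directly into the string's own buffer.
bool readString(std::FILE* file, std::string& s) {
    std::uint64_t n = 0;
    if (std::fread(&n, sizeof n, 1, file) != 1) return false;
    s.resize(static_cast<std::size_t>(n));
    return n == 0 || std::fread(&s[0], 1, s.size(), file) == s.size();
}
```

Because the string allocates its own storage in resize, this works for any length, not only for strings that fit the small string buffer. A production version would probably also want to reject implausibly large lengths read from a damaged file.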

If you want to reliably serialize/deserialize a type T this way, you have to make sure that T is a POD type (or, more precisely, standard layout and trivial).

You can check this in your template with std::is_trivially_copyable<T> and std::is_standard_layout<T>. Unfortunately, the check will fail for std::string.
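A compile-time guard along those lines might look like the sketch below. The trait name IsRawSerializable and the free-standing Index template are assumptions made for illustration. In the question's code the check would more likely sit inside Reader<T>::serialize as a static_assert on Index.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical trait: true only for types whose bytes may be dumped with
// fwrite and restored with fread.
template <typename T>
struct IsRawSerializable
    : std::integral_constant<bool,
                             std::is_trivially_copyable<T>::value &&
                             std::is_standard_layout<T>::value> {};

// Stand-in for Reader<T>::Index from the question.
template <typename T>
struct Index {
    T id;
    std::size_t length;
};

// An index keyed by int32_t can be written as raw bytes ...
static_assert(IsRawSerializable<Index<std::int32_t>>::value,
              "Index<int32_t> can be dumped as raw bytes");
// ... but one keyed by std::string cannot, whatever the string's length.
static_assert(!IsRawSerializable<Index<std::string>>::value,
              "Index<std::string> must be serialized field by field");
```

With such a guard in place, instantiating the raw fread/fwrite path for std::string becomes a compile error instead of silently working only for short strings.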

If that is not the case, you must find a proper way to serialize/deserialize the class, i.e. write/read the data needed to reconstruct the state of the object (here, the length of the string and its content).

Three options:

  • use an auxiliary template that converts T from/to an array of bytes, and write a specialisation of this template for each type that may be used with your Reader.
  • use a member function that does this. This is not possible for std types, though.
  • use a serialization library, for example Boost.Serialization, s11n or others.
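The first option could look roughly like this. It is a sketch under assumptions: the name Serializer and its write/read interface are invented for illustration, and it writes straight to a FILE* rather than to an intermediate byte array.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <type_traits>

// Hypothetical auxiliary template. The primary version handles trivially
// copyable types by dumping their bytes.
template <typename T>
struct Serializer {
    static_assert(std::is_trivially_copyable<T>::value,
                  "add a Serializer specialisation for this type");
    static bool write(std::FILE* f, const T& v) {
        return std::fwrite(&v, sizeof v, 1, f) == 1;
    }
    static bool read(std::FILE* f, T& v) {
        return std::fread(&v, sizeof v, 1, f) == 1;
    }
};

// Specialisation for std::string: length prefix followed by the characters.
template <>
struct Serializer<std::string> {
    static bool write(std::FILE* f, const std::string& s) {
        const std::uint64_t n = s.size();
        return std::fwrite(&n, sizeof n, 1, f) == 1 &&
               (n == 0 || std::fwrite(s.data(), 1, s.size(), f) == s.size());
    }
    static bool read(std::FILE* f, std::string& s) {
        std::uint64_t n = 0;
        if (std::fread(&n, sizeof n, 1, f) != 1) return false;
        s.resize(static_cast<std::size_t>(n));
        return n == 0 || std::fread(&s[0], 1, s.size(), f) == s.size();
    }
};
```

Reader<T> would then call Serializer<T>::write and Serializer<T>::read on each id and handle the remaining trivially copyable fields as before, so the generic code no longer needs to know how any particular T is laid out in memory.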

I would in any case strongly advise you not to rely on non-portable properties, such as the length of short strings, especially if this code lives in a template that is supposed to work with generic types.
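To illustrate how implementation-specific that length is: on the common standard libraries the capacity of an empty std::string equals the size of the internal small-string buffer, but the standard does not guarantee this and the value differs between implementations (typically 22 with 64-bit libc++, 15 with libstdc++ and MSVC). The helper name below is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: capacity of a default-constructed string. On the
// common implementations this is the longest string stored without a heap
// allocation, but nothing in the standard promises that.
inline std::size_t smallStringCapacity() {
    return std::string().capacity();
}
```

Even for strings within this limit, reading the object back with fread remains undefined behaviour. In libstdc++, for example, a short string holds a pointer into its own internal buffer, so bytes restored at a different address would leave that pointer dangling.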
