
A dynamic buffer type in C++?

I'm not exactly a C++ newbie, but I have had little serious dealings with it in the past, so my knowledge of its facilities is rather sketchy.

I'm writing a quick proof-of-concept program in C++ and I need a dynamically sizeable buffer of binary data. That is, I'm going to receive data from a network socket and I don't know how much there will be (although not more than a few MB). I could write such a buffer myself, but why bother if the standard library probably has something already? I'm using VS2008, so some Microsoft-specific extension is just fine by me. I only need four operations:

  • Create the buffer
  • Write data to the buffer (binary junk, not zero-terminated)
  • Get the written data as a char array (together with its length)
  • Free the buffer

What is the name of the class/function set/whatever that I need?

Added: Several votes go to std::vector. All nice and fine, but I don't want to push several MB of data byte by byte. The socket will give me data in chunks of a few KB, so I'd like to write each chunk all at once. Also, at the end I will need to get the data as a simple char*, because I will need to pass the whole blob along to some Win32 API functions unmodified.

You want a std::vector:

std::vector<char> myData;

vector will automatically allocate and deallocate its memory for you. Use push_back to add new data (vector will resize itself if required), and the indexing operator [] to retrieve data.

If at any point you can guess how much memory you'll need, I suggest calling reserve so that subsequent push_back calls won't have to reallocate as often.
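A minimal sketch of those three calls together. The function name collectBytes is illustrative only; it shows reserve() allocating once, push_back() appending (embedded '\0' bytes included), and operator[] reading the bytes back.

```cpp
#include <cstddef>
#include <vector>

// Builds a buffer byte by byte from a raw source of known length.
std::vector<char> collectBytes(const char* src, std::size_t n)
{
    std::vector<char> data;
    data.reserve(n);              // one allocation up front, since n is known
    for (std::size_t i = 0; i < n; ++i) {
        data.push_back(src[i]);   // size never exceeds capacity, so no reallocation
    }
    return data;                  // data[i] now reads back byte i
}
```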

If you want to read in a chunk of memory and append it to your buffer, easiest would probably be something like:

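std::vector<char> myData;
for (;;) {
    const int BufferSize = 1024;
    char rawBuffer[BufferSize];

    // get_network_data is a placeholder for your socket read; it returns the
    // number of bytes read, 0 at end of stream, or a negative value on error
    const int bytesRead = get_network_data(rawBuffer, sizeof(rawBuffer));
    if (bytesRead <= 0) {   // signed, so an error code can actually be detected
        break;
    }

    myData.insert(myData.end(), rawBuffer, rawBuffer + bytesRead);
}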

myData now holds all the data, read chunk by chunk. However, every byte is copied twice: once into rawBuffer and again into the vector.

Instead, we can try something like this:

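std::vector<char> myData;
for (;;) {
    const int BufferSize = 1024;

    // grow the vector first, then let the network read write straight into it
    const size_t oldSize = myData.size();
    myData.resize(oldSize + BufferSize);

    const int bytesRead = get_network_data(&myData[oldSize], BufferSize);
    if (bytesRead <= 0) {
        myData.resize(oldSize);   // discard the unused room (end of stream or error)
        break;
    }

    myData.resize(oldSize + bytesRead);   // keep only the bytes actually read
}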

This reads directly into the buffer, at the cost of occasionally over-allocating.

This can be made smarter by, e.g., doubling the vector size on each resize to amortize the cost of resizing, as the first solution does implicitly. And of course, you can reserve() a much larger buffer up front if you have a priori knowledge of the probable final size, to minimize resizes.

Both are left as an exercise for the reader. :)

Finally, if you need to treat your data as a raw array:

some_c_function(myData.data(), myData.size());

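std::vector is guaranteed to be contiguous. Note that vector::data() was only added in C++11; for code that must build as C++03 (the question mentions VS2008), use &myData[0] instead, and only when the vector is not empty.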
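A small sketch of handing the vector to a pointer-plus-length API. sumBytes is a hypothetical stand-in for a C or Win32 function; it just sums the bytes so the result is easy to check. The empty check matters because indexing an empty vector is undefined.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical C-style API: takes a raw pointer and a byte count.
long sumBytes(const char* data, std::size_t len)
{
    long total = 0;
    for (std::size_t i = 0; i < len; ++i) total += data[i];
    return total;
}

// Contiguous storage means &v[0] addresses the whole buffer. This spelling
// also works before C++11, where vector::data() does not exist.
long passToCApi(const std::vector<char>& v)
{
    if (v.empty()) {
        return sumBytes(0, 0);   // never form &v[0] on an empty vector
    }
    return sumBytes(&v[0], v.size());
}
```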

std::vector<unsigned char> buffer;

Every push_back will add a new char at the end (reallocating if needed). You can call reserve to minimize the number of allocations if you roughly know how much data you expect.

buffer.reserve(1000000);
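The payoff of reserve can be checked directly: once capacity covers the final size, push_back is guaranteed not to reallocate, so the address of the first element never changes. A minimal sketch; the function name is illustrative only.

```cpp
#include <cstddef>
#include <vector>

// Reserves room for `count` bytes, appends that many, and reports whether the
// storage stayed at the same address throughout (i.e. no reallocation).
bool appendWithoutReallocating(std::size_t count)
{
    std::vector<unsigned char> buffer;
    buffer.reserve(count);
    buffer.push_back(0);
    const unsigned char* first = &buffer[0];
    for (std::size_t i = 1; i < count; ++i) {
        buffer.push_back(static_cast<unsigned char>(i));   // wraps at 256; fine here
    }
    return &buffer[0] == first && buffer.capacity() >= count;
}
```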

If you already have the data in a plain array, you can construct the vector from it directly:

unsigned char buffer[1000];
std::vector<unsigned char> vec(buffer, buffer + 1000);

std::string would work for this:

  • It supports embedded nulls.
  • You can append multi-byte chunks of data to it by calling append() on it with a pointer and a length.
  • You can get its contents as a char array by calling data() on it, and the current length by calling size() or length() on it.
  • Freeing the buffer is handled automatically by the destructor, but you can also call clear() on it to erase its contents without destroying it.
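A minimal sketch of the points above, using std::string purely as a byte container. The function name appendChunks is illustrative; the key detail is the append(pointer, length) overload, which treats '\0' as ordinary data rather than a terminator.

```cpp
#include <cstddef>
#include <string>

// Appends two binary chunks (which may contain '\0') to a string buffer.
std::string appendChunks(const char* a, std::size_t aLen,
                         const char* b, std::size_t bLen)
{
    std::string buffer;
    buffer.append(a, aLen);   // copies exactly aLen bytes in one call
    buffer.append(b, bLen);
    return buffer;            // buffer.data() / buffer.size() give the bytes back
}
```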

One more vote for std::vector. Minimal code, and it skips the extra copy GMan's code does:

std::vector<char> buffer;
static const size_t MaxBytesPerRecv = 1024;
int bytesRead; // signed: recv()-style functions return a negative value on error
do
{
    const size_t oldSize = buffer.size();

    buffer.resize(oldSize + MaxBytesPerRecv);
    // pseudo: like the winsock recv() functions, receive() takes a buffer and
    // the maximum number of bytes to write into it
    bytesRead = receive(&buffer[oldSize], MaxBytesPerRecv);

    // shrink the vector to what was actually received; this is practically a
    // no-op: it only changes the size, no data is moved or freed
    buffer.resize(oldSize + (bytesRead > 0 ? bytesRead : 0));
} while (bytesRead > 0);
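The "practically a no-op" claim in the loop above can be checked: shrinking a vector with resize() never reduces its capacity, so the allocation is reused on the next iteration instead of being freed. A minimal sketch; the function name is illustrative.

```cpp
#include <cstddef>
#include <vector>

// Grows a vector to `room` bytes, then shrinks it to `received` (<= room),
// and reports whether the capacity survived the shrink.
bool shrinkKeepsCapacity(std::size_t room, std::size_t received)
{
    std::vector<char> buffer;
    buffer.resize(room);
    const std::size_t before = buffer.capacity();
    buffer.resize(received);   // only the size changes; memory is retained
    return buffer.capacity() == before && buffer.size() == received;
}
```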

As for calling WinAPI functions: pass &buffer[0] (yes, it's a little clumsy, but that's the way it is) for the char* arguments and buffer.size() as the length. Make sure the buffer isn't empty first, since &buffer[0] on an empty vector is undefined.

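And a final note: you can use std::string instead of std::vector, and there shouldn't be much difference (except that you can write buffer.data() instead of &buffer[0] if your buffer is a string). Keep in mind that std::string is only guaranteed to be contiguous since C++11, and before C++17 data() returns a const char*, so it can't be used to write into the buffer.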

I'd take a look at Boost basic_streambuf, which is designed for this kind of purpose. If you can't (or don't want to) use Boost, I'd consider std::basic_streambuf, which is quite similar but a little more work to use. Either way, you basically derive from that base class and override underflow() to read data from the socket into the buffer. You'll normally attach a std::istream to the buffer, so other code reads from it in much the same way as it would read user input from the keyboard (or whatever).
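A minimal sketch of the std::streambuf route. The chunk source is an assumption: a callable "read up to n bytes into dest, return 0 at end" standing in for a recv() wrapper. underflow() is called whenever the stream has consumed the current chunk; it refills the internal buffer and resets the get area with setg(). Errors and putback are not handled.

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <streambuf>

class ChunkStreambuf : public std::streambuf
{
public:
    typedef std::function<std::size_t(char*, std::size_t)> Source;

    explicit ChunkStreambuf(Source source) : source_(source)
    {
        setg(buf_, buf_, buf_);   // start empty so the first read calls underflow()
    }

protected:
    int_type underflow() override
    {
        if (gptr() < egptr()) {   // data still available in the current chunk
            return traits_type::to_int_type(*gptr());
        }
        const std::size_t n = source_(buf_, sizeof(buf_));
        if (n == 0) {
            return traits_type::eof();
        }
        setg(buf_, buf_, buf_ + n);   // expose the freshly read chunk
        return traits_type::to_int_type(*gptr());
    }

private:
    Source source_;
    char buf_[256];
};
```

Once an std::istream is constructed on top of it (`std::istream in(&sb);`), the usual `>>` and `read()` calls pull from the source chunk by chunk.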

Not from the STL, but a possibly useful alternative: Boost.Circular Buffer.

Use std::vector, a growable array whose storage is guaranteed to be contiguous (your third point).

Regarding your comment "I don't see an append()": inserting at the end is the same thing.

vec.insert(vec.end,
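For reference, a complete call of that form, wrapped in a small helper. appendChunk, chunk, and len are illustrative names; insert() with an iterator range at end() is the vector counterpart of string::append(ptr, len) and copies the whole chunk in one call.

```cpp
#include <cstddef>
#include <vector>

// Appends `len` raw bytes from `chunk` to the end of `vec`.
void appendChunk(std::vector<char>& vec, const char* chunk, std::size_t len)
{
    vec.insert(vec.end(), chunk, chunk + len);
}
```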

If you do use std::vector, you're just using it to manage the raw memory for you. You could just as well malloc the biggest buffer you think you'll need and keep track of the write offset / total bytes read so far (they're the same thing). If you reach the end, either realloc or choose a way to fail.
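A sketch of that C-style approach, under the assumption that growing by doubling with realloc is an acceptable "reach the end" policy. The RawBuffer struct and the raw* function names are hypothetical. If realloc fails, the old block stays valid and the append reports failure.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

struct RawBuffer
{
    char* data;
    std::size_t size;      // write offset == total bytes stored so far
    std::size_t capacity;  // bytes allocated
};

bool rawInit(RawBuffer& b, std::size_t initial)
{
    b.data = static_cast<char*>(std::malloc(initial));
    b.size = 0;
    b.capacity = b.data ? initial : 0;
    return b.data != 0;
}

bool rawAppend(RawBuffer& b, const char* chunk, std::size_t len)
{
    if (len == 0) return true;
    if (b.size + len > b.capacity) {
        std::size_t newCap = b.capacity ? b.capacity : 1;
        while (newCap < b.size + len) newCap *= 2;   // double until it fits
        char* p = static_cast<char*>(std::realloc(b.data, newCap));
        if (!p) return false;                        // old block still valid
        b.data = p;
        b.capacity = newCap;
    }
    std::memcpy(b.data + b.size, chunk, len);
    b.size += len;
    return true;
}

void rawFree(RawBuffer& b)
{
    std::free(b.data);
    b.data = 0;
    b.size = b.capacity = 0;
}
```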

I know it isn't very C++-y, but this is a simple problem, and the other proposals seem like heavyweight ways to introduce an unnecessary copy.

The real question is what you want to use the buffer for. If you want to store structures containing pointers in it, the buffer has to stay fixed at the memory address it was first allocated at. To get around this, you have to use relative pointers and a fixup list for updating the pointers after the final allocation. This would be worth a class of its own. (I didn't find such a thing.)
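A sketch of only the relative-pointer half of that idea (no fixup list): records inside a growable byte buffer refer to each other by offset from the start of the buffer, so a reallocation that moves the whole buffer leaves every link valid. A pointer is formed only at the moment of access. The Record layout and function names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

struct Record
{
    int value;
    std::size_t next;   // offset of the next record, or kNoRecord
};

const std::size_t kNoRecord = static_cast<std::size_t>(-1);

// Appends a record and returns its offset; the resize may move the buffer.
std::size_t addRecord(std::vector<char>& buf, int value, std::size_t next)
{
    const std::size_t offset = buf.size();
    Record r = { value, next };
    buf.resize(offset + sizeof(Record));
    std::memcpy(&buf[offset], &r, sizeof r);
    return offset;
}

Record readRecord(const std::vector<char>& buf, std::size_t offset)
{
    Record r;
    std::memcpy(&r, &buf[offset], sizeof r);   // memcpy sidesteps alignment issues
    return r;
}

// Follows the chain of offsets starting at `head` and sums the values.
int sumChain(const std::vector<char>& buf, std::size_t head)
{
    int total = 0;
    for (std::size_t at = head; at != kNoRecord; at = readRecord(buf, at).next) {
        total += readRecord(buf, at).value;
    }
    return total;
}
```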
