In C++, regarding bit-shifting and casting data types

Question

I recently asked a question here on Stack Overflow about how to cast my data (a 16-bit integer followed by an undetermined amount of void*-cast memory) into a std::vector of unsigned chars, so that I could use a socket library called NetLink, which sends raw data with a function whose signature looks like this:

void rawSend(const vector<unsigned char>* data);

(For reference, here's that question: Casting an unsigned int + a string to an unsigned char vector.)

The question was successfully answered and I'm grateful to those who responded. Mike DeSimone responded with an example of a send_message() function that converts the data into a format that NetLink accepts (an std::vector), which looks like this:

void send_message(NLSocket* socket, uint16_t opcode, const void* rawData, size_t rawDataSize)
{
    vector<unsigned char> buffer;
    buffer.reserve(sizeof(uint16_t) + rawDataSize);
    buffer.push_back(opcode >> 8);
    buffer.push_back(opcode & 0xFF);
    const unsigned char* base(reinterpret_cast<const unsigned char*>(rawData));
    buffer.insert(buffer.end(), base, base + rawDataSize);
    socket->rawSend(&buffer);
}

This looks to be exactly what I needed, and so I set off to write an accompanying receive_message() function...

...but I'm embarrassed to say that I don't entirely understand all the bit-shifting and whatnot, so I've run into a wall here. Over nearly a decade of programming, most of my code has been in higher-level languages, and the rest of it has never really called for lower-level memory operations.

Back on the subject of writing a receive_message() function, my starting point, as you might imagine, is NetLink's rawRead() function, whose signature looks like this:

vector<unsigned char>* rawRead(unsigned bufferSize = DEFAULT_BUFFER_SIZE, string* hostFrom = NULL);

It looks like my code will start off something like this:

void receive_message(NLSocket* socket, uint16_t* opcode, const void** rawData)
{
    std::vector<unsigned char, std::allocator<unsigned char>>* buffer = socket->rawRead();
    std::allocator<unsigned char> allocator = buffer->get_allocator(); // do I even need this allocator?  I saw that one is returned as part of the above object, but...
    // ...
}

After that first call to rawRead(), it appears I would need to iterate through the vector, retrieving data from it and reversing bitshifting operations, and then return the data into *rawData and *opcode. Again, I'm not very familiar with bitshifting (I did some googling to understand the syntax, but I don't understand why the above send_message() code requires shifting at all), so I'm at a loss for my next step here.

Can someone help me understand how to write this accompanying receive_message() function? As a bonus, if someone could explain how the original code works (particularly how the shifting works in this case and why it's necessary), that would deepen my understanding for the future.

Thanks in advance!

Answer 1

The library's function signature …

    void rawSend( const vector<unsigned char>* data );

forces you to build a std::vector of your data, which in essence means that it imposes a needless inefficiency. There is no advantage in requiring client code to build a std::vector. Whoever designed that does not know what they're doing, and it would be wise not to use their software.

The library function signature …

    vector<unsigned char>* rawRead(unsigned bufferSize = DEFAULT_BUFFER_SIZE, string* hostFrom = NULL);

is worse: it not only needlessly requires you to build a std::string if you want to specify a “hostFrom” (whatever that really means), but also needlessly requires you to deallocate the result vector. At least, that is so if there is any sense to the function result type, which, of course, there might not be.

You should not be using a library with such disgusting function signatures. Probably any randomly picked library will be much better, i.e., much easier to use.

How the existing usage code …

void send_message(NLSocket* socket, uint16_t opcode, const void* rawData, size_t rawDataSize)
{
    vector<unsigned char> buffer;
    buffer.reserve(sizeof(uint16_t) + rawDataSize);
    buffer.push_back(opcode >> 8);
    buffer.push_back(opcode & 0xFF);
    const unsigned char* base(reinterpret_cast<const unsigned char*>(rawData));
    buffer.insert(buffer.end(), base, base + rawDataSize);
    socket->rawSend(&buffer);
}

works:

The reserve call is a case of premature optimization. It tries to make the vector do just one single buffer allocation (performed at this point) instead of possibly two or more. A much better cure for the manifest inefficiency of building a vector is to use a saner library.
The buffer.push_back(opcode >> 8) call places the high 8 bits of the (assumed) 16-bit quantity opcode at the start of the vector. Placing the high part, the most significant part, first is known as big endian format. Your reading code at the other end must assume big endian format. Likewise, if this sending code had used little endian format, the reading code would have to assume little endian format. So this is just a data format decision, but once it is made, the code at both ends must adhere to it.
The buffer.push_back(opcode & 0xFF) call places the low 8 bits of opcode after the high bits, as is correct for big endian.
The const unsigned char* base(reinterpret_cast<const unsigned char*>(rawData)) declaration just names a suitably typed pointer to your data, calling it base. The type const unsigned char* is suitable because it allows byte-level address arithmetic. The original formal argument type const void* does not admit address arithmetic.
The buffer.insert(buffer.end(), base, base + rawDataSize) call appends the data to the vector. The expression base + rawDataSize is the address arithmetic that the previous declaration enabled.
socket->rawSend(&buffer) is the final call down to the SillyLibrary's rawSend method.
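To see the byte layout these steps produce without a socket in the way, here is a minimal sketch that builds the same buffer and returns it instead of sending it. build_message is a hypothetical name used only for illustration; the body mirrors send_message step by step, with explicit casts added to make the narrowing to unsigned char visible.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: same layout as send_message, minus the socket call.
std::vector<unsigned char> build_message(std::uint16_t opcode,
                                         const void* rawData,
                                         std::size_t rawDataSize)
{
    std::vector<unsigned char> buffer;
    buffer.reserve(sizeof(std::uint16_t) + rawDataSize);
    buffer.push_back(static_cast<unsigned char>(opcode >> 8));    // high byte first (big endian)
    buffer.push_back(static_cast<unsigned char>(opcode & 0xFF));  // then the low byte
    // static_cast is enough to turn const void* into a byte pointer.
    const unsigned char* base = static_cast<const unsigned char*>(rawData);
    buffer.insert(buffer.end(), base, base + rawDataSize);        // payload copied unchanged
    return buffer;
}
```

For opcode 0x1234 and the two payload bytes 'h' and 'i', the buffer comes out as 0x12, 0x34, 'h', 'i': high byte, low byte, then the payload untouched.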

How to wrap a call to the SillyLibrary rawRead function.

First, define names for the byte and size datatypes (always a good idea to name things):

typedef unsigned char Byte;
typedef ptrdiff_t Size;

Consult the documentation about how to deallocate/destroy/delete (if necessary) the SillyLibrary function result:

void deleteSillyLibVector( vector<Byte> const* p )
{
    // perhaps just "delete p", but it depends on the SillyLibrary
}

Now, for the send operation, having std::vector involved was just a pain. For the receive operation, it's the opposite. Creating a dynamic array and passing it safely and efficiently as a function result is just the kind of thing that std::vector was designed for.

However, the send operation was just a single call.

For the receive operation it is possible, depending on the design of SillyLibrary, that you need to loop, performing a number of receive calls until you have received all the data. You do not provide enough information to do this. But the code below shows a bottom-layer read that your looping code can call, accumulating data in a vector:

Size receive_append( NLSocket& socket, vector<Byte>& data )
{
    // The library method is rawRead; it returns a pointer the caller must free.
    vector<Byte> const* const result = socket.rawRead();

    if( result == 0 )
    {
        return 0;
    }

    // Holds a pointer-to-const so that it can be initialized from result.
    struct ScopeGuard
    {
        vector<Byte> const* pDoomed;
        explicit ScopeGuard( vector<Byte> const* p ): pDoomed( p ) {}
        ~ScopeGuard() { deleteSillyLibVector( pDoomed ); }
    };

    ScopeGuard cleanup( result );   // from here on, result is freed on every exit path
    Size const nBytesRead = result->size();

    data.insert( data.end(), result->begin(), result->end() );
    return nBytesRead;
}

Note the use of a destructor to do cleanup, which makes this more exception safe. In this particular case about the only possible exception is a std::bad_alloc, which is pretty fatal anyway. But the general technique of using a destructor to do cleanup, for exception safety, is very much worth knowing about and using as a matter of course. Usually one does not have to define a new class for it, but when dealing with a SillyLibrary one may have to.
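In C++11 and later, std::unique_ptr can take the place of the hand-written ScopeGuard. The sketch below assumes that plain delete is the correct way to free the rawRead result (check the library documentation, as suggested above; if it is not, give unique_ptr a custom deleter). FakeSocket is a hypothetical stand-in so the example compiles without NetLink, and the function is templated on the socket type so that a real socket object could be passed instead.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

typedef unsigned char Byte;
typedef std::ptrdiff_t Size;

// Hypothetical stand-in for NLSocket: like the quoted rawRead signature,
// it returns a heap-allocated vector that the caller must free.
struct FakeSocket
{
    std::vector<Byte> next;
    std::vector<Byte>* rawRead() { return new std::vector<Byte>(next); }
};

// Same job as receive_append above, with unique_ptr as the scope guard.
// ASSUMPTION: delete is the right way to release the library's result.
template <class Socket>
Size receive_append(Socket& socket, std::vector<Byte>& data)
{
    std::unique_ptr<const std::vector<Byte>> result(socket.rawRead());
    if (!result)
    {
        return 0;
    }
    data.insert(data.end(), result->begin(), result->end());
    return static_cast<Size>(result->size());
}   // result is deleted here, even if insert throws
```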

Finally, when your looping code has determined that all data is at hand, it can interpret the data in your vector. I leave that as an exercise, even though that is mainly what you asked for, because I have already written almost a whole article here.
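For completeness, here is one way the interpretation step could look: a sketch assuming the vector holds exactly one complete message in the send_message layout. Message and parse_message are hypothetical names. Returning the payload in a vector, rather than through the question's const void** rawData out-parameter, sidesteps the question of who owns that memory.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

typedef unsigned char Byte;

// Hypothetical result type for a decoded message.
struct Message
{
    std::uint16_t opcode;
    std::vector<Byte> payload;
};

// Reverses send_message: two big-endian opcode bytes, then the raw payload.
// ASSUMPTION: `data` holds exactly one complete message.
Message parse_message(const std::vector<Byte>& data)
{
    if (data.size() < 2)
    {
        throw std::runtime_error("message shorter than its 2-byte opcode");
    }
    Message m;
    // The high byte was sent first, so it goes back into bits 8..15.
    m.opcode = static_cast<std::uint16_t>((data[0] << 8) | data[1]);
    // Everything after the opcode is the payload.
    m.payload.assign(data.begin() + 2, data.end());
    return m;
}
```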

Disclaimer: off-the-cuff code.

Cheers & hth.,

Answer 2

To put the bit-fiddling into non-bit-fiddling terms, opcode >> 8 is equivalent to opcode / 256 and opcode & 0xFF is equivalent to opcode - ((opcode / 256) * 256) . Watch out for the rounding/truncation.

Think of opcode as being composed of two chunks, ophi and oplo, each with values 0..255: opcode == (ophi * 256) + oplo.
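A quick worked example: for opcode 0x1234 (4660), 4660 / 256 = 18 = 0x12 (ophi), and 4660 - 18 * 256 = 52 = 0x34 (oplo). The sketch below checks the same identities for every 16-bit value rather than one sample; split_identities_hold is a hypothetical helper name.

```cpp
// Checks, for every value in the uint16_t range 0..0xFFFF:
//   opcode >> 8   == opcode / 256                   (ophi)
//   opcode & 0xFF == opcode - (opcode / 256) * 256  (oplo)
//   both chunks fit in 0..255, and opcode == ophi * 256 + oplo
bool split_identities_hold()
{
    for (unsigned opcode = 0; opcode <= 0xFFFF; ++opcode)
    {
        const unsigned ophi = opcode >> 8;
        const unsigned oplo = opcode & 0xFF;
        if (ophi != opcode / 256) return false;
        if (oplo != opcode - (opcode / 256) * 256) return false;
        if (ophi > 255 || oplo > 255) return false;
        if (opcode != ophi * 256 + oplo) return false;
    }
    return true;
}
```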

Some extra clues...

0xFF  == 255 == binary  11111111 == 2^8 - 1
0x100 == 256 == binary 100000000 == 2^8

              opcode
         /              \
Binary : 1010101010101010
         \      /\      /
           ophi    oplo

The reason for this is basically an endian fix for writing a sixteen-bit value to a byte-wise data stream. The network stream has its own rule, in which the "big end" of the value must be sent first, independent of how that is handled by default on any particular platform. send_message is basically deconstructing the sixteen-bit value to send it. You'll need to read the two chunks in, then reconstruct the sixteen-bit value.
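The platform independence can be shown directly. This sketch contrasts shift-based encoding, which always yields the high byte first because shifts act on the numeric value, with a std::memcpy of the raw uint16_t, whose byte order depends on the host. put_be16, put_native16 and host_is_little_endian are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>

// Big-endian ("big end" first) via shifts, as send_message does.
// Same bytes on every platform.
void put_be16(std::uint16_t value, unsigned char out[2])
{
    out[0] = static_cast<unsigned char>(value >> 8);
    out[1] = static_cast<unsigned char>(value & 0xFF);
}

// For contrast: copying the object's bytes gives the host's native order,
// e.g. {0x34, 0x12} for 0x1234 on a little-endian machine such as x86.
void put_native16(std::uint16_t value, unsigned char out[2])
{
    std::memcpy(out, &value, 2);
}

// True if the lowest-addressed byte of a uint16_t holds its low 8 bits.
bool host_is_little_endian()
{
    const std::uint16_t probe = 1;
    unsigned char bytes[2];
    std::memcpy(bytes, &probe, 2);
    return bytes[0] == 1;
}
```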

Whether you code the reconstruction as opcode = (ophi * 256) + oplo; or as opcode = (ophi << 8) | oplo; is mostly a matter of taste: the optimiser will understand the equivalence and figure out what's most efficient anyway.
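The two reconstruction spellings can be compared exhaustively over all 256 × 256 byte pairs. join_mul, join_shift and joins_agree are hypothetical names. Note that unsigned char operands are promoted to int before the arithmetic, so ophi << 8 does not lose the high bits.

```cpp
#include <cstdint>

// Rebuild a 16-bit value from its two received bytes, spelled both ways.
std::uint16_t join_mul(unsigned char ophi, unsigned char oplo)
{
    return static_cast<std::uint16_t>(ophi * 256 + oplo);
}

std::uint16_t join_shift(unsigned char ophi, unsigned char oplo)
{
    return static_cast<std::uint16_t>((ophi << 8) | oplo);
}

// Compares both spellings (and the plain arithmetic) for every byte pair.
bool joins_agree()
{
    for (unsigned hi = 0; hi <= 0xFF; ++hi)
    {
        for (unsigned lo = 0; lo <= 0xFF; ++lo)
        {
            const unsigned char h = static_cast<unsigned char>(hi);
            const unsigned char l = static_cast<unsigned char>(lo);
            if (join_mul(h, l) != join_shift(h, l)) return false;
            if (join_mul(h, l) != hi * 256 + lo) return false;
        }
    }
    return true;
}
```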

Also, no, I don't think you need an allocator. I'm not even sure that using vector is a good idea, given that you're using a const void** rawData parameter, but it probably is, and you should do a reserve before reading into it. Then extract the relevant chunks (the two bytes to reconstruct the opcode, plus the array content).

The big problem I see: how do you know the size of the raw data that you'll be reading? It doesn't appear to be either a parameter to receive_message or provided by the data stream itself.
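If you control both ends, one common remedy (an assumption here, not something the send_message code does) is to send the payload length too, so the receiver can tell when a whole message has arrived. The sketch uses a hypothetical 4-byte big-endian length after the opcode; frame and complete_frame_size are illustrative names. complete_frame_size would fit a receive loop like the receive_append one: keep appending until it returns nonzero.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical wire format:
// [opcode hi][opcode lo][len b3][len b2][len b1][len b0][payload...]
// Both header fields are big-endian, like the opcode in send_message.
std::vector<unsigned char> frame(std::uint16_t opcode,
                                 const std::vector<unsigned char>& payload)
{
    std::vector<unsigned char> out;
    out.push_back(static_cast<unsigned char>(opcode >> 8));
    out.push_back(static_cast<unsigned char>(opcode & 0xFF));
    const std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    for (int shift = 24; shift >= 0; shift -= 8)  // most significant byte first
        out.push_back(static_cast<unsigned char>((len >> shift) & 0xFF));
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Returns the size of one complete frame at the start of buf,
// or 0 if more bytes must be received first.
std::size_t complete_frame_size(const std::vector<unsigned char>& buf)
{
    const std::size_t header = 6;
    if (buf.size() < header) return 0;
    std::uint32_t len = 0;
    for (std::size_t i = 2; i < header; ++i)
        len = (len << 8) | buf[i];                // rebuild the big-endian length
    if (buf.size() < header + len) return 0;
    return header + len;
}
```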

2 answers. solution1: score 3, accepted, 2011-10-15 04:59:04. solution2: score 0, 2011-10-15 04:28:19.
