Safe way to cast a uint16_t to a wchar_t

Question

I'm trying to clean up some code and wanted to know whether the following is a safe way to cast a uint16_t to a wchar_t.

#if !defined(MARKUP_SIZEOFWCHAR)
#if __SIZEOF_WCHAR_T__ == 4 || __WCHAR_MAX__ > 0x10000
#define MARKUP_SIZEOFWCHAR 4
#else
#define MARKUP_SIZEOFWCHAR 2
#endif
#endif

void FileReader::parseBuffer(char * buffer, int length)
{
  //start by looking for a vrsn
  //Header: seek around for a vrsn followed by a 32-bit size descriptor
  //read 32 bits at a time
  int cursor = 0;
  char vrsn[5] = "vrsn";
  cursor = this->searchForMarker(cursor, length, vrsn, buffer);
  int32_t size = this->getObjectSizeForMarker(cursor, length, buffer);
  cursor = cursor + 7; //advance cursor past marker and size
  wchar_t *version = this->getObjectForSizeAndCursor(size, cursor, buffer);
  wcout << version;
  delete[] version; //this pointer is dest from getObjectForSizeAndCursor
}


wchar_t* FileReader::getObjectForSizeAndCursor(int32_t size, int cursor, char *buffer) {

  int wlen = size/2;
  uint32_t *dest = new uint32_t[wlen+1];
  unsigned char *ptr = (unsigned char *)(buffer + cursor);
  for(int i=0; i<wlen; i++) {
    #if MARKUP_SIZEOFWCHAR == 4 // sizeof(wchar_t) == 4
      char padding[2] = {'\0','\0'}; 
      dest[i] =  (padding[0] << 24) + (padding[1] << 16) + (ptr[0] << 8) + ptr[1];
    #else // sizeof(wchar_t) == 2
      dest[i] = (ptr[0] << 8) + ptr[1];
    #endif
      ptr += 2;
      cout << ptr;
  }
  return (wchar_t *)dest;
}

Do I have any scoping issues with the way I am using padding? Will I leak padding when I delete[] dest in the calling function?

Answer 1

The distinction

#if MARKUP_SIZEOFWCHAR == 4 // sizeof(wchar_t) == 4
  char padding[2] = {'\0','\0'}; 
  dest[i] =  (padding[0] << 24) + (padding[1] << 16) + (ptr[0] << 8) + ptr[1];
#else // sizeof(wchar_t) == 2
  dest[i] = (ptr[0] << 8) + ptr[1];
#endif

is completely unnecessary. Each padding[i] is 0, so shifting it left still yields 0, and adding 0 has no effect.
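A quick way to convince yourself of this is to compare both expressions over every possible pair of bytes. This is a minimal sketch; the function names withPadding, withoutPadding and formsAgreeForAllBytePairs are made up for illustration.

```cpp
#include <cstdint>

// The question's expression, including the always-zero padding bytes.
std::uint32_t withPadding(unsigned char hi, unsigned char lo) {
    char padding[2] = {'\0', '\0'};
    return (padding[0] << 24) + (padding[1] << 16) + (hi << 8) + lo;
}

// The simplified expression: the zero terms contribute nothing.
std::uint32_t withoutPadding(unsigned char hi, unsigned char lo) {
    return (hi << 8) + lo;
}

// Exhaustively compares both forms over all 65536 byte pairs.
bool formsAgreeForAllBytePairs() {
    for (int hi = 0; hi <= 0xFF; ++hi) {
        for (int lo = 0; lo <= 0xFF; ++lo) {
            const auto h = static_cast<unsigned char>(hi);
            const auto l = static_cast<unsigned char>(lo);
            if (withPadding(h, l) != withoutPadding(h, l)) {
                return false;
            }
        }
    }
    return true;
}
```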

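The compiler may or may not optimise away the two-byte array padding created in each loop iteration, but since it is an automatic array, it cannot leak in any way: its storage is released when it goes out of scope at the end of each iteration. delete[] only concerns memory obtained with new[], which here is dest alone.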
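To see the lifetime rule in action, the sketch below replaces the char array with a hypothetical Tracked type whose constructor and destructor update a counter (a plain char array has no destructor to observe). Declared in the loop body, exactly one instance is alive at a time, and none remain after the loop. One related caution about dest itself: it should be released through a pointer of the type it was allocated as (uint32_t*), because delete[] through a wchar_t* on an array created with new uint32_t[] is undefined behavior.

```cpp
// Counts how many Tracked objects are currently alive.
static int liveCount = 0;

// Hypothetical stand-in for the question's automatic padding array.
struct Tracked {
    char padding[2] = {'\0', '\0'};
    Tracked() { ++liveCount; }
    ~Tracked() { --liveCount; }
};

// Declares a Tracked inside the loop body, as the question does with
// padding, and returns the highest number alive at any one time.
int maxAliveDuringLoop(int iterations) {
    int maxAlive = 0;
    for (int i = 0; i < iterations; ++i) {
        Tracked t;  // constructed at the start of every iteration...
        if (liveCount > maxAlive) {
            maxAlive = liveCount;
        }
    }               // ...and destroyed when each iteration ends
    return maxAlive;
}
```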

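Since ptr points to unsigned char, each byte is at most 255, so the shifted value is never negative and easily fits in an int. Simply using

dest[i] = (ptr[0] << 8) + ptr[1];

is therefore perfectly safe. (The byte order must of course match the data: this expression assumes the high byte comes first, i.e. big-endian.)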
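To make the byte-order caveat concrete, here is a sketch with two hypothetical helpers that decode the same bytes as big-endian (high byte first, which is what (ptr[0] << 8) + ptr[1] assumes) and as little-endian. It assumes the data is UTF-16 and does not combine surrogate pairs, so on a platform with a 4-byte wchar_t a character outside the BMP would remain two separate units.

```cpp
#include <cstddef>
#include <string>

// Decodes 16-bit units stored most-significant byte first (big-endian).
std::wstring decodeBigEndian(const unsigned char* ptr, std::size_t units) {
    std::wstring out(units, L'\0');
    for (std::size_t i = 0; i < units; ++i, ptr += 2) {
        out[i] = static_cast<wchar_t>((ptr[0] << 8) + ptr[1]);
    }
    return out;
}

// Decodes the same layout least-significant byte first (little-endian).
std::wstring decodeLittleEndian(const unsigned char* ptr, std::size_t units) {
    std::wstring out(units, L'\0');
    for (std::size_t i = 0; i < units; ++i, ptr += 2) {
        out[i] = static_cast<wchar_t>((ptr[1] << 8) + ptr[0]);
    }
    return out;
}
```

For the bytes 00 76 00 72, the big-endian decode gives L"vr", while the little-endian decode gives the unrelated characters U+7600 and U+7200.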

For

return (wchar_t *)dest;

you should let the type of dest depend on the size of wchar_t; it should be uint16_t* if sizeof(wchar_t) == 2 (and CHAR_BIT == 8).
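One way to express this in C++ is std::conditional on sizeof(wchar_t), guarded by a static_assert. This sketch assumes wchar_t is either 2 or 4 bytes; WideUnit and toWString are hypothetical names. It copies the units into a std::wstring with std::memcpy rather than returning (wchar_t *)dest, because reading integers through a wchar_t pointer is a strict-aliasing violation even when the sizes match.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// An unsigned integer type with the same size as wchar_t
// (assumes wchar_t is 2 or 4 bytes, as on common platforms).
using WideUnit = std::conditional<sizeof(wchar_t) == 2,
                                  std::uint16_t, std::uint32_t>::type;
static_assert(sizeof(WideUnit) == sizeof(wchar_t),
              "wchar_t is neither 2 nor 4 bytes on this platform");

// Copies the units byte-for-byte into a std::wstring; std::memcpy is
// well defined here, unlike a pointer cast to wchar_t*.
std::wstring toWString(const std::vector<WideUnit>& units) {
    std::wstring out(units.size(), L'\0');
    if (!units.empty()) {
        std::memcpy(&out[0], units.data(), units.size() * sizeof(WideUnit));
    }
    return out;
}
```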

Answer 2

What you're trying to do isn't going to work. It's broken in several ways, but let's focus on the cast.

Your question doesn't match your code. Your code uses a uint32_t, while your question asks about a uint16_t. But that doesn't matter, because neither will work.
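Here is a sketch of why the uint32_t version fails on a platform with a 2-byte wchar_t (such as Windows). It uses uint16_t as a stand-in for a 2-byte wchar_t and std::memcpy to view the same bytes as 16-bit units; viewAs16Bit is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Simulates what a 2-byte wchar_t platform sees when a uint32_t buffer is
// reinterpreted as 16-bit units: the same bytes become twice as many units.
// std::memcpy keeps the simulation itself well defined.
std::vector<std::uint16_t> viewAs16Bit(const std::vector<std::uint32_t>& wide) {
    std::vector<std::uint16_t> narrow(wide.size() * 2);
    if (!wide.empty()) {
        std::memcpy(narrow.data(), wide.data(),
                    wide.size() * sizeof(std::uint32_t));
    }
    return narrow;
}
```

For the input {0x76, 0x72} ('v', 'r'), a little-endian machine produces the units 0x0076, 0x0000, 0x0072, 0x0000, so a wide string built over that buffer would end after its first character.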

If you need to use wchar_t, then you should actually use wchar_t. If your goal is to take two consecutive bytes of a char* and copy them into the first two bytes of a wchar_t, then just do that.

Here is a much better version of your code, one that actually works (to the degree that it makes sense to copy two bytes from a char* and pretend that it's a wchar_t):

#include <string>

std::wstring FileReader::getObjectForSizeAndCursor(int32_t size, int cursor, char *buffer) {

  int wlen = size/2;
  std::wstring out(wlen, L'\0'); // wlen characters, initially zero
  unsigned char *ptr = (unsigned char *)(buffer + cursor);
  for(int i=0; i<wlen; i++) {
    out[i] = static_cast<wchar_t>((ptr[0] << 8) + ptr[1]); // high byte first
    ptr += 2;
  }
  return out;
}

Plus, there's no chance of leaking memory, since we're using a proper RAII class like std::wstring.
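If you'd rather not depend on FileReader, here is a standalone sketch of the same idea with a bounds check added. decodeObject is a hypothetical name, and the extra length parameter is an assumption (the original signature does not receive the buffer length). Because it returns std::wstring by value, parseBuffer could print the result with wcout and would no longer need delete[].

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Decodes `size` bytes of big-endian 16-bit units starting at `cursor`.
// The caller owns no raw memory, so nothing has to be deleted.
// Throws std::out_of_range if the requested range does not fit the buffer.
std::wstring decodeObject(const char* buffer, std::size_t length,
                          std::size_t cursor, std::int32_t size) {
    if (size < 0 || cursor > length ||
        static_cast<std::size_t>(size) > length - cursor) {
        throw std::out_of_range("object extends past end of buffer");
    }
    const std::size_t wlen = static_cast<std::size_t>(size) / 2;
    std::wstring out(wlen, L'\0');
    const auto* ptr = reinterpret_cast<const unsigned char*>(buffer + cursor);
    for (std::size_t i = 0; i < wlen; ++i, ptr += 2) {
        out[i] = static_cast<wchar_t>((ptr[0] << 8) + ptr[1]);
    }
    return out;
}
```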

2 answers

Answer 1 (accepted): score 0, posted 2012-10-07 18:01:18
Answer 2: score 0, posted 2012-10-07 18:10:35
