
C: using strlen on a string that includes \0

I am given a text or string like:

\0abc\n\0Def\n\0Heel\n\0Jijer\n\tlkjer

I need to sort this string using qsort, with a comparison based on the ROT13 encoding of the characters.

int my_rot_conv(int c) {
  if ('a' <= tolower(c) && tolower(c) <= 'z')
    return tolower(c)+13 <= 'z' ? c+13 : c-13;
  return c;
}

int my_rot_comparison(const void *a, const void *b) {
  char* ia = (char*) a;
  char* ib = (char*) b;
  int i=0;
  ia++, ib++;
  while (i<strlen(ia)) {
    if (ia[i] == '\0' || ia[i] == '\n' || ia[i] == '\t' || ib[i] == '\0' || ib[i] == '\n' || ib[i] == '\t') {
      i++;
    }
    if (my_rot_conv(ia[i]) > my_rot_conv(ib[i])) {
      return 1;
    } else if (my_rot_conv(ia[i]) < my_rot_conv(ib[i]))
      return -1;
  }
  return 0;
}

I have got as far as comparing two strings that each start with \0; the following example returns -1.

printf("%d \n", my_rot_comparison("\0Abbsdf\n", "\0Csdf\n"));

But this doesn't work on the whole string with qsort, because ia++, ib++; only works when comparing a single pair of words.

char *my_arr;
my_arr = malloc(sizeof(\0abc\n\0Def\n\0Heel\n\0Jijer\n\tlkjer));
strcpy(my_arr, \0abc\n\0Def\n\0Heel\n\0Jijer\n\tlkjer);
qsort(my_arr, sizeof(my_arr), sizeof(char), my_rot_comparison);

And the array should end up sorted like \0Def\n\0Heel\n\0Jijer\n\0\n\tlkjer

My question is: how do I define a comparison function that works for a string containing \0, \t and \n characters?

strlen simply cannot operate properly on a string with embedded \0 bytes since, by definition, strlen considers the end of the string to be the first \0 byte encountered at or after the beginning of the string.

The rest of the standard C string functions are defined in the same way.
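
To make the difference concrete, here is a minimal sketch of a helper that works on data with embedded \0 bytes because the caller passes the length explicitly. The name count_nul_bytes is made up for this illustration; memchr is the standard function that, unlike the str* family, takes a byte count instead of looking for a terminator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the '\0' bytes in buf[0..len).
   It works on data with embedded NULs because the length is passed
   explicitly instead of being discovered by scanning for a terminator. */
size_t count_nul_bytes(const char *buf, size_t len)
{
    size_t count = 0;
    const char *p = buf;
    const char *end = buf + len;

    /* memchr searches exactly (end - p) bytes, NULs included. */
    while ((p = memchr(p, '\0', (size_t)(end - p))) != NULL) {
        ++count;
        ++p;    /* resume the search just after this NUL */
    }
    return count;
}
```

For the sample data from the question held in a char array, strlen reports 0 because the very first byte is \0, while sizeof on the array minus one gives the real size of 29 bytes, and the helper finds 4 embedded \0 bytes in them.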

This means that you have to use a different set of functions to manipulate string(-like) data that can include \0 bytes. You will perhaps have to write these functions yourself.

Note that you will probably have to define a structure which has a length member in it, since you won't be able to rely on a particular sentinel byte (such as \0) to mark the end of the string. For example:

typedef struct {
    unsigned int length;  // number of bytes stored in bytes[]
    char bytes[];         // flexible array member; may contain \0 bytes
} MyString;
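
As a rough sketch of how a length-carrying type could be combined with qsort, the example below uses a pointer-plus-length variant instead of the flexible array member above, since qsort needs fixed-size elements. The names Slice, rot13, slice_rot13_compare and sort_slices_rot13 are all invented for this illustration, and it assumes the input has already been split into one Slice per word.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical "slice": points into a buffer and carries its own length,
   so no terminator byte is needed. */
typedef struct {
    const char *bytes;
    size_t length;
} Slice;

/* ROT13 for ASCII letters; every other byte (including '\0') is unchanged. */
static int rot13(int c)
{
    if (c >= 'a' && c <= 'z') return 'a' + (c - 'a' + 13) % 26;
    if (c >= 'A' && c <= 'Z') return 'A' + (c - 'A' + 13) % 26;
    return c;
}

/* qsort comparator: compares two Slices byte by byte after ROT13,
   using the stored lengths instead of searching for a terminator. */
int slice_rot13_compare(const void *a, const void *b)
{
    const Slice *sa = a;
    const Slice *sb = b;
    size_t n = sa->length < sb->length ? sa->length : sb->length;

    for (size_t i = 0; i < n; ++i) {
        int ca = rot13((unsigned char)sa->bytes[i]);
        int cb = rot13((unsigned char)sb->bytes[i]);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    /* Equal up to the shorter length: the shorter slice sorts first. */
    return (sa->length > sb->length) - (sa->length < sb->length);
}

/* Sorts an array of slices in place. */
void sort_slices_rot13(Slice *items, size_t count)
{
    qsort(items, count, sizeof items[0], slice_rot13_compare);
}
```

Note that qsort is given the number of elements and the size of one element, not the size of a pointer, and that each element is a whole word rather than a single char.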

If there is some other byte (other than \0) which is forbidden in your input strings, then (per commenter @Sinn) you can swap it and \0, and then use normal C string functions. However, it is not clear whether this would work for you.
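
A minimal sketch of that swapping idea, assuming the caller knows the real length of the data and picks a placeholder byte that never occurs in it (the function name swap_nul_with and the choice of placeholder are hypothetical):

```c
#include <stddef.h>

/* Hypothetical helper: swaps every '\0' with `placeholder` (and vice versa)
   in buf[0..len). Calling it a second time with the same placeholder
   restores the original bytes. */
void swap_nul_with(char *buf, size_t len, char placeholder)
{
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] == '\0')
            buf[i] = placeholder;
        else if (buf[i] == placeholder)
            buf[i] = '\0';
    }
}
```

After the first call, and provided the buffer has a terminating \0 just past len, functions such as strlen see the whole data. The comparison function then has to treat the placeholder as the separator instead of \0.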

Assuming you use an extra \0 at the end to terminate (so the data ends with two consecutive \0 bytes):

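int strlenzz(const char *s)
{
  int length = 0;
  // Advance until two consecutive \0 bytes mark the end of the data.
  while (!(*s == 0 && *(s+1) == 0))
  {
    s++;
    length++;
  }
  return length + 1;  // also counts the first of the two terminating \0 bytes
}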

Personally I'd prefer something like danfuzz's suggestion, but for the sake of listing an alternative...

You could use an escaping convention, writing functions to:

  • "escape" / encode, expanding each embedded (but not the terminating) '\0' / NUL to, say, a backslash ('\\') followed by '0' (adopting the convention used when writing C source code string literals), and
  • another to unescape.

That way you can still pass them around as C strings, your qsort/rot comparison code above will work as is, but you should be very conscious that strlen(escaped_value) will return the number of bytes in the escaped representation, which won't equal the number of bytes in the unescaped value when that value embeds NULs.

For example, something like:

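void unescape(char* p)
{
    char* escaped_p = p;   // read cursor; p is the write cursor
    for ( ; *escaped_p; ++escaped_p)
    {
        // Only look past a backslash if it is not the last character.
        if (*escaped_p == '\\' && escaped_p[1] != '\0')
        {
            if (*++escaped_p == '0')
            {
               *p++ = '\0';   // backslash + '0' becomes one NUL byte
               continue;
            }
        }
        *p++ = *escaped_p;
    }
    *p = '\0'; // terminate the (possibly shorter) unescaped data
}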

Escaping is trickier, as you need some way to ensure you have enough memory in the buffer, or to malloc a new buffer. The new buffer can either be (logical size of the unescaped value) * 2 + 1 bytes, an easy-to-calculate worst case, or be sized tightly to logical size + number of NULs + 1 by first counting the NULs that need escaping.
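
One possible shape for the escaping side, using the worst-case allocation, is sketched below. The name escape_nuls is made up, and doubling literal backslashes is an added assumption so that the result can be decoded unambiguously by an unescape that maps backslash + '0' to NUL and backslash + any other character to that character.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns a newly malloc'd, NUL-terminated C string in
   which every '\0' in data[0..len) is written as backslash + '0' and every
   backslash is doubled (the doubling is an assumption of this sketch).
   Returns NULL if allocation fails. The caller frees the result. */
char *escape_nuls(const char *data, size_t len)
{
    /* Worst case: every byte expands to two, plus the terminator. */
    char *out = malloc(len * 2 + 1);
    if (out == NULL)
        return NULL;

    char *w = out;
    for (size_t i = 0; i < len; ++i) {
        if (data[i] == '\0') {
            *w++ = '\\';
            *w++ = '0';
        } else if (data[i] == '\\') {
            *w++ = '\\';
            *w++ = '\\';
        } else {
            *w++ = data[i];
        }
    }
    *w = '\0';  /* the only real NUL in the escaped form */
    return out;
}
```

Because the logical length is lost once the data is escaped, the caller has to pass it in here and keep track of it separately after unescaping.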

