C: using strlen for a string including \0

What I need to do is, given a text or string like

\0abc\n\0Def\n\0Heel\n\0Jijer\n\tlkjer

I need to sort this string using qsort, based on a rot-encoding comparison.

int my_rot_conv(int c) {
  if ('a' <= tolower(c) && tolower(c) <= 'z')
    return tolower(c)+13 <= 'z' ? c+13 : c-13;
  return c;
}

int my_rot_comparison(const void *a, const void *b) {
  char* ia = (char*) a;
  char* ib = (char*) b;
  int i=0;
  ia++, ib++;
  while (i<strlen(ia)) {
    if (ia[i] == '\0' || ia[i] == '\n' || ia[i] == '\t' || ib[i] == '\0' || ib[i] == '\n' || ib[i] == '\t') {
      i++;
    }
    if (my_rot_conv(ia[i]) > my_rot_conv(ib[i])) {
      return 1;
    } else if (my_rot_conv(ia[i]) < my_rot_conv(ib[i]))
      return -1;
  }
  return 0;
}

I got to the point where I can compare two strings that start with \0, getting -1 in the following example.

printf("%d \n", my_rot_comparison("\0Abbsdf\n", "\0Csdf\n"));

But this wouldn't work on the whole string with qsort, because ia++, ib++; only works for a one-word comparison.

char *my_arr;
my_arr = malloc(sizeof(\0abc\n\0Def\n\0Heel\n\0Jijer\n\tlkjer));
strcpy(my_arr, \0abc\n\0Def\n\0Heel\n\0Jijer\n\tlkjer);
qsort(my_arr, sizeof(my_arr), sizeof(char), my_rot_comparison);

and the array should be sorted like \0Def\n\0Heel\n\0Jijer\n\0\n\tlkjer

My question is: how do I define a comparison function that works for a string that includes \0, \t and \n characters?

strlen simply cannot operate properly on a string that embeds \0 bytes: by definition, strlen considers the end of the string to be the first \0 byte encountered at or after the beginning of the string.

The rest of the standard C string functions are defined in the same way.
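To make the point concrete, here is a minimal sketch using the sample text from the question. The names sample, visible_length, real_length and copy_sample are made up for this illustration. Storing the text in an array lets sizeof see every byte, while strlen stops at byte 0; memcpy with an explicit count is used where strcpy would stop early.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The question's sample text as an array, so sizeof counts every byte. */
const char sample[] = "\0abc\n\0Def\n\0Heel\n\0Jijer\n\tlkjer";

/* Length as strlen sees it: it stops at the first '\0', which is byte 0. */
size_t visible_length(void)
{
    return strlen(sample);
}

/* Real payload size: all bytes of the array minus the implicit terminator. */
size_t real_length(void)
{
    return sizeof sample - 1;
}

/* Copy every payload byte into dst; strcpy would stop at the first '\0'. */
size_t copy_sample(char *dst, size_t cap)
{
    assert(cap >= real_length());
    memcpy(dst, sample, real_length());
    return real_length();
}
```

The same reasoning applies to the malloc/strcpy/qsort calls in the question: the byte count has to be carried separately rather than recovered with strlen or sizeof on a pointer.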

This means that you have to use a different set of functions to manipulate string-like data that can include \0 bytes. You may have to write these functions yourself.

Note that you will probably have to define a structure that has a length member in it, since you won't be able to rely on a particular sentinel byte (such as \0) to mark the end of the string. For example:

typedef struct {
    unsigned int length;
    char bytes[];
} MyString;
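As a rough sketch of how explicit lengths could make the qsort call from the question workable: the variant below uses a pointer plus length (a hypothetical Span type, rather than the flexible-array struct above) so that records can point into the original buffer. The helpers rot13, span_rot_cmp, split_lines and sort_spans are all invented for this example, and it assumes an ASCII-compatible character set and that each record ends after a '\n'.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record type: a view into the buffer with an explicit length. */
typedef struct {
    const char *bytes;
    size_t length;
} Span;

/* ROT13 for letters (ASCII assumed); every other byte passes through. */
int rot13(int c)
{
    if (!isalpha((unsigned char)c))
        return c;
    return tolower((unsigned char)c) <= 'm' ? c + 13 : c - 13;
}

/* qsort comparator: compares ROT13-converted bytes, never calls strlen. */
int span_rot_cmp(const void *a, const void *b)
{
    const Span *x = a, *y = b;
    size_t n = x->length < y->length ? x->length : y->length;
    for (size_t i = 0; i < n; i++) {
        int cx = rot13((unsigned char)x->bytes[i]);
        int cy = rot13((unsigned char)y->bytes[i]);
        if (cx != cy)
            return cx < cy ? -1 : 1;
    }
    /* Equal prefix: the shorter record sorts first. */
    return (x->length > y->length) - (x->length < y->length);
}

/* Split buf[0..n) into records that end after each '\n' (or at the end). */
size_t split_lines(const char *buf, size_t n, Span *out, size_t max_out)
{
    size_t count = 0, start = 0;
    assert(buf != NULL && out != NULL);
    for (size_t i = 0; i < n && count < max_out; i++) {
        if (buf[i] == '\n' || i + 1 == n) {
            out[count].bytes = buf + start;
            out[count].length = i + 1 - start;
            count++;
            start = i + 1;
        }
    }
    return count;
}

/* Sort the records, not the individual characters. */
void sort_spans(Span *spans, size_t count)
{
    qsort(spans, count, sizeof *spans, span_rot_cmp);
}
```

Note that qsort receives an array of Span records here, not the raw character buffer, so each comparator call sees a whole word with its length. Embedded \0, \n and \t bytes are compared like any other byte.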

If there is some other byte (other than \0) that is forbidden in your input strings, then (per commenter @Sinn) you can swap it with \0 and then use the normal C string functions. However, it is not clear whether this would work for you.
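A minimal sketch of that swap, assuming (purely for illustration) that the byte '\x01' never occurs in the input. The helper name swap_bytes is made up. The buffer still needs a genuine terminator at buf[n] for the C string functions to stop in the right place.

```c
#include <assert.h>
#include <stddef.h>

/* Exchange every occurrence of byte a with byte b in buf[0..n). */
void swap_bytes(char *buf, size_t n, char a, char b)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == a)
            buf[i] = b;
        else if (buf[i] == b)
            buf[i] = a;
    }
}
```

Calling swap_bytes(buf, n, '\0', '\x01') before using strlen or strcmp, and calling it again afterwards, restores the original bytes, since the swap is its own inverse.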

Assuming you use an extra \0 at the end to terminate:

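int strlenzz(const char *s)
{
  int length = 0;
  /* advance until two consecutive \0 bytes are found */
  while (!(*s == 0 && *(s+1) == 0))
  {
    s++;
    length++;
  }
  return length + 1; /* the count includes the first of the two terminating \0 bytes */
}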
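For completeness, a sketch of how such a doubly terminated buffer might be built and measured. The helper names make_zz and payload_length_zz are invented for this example. One caveat of the convention is worth stating: the payload must not contain two consecutive \0 bytes or end with a \0, because the scan would stop early.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Copy n payload bytes and append two '\0' bytes.
   Returns a malloc'd buffer (caller frees) or NULL on allocation failure. */
char *make_zz(const char *src, size_t n)
{
    assert(src != NULL);
    char *buf = malloc(n + 2);
    if (buf == NULL)
        return NULL;
    memcpy(buf, src, n);
    buf[n] = '\0';
    buf[n + 1] = '\0';
    return buf;
}

/* Number of bytes before the first pair of consecutive '\0' bytes. */
size_t payload_length_zz(const char *s)
{
    size_t i = 0;
    while (!(s[i] == '\0' && s[i + 1] == '\0'))
        i++;
    return i;
}
```

Unlike strlenzz above, payload_length_zz does not count either terminating byte, so it returns exactly the n passed to make_zz whenever the payload respects the caveat.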

Personally I'd prefer something like danfuzz's suggestion, but for the sake of listing an alternative...

You could use an escaping convention, writing functions to:

  • "escape" / encode, expanding each embedded (but not the terminating) '\0' / NUL to, say, '\\' and '0' (adopting the convention used when writing C source code string literals), and
  • another to unescape.

That way you can still pass them around as C strings, and your qsort/rot comparison code above will work as is. You should be very conscious, though, that strlen(escaped_value) returns the number of bytes in the escaped representation, which won't equal the number of bytes in the unescaped value when that value embeds NULs.

For example, something like:

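void unescape(char* p)
{
    char* escaped_p = p;   /* read cursor; p is the write cursor */
    for ( ; *escaped_p; ++escaped_p)
    {
        if (*escaped_p == '\\' && escaped_p[1] != '\0')
        {
            ++escaped_p;   /* skip the backslash */
            if (*escaped_p == '0')
            {
               *p++ = '\0';   /* backslash + '0' becomes an embedded NUL */
               continue;
            }
        }
        *p++ = *escaped_p;   /* any other byte is copied as is */
    }
    *p = '\0'; // terminate at the write cursor, not the read cursor
}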

Escaping is trickier, as you need some way to ensure you have enough memory in the buffer, or to malloc a new buffer. You can size it either as the logical size of the unescaped value * 2 + 1, which is an easy-to-calculate worst case, or tightly as logical size + number of NULs + 1, by first counting the NULs that need escaping.
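A minimal sketch of the escaping direction, using the worst-case sizing. The function name escape and its signature are made up for this example. It also doubles literal backslashes, which is an assumption beyond the text above, so that a backslash followed by '0' in the input cannot be mistaken for an escaped NUL when unescaping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Escape n bytes of src into a newly malloc'd, NUL-terminated C string.
   Embedded '\0' becomes backslash + '0'; a backslash becomes two backslashes.
   Returns NULL on allocation failure; the caller frees the result. */
char *escape(const char *src, size_t n)
{
    assert(src != NULL);
    char *out = malloc(2 * n + 1);   /* worst case: every byte doubles */
    if (out == NULL)
        return NULL;
    char *w = out;
    for (size_t i = 0; i < n; i++) {
        if (src[i] == '\0') {
            *w++ = '\\';
            *w++ = '0';
        } else if (src[i] == '\\') {
            *w++ = '\\';
            *w++ = '\\';
        } else {
            *w++ = src[i];
        }
    }
    *w = '\0';
    return out;
}
```

The caller has to pass the logical length n explicitly, since the whole point is that strlen cannot measure the unescaped input.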
