Working with strings in C that contain multiple Null characters

I have a char[] buffer that contains data read from memory, which means there are multiple Nulls within this buffer and not just a single terminating Null at the end.

char *addr = (char *)memmem(buff, strlen(buff), needle, strlen(needle));

I need to process this string and search for and replace certain content. For this, I would like to use memmem(), but unfortunately it's not working because of the multiple Nulls within the buffer.

I tried to replace the Nulls in the buffer with a character that I thought would not appear in the data (the ASCII Bell code, \7) and then process it, but this does not work and corrupts my data.

Is there another way to process a string that has multiple Nulls?

memmem can be used to do it. However, since your data contains embedded nulls, you cannot use strlen to calculate the length. You need to keep track of the lengths separately.

char *addr = (char *)memmem(buff, buffLen, needle, needleLen);
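Note that memmem is a GNU/BSD extension rather than ISO C, so it may not be available everywhere. As a rough sketch only, a portable stand-in that takes the same explicit lengths could look like the following; the name mem_find is made up for illustration and is not part of any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for memmem: finds the first occurrence of
 * `needle` (needle_len bytes) in `hay` (hay_len bytes).
 * Embedded NULs are treated like any other byte. */
static void *mem_find(const void *hay, size_t hay_len,
                      const void *needle, size_t needle_len)
{
    const unsigned char *h = hay;
    const unsigned char *n = needle;

    if (needle_len == 0)
        return (void *)h;          /* empty needle matches at the start */
    if (needle_len > hay_len)
        return NULL;               /* needle cannot fit */

    /* Last offset at which the needle could still fit. */
    size_t last = hay_len - needle_len;
    for (size_t i = 0; i <= last; ++i) {
        if (h[i] == n[0] && memcmp(h + i, n, needle_len) == 0)
            return (void *)(h + i);
    }
    return NULL;
}
```

For a buffer declared as a char array initialized from a literal, sizeof buff - 1 gives the full length including embedded NULs, whereas strlen(buff) stops at the first one. For data read at run time, the length has to come from whatever call filled the buffer.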

Using strlen on your char array, which by your own admission contains several terminating nulls, isn't going to work. You should realize that strlen just counts the number of chars up to the point where it encounters a zero terminator.
Incidentally, strncat finds the end of its destination string the same way, so you can't use functions like that either.

What you could do is keep an int at hand, which you use to keep track of the actual length of the string.
Another way of tackling this problem would be to write your own d_strlen function, and always ensure that your buffer has two zero-terminating chars at the end:

size_t d_strlen(const char *in)
{
    size_t len = 0;
    while( !(in[len] == '\0' && in[len+1] == '\0' )) ++len;
    return len;
}

Again: your buffer must then be terminated by two terminating zeroes, not just the one:

char buff[100] = "this \0 string contains \0 terminators \0";//adds second \0
printf("%zu != %zu\n", strlen(buff), d_strlen(buff));//size_t needs %zu; yields 5 != 37

As Frerich Raabe pointed out, so as not to lose the optimizations and safety of a tried and tested function like strlen, this might be a better version of d_strlen:

size_t d_strlen(const char *in)
{
    size_t len = 0;
    while(strlen(in+len)) len += strlen(in+len) + 1;
    return len ? --len : 0;//check for zero-length
}

However, this calls strlen twice per iteration, which is pointless overhead, so you're probably better off writing this:

size_t d_strlen(const char *in)
{
    size_t i, len = 0;
    do
    {
        i = strlen(in+len);//get substring length
        len += i + 1;//add to total length + 1 for \0 char
    }while(i > 0);
    return len > 1 ? len-2 : 0;//subtract 2, if possible, else return 0
}
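The same double-NUL convention also lets you walk the buffer piece by piece. The sketch below is only an illustration; count_pieces is a made-up name, and it assumes that no piece in the middle of the buffer is empty, since an empty piece is indistinguishable from the end marker when stepping with strlen.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the NUL-separated pieces in a buffer
 * that ends with two consecutive NULs. Assumes no piece is empty. */
static size_t count_pieces(const char *in)
{
    size_t count = 0;
    while (*in != '\0') {          /* an empty piece marks the end */
        in += strlen(in) + 1;      /* skip the piece and its NUL */
        ++count;
    }
    return count;
}
```

With the buff from the example above, this walks over "this ", " string contains " and " terminators " and then stops at the second terminating zero.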

A C string is a sequence of chars terminated by a NUL ... by definition it can't contain more than one NUL. What you have is something more generic: a block of chars. To deal with arbitrary blocks of chars, they must be accompanied by a length, that is, the count of the number of chars. Given that, you can process the block without any concern for whether it contains NULs. Given a block consisting of a pointer to some chars and the count, you would use the count rather than strlen, so:

memmem(buff, buff_len, needle, needle_len);
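To make the pointer-plus-count idea concrete for the search-and-replace case, here is a minimal sketch. The struct block type and the block_replace function are invented for this example, and it assumes the replacement has the same length as the text it replaces, so the block can be edited in place without resizing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a block of chars plus its length. */
struct block {
    char  *data;
    size_t len;
};

/* Overwrites every occurrence of `from` with `to` (both `n` bytes long)
 * and returns the number of replacements made. NULs inside the block
 * are compared and copied like any other byte. */
static size_t block_replace(struct block b, const char *from,
                            const char *to, size_t n)
{
    size_t hits = 0;
    if (n == 0 || n > b.len)
        return 0;
    for (size_t i = 0; i + n <= b.len; ) {
        if (memcmp(b.data + i, from, n) == 0) {
            memcpy(b.data + i, to, n);
            i += n;                /* continue after the replaced bytes */
            ++hits;
        } else {
            ++i;
        }
    }
    return hits;
}
```

Because the length travels with the pointer, nothing here ever calls strlen on the data. A replacement of a different length would need a second buffer or memmove plus an updated len, which this sketch leaves out.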
