
Strsep with Multiple Delimiters: Strange result

I am currently getting some strange results when using strsep with multiple delimiters. My delimiters include the TAB character, the space character, as well as > and <.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main()
{
    char buffer[50];
    char *curr_str = NULL;
    const char delim[4] = "\t >";
    //const char delim[4] = "\t ><"; // This does not work
  
    snprintf(buffer, 50, "%s", "echo Hello");
  
    char *str_ptr = buffer;
  
    curr_str = strsep(&str_ptr, delim);
  
    if (curr_str != NULL)
        printf("%s\n", curr_str);

    curr_str = strsep(&str_ptr, delim);
    if (curr_str != NULL)
        printf("%s\n", curr_str);
    return (0);
}

This output is what I expect.

echo 
Hello

However, as soon as I add the '<' character to the delimiters, I get

cho

Somehow, the first character gets cut off. Is there a reason why this is occurring?

Thank you.

The second argument to strsep, delim, is a null-terminated string (like all strings in C), so you have to leave space for the terminating character:

const char delim[5] = "\t ><"; // This does work
//const char delim[] = "\t ><"; // or this

If you don't terminate the string, strsep will go exploring memory past the end of the array and find many new delimiting characters to use, which is what happened in your case.

"...the first character gets cut off. Is there a reason behind why this is occurring?"

Yes, undefined behavior caused by a non-null-terminated char array being used in a C string function.

If, once populated, const char delim[4] does not contain a null terminator, it is just a char array, not a C string. It may or may not exhibit strange behavior, but it will invoke undefined behavior if used with any of the C string functions, such as curr_str = strsep(&str_ptr, delim);.

const char delim[4];

Has room for 4 char.

"\t ><"  //contains exactly 4 char

can be conceptualized like this in memory:

|\t| |>|<|?|?|?|  // ? = unknown content, possibly no null termination
         ^end of owned memory

It should contain the following:

|\t| |>|<|\0|?|?|  // null termination  
            ^end of owned memory (5 char wide)

requiring more room in the declaration, for example one of the following two options:

const char delim[5] = "\t ><";

or

const char delim[] = "\t ><";

const char delim[4] = "\t ><"; does not define a proper C string because there is no space for the null terminator. Hence any nonzero bytes following delim in memory will be treated as part of the delimiter string.

This is of course undefined behavior, and in your case the compiler may position delim just before buffer without any padding, effectively continuing the sequence of delimiter characters with all the characters from the string "echo Hello". The first character of the input, 'e', is then itself a delimiter, which causes the first call to strsep to return an empty string.

You can check on this Godbolt instance that this is indeed the case in 32-bit mode, but not in 64-bit mode (remove the -m32 compiler option).

This problem is easy to fix. You can either let the compiler determine the length of the delim array:

const char delim[] = "\t ><";

or you can use a pointer to a string constant:

const char *delim = "\t ><";
