What are the differences between strtok and strsep in C?
Could someone explain the differences between strtok() and strsep()? What are the advantages and disadvantages of each, and why would I pick one over the other?
One major difference between strtok() and strsep() is that strtok() is standardized (by the C standard, and hence also by POSIX) but strsep() is not standardized (by C or POSIX; it is available in the GNU C Library, and originated on BSD). Thus, portable code is more likely to use strtok() than strsep().
Another difference is that calls to strsep() on different strings can be interleaved, whereas you cannot do that with strtok() (though you can with strtok_r()). So, using strsep() in a library doesn't break other code accidentally, whereas any use of strtok() in a library function must be documented, because other code that is in the middle of using strtok() cannot call that library function.
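To make the interleaving point concrete, here is a minimal sketch that walks two strings in lockstep with strsep(). The helper name count_matching_fields is hypothetical, and the _DEFAULT_SOURCE define is an assumption for glibc (other platforms that ship strsep() may not need it).

```c
#define _DEFAULT_SOURCE /* assumption: glibc; exposes strsep() under strict -std= modes */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: walk two delimited strings in lockstep and count the
 * positions at which both hold the same field. Both buffers are modified. */
size_t count_matching_fields(char *a, char *b, const char *delims)
{
    size_t matches = 0;
    char *field_a;
    char *field_b;

    /* Each string has its own cursor (a and b), so the calls can alternate. */
    while ((field_a = strsep(&a, delims)) != NULL &&
           (field_b = strsep(&b, delims)) != NULL) {
        if (strcmp(field_a, field_b) == 0)
            matches++;
    }
    return matches;
}
```

Called with writable copies of "1,2,3" and "1,9,3" and the delimiter set ",", this should count two matching positions. The same alternation is not possible with strtok(), because it has only one hidden cursor.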
The manual page for strsep() at kernel.org says:
The strsep() function was introduced as a replacement for strtok(3), since the latter cannot handle empty fields.
Thus, the other major difference is the one highlighted by George Gaál in his answer: strtok() permits multiple delimiters between tokens, whereas strsep() expects a single delimiter between tokens and interprets adjacent delimiters as an empty token.
Both strsep() and strtok() modify their input strings, and neither lets you identify which delimiter character marked the end of the token (because both write a NUL '\0' over the separator after the end of the token).
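If you need to know which delimiter ended a token, or must leave the input untouched, neither function helps. One possible workaround, sketched here with the standard strcspn() function, is to return a view into the string instead of writing a NUL. The struct field type and the next_field name are hypothetical, introduced only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical field descriptor: a view into the unmodified input. */
struct field {
    const char *start; /* first character of the field */
    size_t      len;   /* number of characters in the field */
    char        delim; /* delimiter that ended it, or '\0' at end of input */
};

/* Reads the next field from *cursor and advances *cursor past the delimiter.
 * Returns 0 once the input is exhausted, 1 otherwise. Like strsep(), adjacent
 * delimiters produce empty fields. */
int next_field(const char **cursor, const char *delims, struct field *out)
{
    if (*cursor == NULL)
        return 0;

    out->start = *cursor;
    out->len = strcspn(*cursor, delims);
    out->delim = (*cursor)[out->len];

    /* Step over the delimiter, or mark the input as finished. */
    *cursor = (out->delim != '\0') ? *cursor + out->len + 1 : NULL;
    return 1;
}
```

Because the fields are not NUL-terminated, print them with a precision, for example printf("%.*s", (int)f.len, f.start).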
Use strsep() when you want empty tokens rather than allowing multiple delimiters between tokens, and when you don't mind about portability.
Use strtok_r() when you want to allow multiple delimiters between tokens and you don't want empty tokens (and POSIX is sufficiently portable for you).
Use strtok() only when someone threatens your life if you don't do so. And you'd only use it for long enough to get you out of the life-threatening situation; it is better to use strtok_r() or strsep() than to use strtok().
Why is strtok() poisonous?
The strtok() function is poisonous if used in a library function. If your library function uses strtok(), it must be documented clearly.
That's because:
If any calling function is using strtok() and calls your function that also uses strtok(), you break the calling function.
If your function calls any function that calls strtok(), that will break your function's use of strtok().
Only one piece of code can be using strtok() at any given time, across a whole sequence of strtok() calls.
The root of this problem is the saved state between calls that allows strtok() to continue where it left off. There is no sensible way to fix the problem other than "do not use strtok()".
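If neither replacement is available, writing your own is not hard. The following is a minimal sketch of a function with the same contract as BSD strsep(), built on the standard strcspn(). The name my_strsep is hypothetical, and this is not the BSD or glibc implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with the same contract as BSD strsep(): the caller
 * owns the cursor, so no hidden state survives between calls. */
char *my_strsep(char **stringp, const char *delims)
{
    char *token = *stringp;
    char *end;

    if (token == NULL)
        return NULL;

    end = token + strcspn(token, delims); /* first delimiter or the final NUL */
    if (*end != '\0') {
        *end = '\0';        /* terminate the token in place */
        *stringp = end + 1; /* resume just after the delimiter */
    } else {
        *stringp = NULL;    /* last token: the next call returns NULL */
    }
    return token;
}
```

The only state is the pointer the caller passes in, which is why two parses can never interfere with each other.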
You can use strsep() if it is available.
You can use POSIX strtok_r() if it is available.
You can use Microsoft strtok_s() if it is available.
Nominally, you could use the C11 Annex K function strtok_s(), but its interface is different from both strtok_r() and Microsoft's strtok_s().
BSD strsep():
char *strsep(char **stringp, const char *delim);
POSIX strtok_r():
char *strtok_r(char *restrict s, const char *restrict sep, char **restrict state);
Microsoft strtok_s():
char *strtok_s(char *strToken, const char *strDelimit, char **context);
Annex K strtok_s():
char *strtok_s(char * restrict s1, rsize_t * restrict s1max,
const char * restrict s2, char ** restrict ptr);
Note that this has 4 arguments, not 3 as in the other two variants of strtok().
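Since POSIX strtok_r() and Microsoft's strtok_s() take the same three arguments, one common approach is a thin macro that selects between them. This is only a sketch: the portable_strtok and count_tokens names are hypothetical, it assumes _MSC_VER identifies Microsoft's C runtime and that every other target provides strtok_r(), and it does not cover the 4-argument Annex K variant.

```c
#define _POSIX_C_SOURCE 200809L /* assumption: needed for strtok_r() under strict -std= modes */
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: pick the 3-argument reentrant tokenizer on offer. */
#ifdef _MSC_VER
#define portable_strtok(s, delims, state) strtok_s((s), (delims), (state))
#else
#define portable_strtok(s, delims, state) strtok_r((s), (delims), (state))
#endif

/* Counts the non-empty tokens in buf; buf is modified. */
size_t count_tokens(char *buf, const char *delims)
{
    size_t count = 0;
    char *state = NULL; /* per-call state instead of strtok()'s hidden static */
    char *token = portable_strtok(buf, delims, &state);

    while (token != NULL) {
        count++;
        token = portable_strtok(NULL, delims, &state);
    }
    return count;
}
```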
From The GNU C Library manual - Finding Tokens in a String:
One difference between strsep and strtok_r is that if the input string contains more than one character from delimiter in a row strsep returns an empty string for each pair of characters from delimiter. This means that a program normally should test for strsep returning an empty string before processing it.
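The test the manual recommends can be sketched as follows. The helper name split_nonempty is hypothetical, and the _DEFAULT_SOURCE define is an assumption for glibc.

```c
#define _DEFAULT_SOURCE /* assumption: glibc; exposes strsep() under strict -std= modes */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store up to max non-empty tokens of buf in out[] and
 * return how many were stored. buf is modified. */
size_t split_nonempty(char *buf, const char *delims, char **out, size_t max)
{
    size_t n = 0;
    char *token;

    while (n < max && (token = strsep(&buf, delims)) != NULL) {
        if (*token == '\0')
            continue; /* empty string: two delimiters were adjacent */
        out[n++] = token;
    }
    return n;
}
```

Skipping the empty strings this way makes strsep() yield the same tokens that strtok_r() would, while still letting callers that care about empty fields drop the check.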
The first difference between strtok() and strsep() is the way they handle contiguous delimiter characters in the input string.
Handling of contiguous delimiter characters by strtok():
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
    const char* teststr = "aaa-bbb --ccc-ddd"; // contiguous delimiters between the bbb and ccc substrings
    const char* delims = " -"; // delimiters: space and hyphen
    char* token;
    char* ptr = strdup(teststr); // strtok() needs a writable copy
    if (ptr == NULL) {
        fprintf(stderr, "strdup failed\n");
        exit(EXIT_FAILURE);
    }
    printf("Original String: %s\n", ptr);
    token = strtok(ptr, delims);
    while (token != NULL) {
        printf("%s\n", token);
        token = strtok(NULL, delims);
    }
    printf("Original String: %s\n", ptr); // shows that the buffer was modified
    free(ptr);
    return 0;
}
Output:
# ./example1_strtok
Original String: aaa-bbb --ccc-ddd
aaa
bbb
ccc
ddd
Original String: aaa
In the output, you can see the tokens "bbb" and "ccc" one after another. strtok() does not indicate the occurrence of contiguous delimiter characters. Also, strtok() modifies the input string.
Handling of contiguous delimiter characters by strsep():
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
    const char* teststr = "aaa-bbb --ccc-ddd"; // contiguous delimiters between the bbb and ccc substrings
    const char* delims = " -"; // delimiters: space and hyphen
    char* token;
    char* ptr1;
    char* ptr = strdup(teststr); // strsep() needs a writable copy
    if (ptr == NULL) {
        fprintf(stderr, "strdup failed\n");
        exit(EXIT_FAILURE);
    }
    ptr1 = ptr; // strsep() advances ptr1; ptr is kept for free()
    printf("Original String: %s\n", ptr);
    while ((token = strsep(&ptr1, delims)) != NULL) {
        if (*token == '\0') {
            token = "<empty>";
        }
        printf("%s\n", token);
    }
    if (ptr1 == NULL) // this is just to show that strsep() modifies the pointer passed to it
        printf("ptr1 is NULL\n");
    printf("Original String: %s\n", ptr);
    free(ptr);
    return 0;
}
Output:
# ./example1_strsep
Original String: aaa-bbb --ccc-ddd
aaa
bbb
<empty> <==============
<empty> <==============
ccc
ddd
ptr1 is NULL
Original String: aaa
In the output, you can see the two empty strings (shown as <empty>) between bbb and ccc. They come from the run of three delimiters " --" between "bbb" and "ccc". When strsep() found the delimiter character ' ' after "bbb", it replaced that delimiter with a '\0' character and returned "bbb". After this, strsep() found another delimiter character, '-', at the very start of the remaining string. It replaced that delimiter with a '\0' character and returned the empty string. The same happened for the next delimiter character.
Contiguous delimiter characters are indicated when strsep() returns a pointer to a null character (that is, a character with the value '\0').
strsep() modifies the input string as well as the pointer whose address is passed as its first argument.
The second difference is that strtok() relies on a static variable to keep track of the current parse location within a string. This implementation requires one string to be parsed completely before parsing of a second string begins. That is not the case with strsep().
Calling strtok() when another strtok() sequence is not finished:
#include <stdio.h>
#include <string.h>

void another_function_callng_strtok(void)
{
    char str[] = "ttt -vvvv";
    const char* delims = " -";
    char* token;
    printf("Original String: %s\n", str);
    token = strtok(str, delims); // overwrites the state of any parse in progress
    while (token != NULL) {
        printf("%s\n", token);
        token = strtok(NULL, delims);
    }
    printf("another_function_callng_strtok: I am done.\n");
}

void function_callng_strtok(void)
{
    char str[] = "aaa --bbb-ccc";
    const char* delims = " -";
    char* token;
    printf("Original String: %s\n", str);
    token = strtok(str, delims);
    while (token != NULL)
    {
        printf("%s\n", token);
        another_function_callng_strtok(); // starts a second parse mid-loop
        token = strtok(NULL, delims);
    }
}

int main(void) {
    function_callng_strtok();
    return 0;
}
Output:
# ./example2_strtok
Original String: aaa --bbb-ccc
aaa
Original String: ttt -vvvv
ttt
vvvv
another_function_callng_strtok: I am done.
The function function_callng_strtok() prints only the token "aaa" and none of the remaining tokens of its input string. That is because it calls another_function_callng_strtok(), which in turn calls strtok(), and by the time that function has extracted all of its tokens, strtok()'s static state records that there is nothing left to parse. When control returns to the while loop in function_callng_strtok(), strtok() therefore returns NULL, which makes the loop condition false, and the loop exits.
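For comparison, nested parsing of this kind works with POSIX strtok_r(), because each parse carries its own state pointer. A minimal sketch follows; the count_inner and count_nested names and the ';' and ',' delimiters are hypothetical choices for illustration.

```c
#define _POSIX_C_SOURCE 200809L /* assumption: needed for strtok_r() under strict -std= modes */
#include <stddef.h>
#include <string.h>

/* Inner parse: counts the tokens in buf using its own state pointer. */
static size_t count_inner(char *buf, const char *delims)
{
    size_t count = 0;
    char *state;
    char *token = strtok_r(buf, delims, &state);

    while (token != NULL) {
        count++;
        token = strtok_r(NULL, delims, &state);
    }
    return count;
}

/* Outer parse: splits buf on ';' and, in the middle of that loop, runs an
 * inner parse on ',' for each group. Returns the total number of inner tokens. */
size_t count_nested(char *buf)
{
    size_t total = 0;
    char *state; /* separate from the inner state, so neither parse disturbs the other */
    char *group = strtok_r(buf, ";", &state);

    while (group != NULL) {
        total += count_inner(group, ",");
        group = strtok_r(NULL, ";", &state);
    }
    return total;
}
```

Given a writable copy of "a,b;c;d,e,f", count_nested() should visit all three groups and return 6.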
Calling strsep() when another strsep() sequence is not finished:
#include <stdio.h>
#include <string.h>

void another_function_callng_strsep(void)
{
    char str[] = "ttt -vvvv";
    const char* delims = " -";
    char* token;
    char* ptr = str; // this parse has its own cursor
    printf("Original String: %s\n", str);
    while ((token = strsep(&ptr, delims)) != NULL) {
        if (*token == '\0') {
            token = "<empty>";
        }
        printf("%s\n", token);
    }
    printf("another_function_callng_strsep: I am done.\n");
}

void function_callng_strsep(void)
{
    char str[] = "aaa --bbb-ccc";
    const char* delims = " -";
    char* token;
    char* ptr = str; // independent of the cursor in the other function
    printf("Original String: %s\n", str);
    while ((token = strsep(&ptr, delims)) != NULL) {
        if (*token == '\0') {
            token = "<empty>";
        }
        printf("%s\n", token);
        another_function_callng_strsep(); // starts a second parse mid-loop
    }
}

int main(void) {
    function_callng_strsep();
    return 0;
}
Output:
# ./example2_strsep
Original String: aaa --bbb-ccc
aaa
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.
<empty>
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.
<empty>
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.
bbb
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.
ccc
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.
Here you can see that calling strsep() on another string before one string is completely parsed doesn't make any difference.
So, the disadvantage of both strtok() and strsep() is that they modify the input string, but strsep() has a couple of advantages over strtok(), as illustrated above.
The strsep() function is intended as a replacement for the strtok() function. While the strtok() function should be preferred for portability reasons (it conforms to ISO/IEC 9899:1990 ("ISO C90")) it is unable to handle empty fields, i.e., detect fields delimited by two adjacent delimiter characters, or to be used for more than a single string at a time. The strsep() function first appeared in 4.4BSD.