
How does strtok() split the string into tokens in C?

Please explain how the strtok() function works. The manual says it breaks the string into tokens, but I am unable to understand from the manual what it actually does.

I added watches on str and *pch to check how it works. When the first iteration of the while loop ran, the contents of str were only "this". How did the output shown below get printed on the screen?

/* strtok example */
#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] ="- This, a sample string.";
  char * pch;
  printf ("Splitting string \"%s\" into tokens:\n",str);
  pch = strtok (str," ,.-");
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok (NULL, " ,.-");
  }
  return 0;
}

Output:

Splitting string "- This, a sample string." into tokens:
This
a
sample
string

The strtok runtime function works like this:

The first time you call strtok, you provide the string that you want to tokenize:

char s[] = "this is a string";

In the above string, the space seems to be a good delimiter between words, so let's use that:

char* p = strtok(s, " ");

What happens now is that s is searched until a space character is found. That space is overwritten with '\0', the first token ("this") is returned, and p points to that token (a string).

To get the next token and continue with the same string, NULL is passed as the first argument, since strtok maintains a static pointer to the string you passed previously:

p = strtok(NULL," ");

p now points to "is".

And so on until no more spaces can be found; then the rest of the string is returned as the last token, "string". The call after that returns NULL.

More conveniently, you could write it like this to print out all the tokens:

for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
  puts(p);
}

EDIT:

If you want to store the values returned by strtok, you need to copy each token to another buffer, e.g. with strdup(p), since the original string (pointed to by the static pointer inside strtok) is modified between iterations in order to return the token.

strtok() divides the string into tokens. A token is a run of characters that contains no delimiter; delimiters in front of a token are skipped, and the first delimiter after it ends it. In your case, the leading "-" and " " are skipped, so the first token starts at "T" and ends at the next delimiter, ",". That gives you "This" as output. Similarly, the rest of the string gets split into tokens wherever delimiters occur, and the last token ends at ".".

strtok maintains a static, internal reference pointing to the next available token in the string; if you pass it a NULL pointer, it will work from that internal reference.

This is the reason strtok isn't re-entrant; as soon as you pass it a new pointer, that old internal reference gets clobbered.

strtok doesn't change the parameter itself (str). It stores that pointer (in a local static variable). It can then change what that parameter points to in subsequent calls without having the parameter passed back. (And it can advance the pointer it has kept however it needs to in order to perform its operations.)

From the POSIX strtok page:

This function uses static storage to keep track of the current string position between calls.

There is a thread-safe variant (strtok_r) that doesn't do this type of magic.

strtok will tokenize a string, i.e. convert it into a series of substrings.

It does that by searching for the delimiters that separate these tokens (or substrings), and you specify the delimiters. In your case, you want ' ', ',', '.' or '-' to be the delimiters.

The programming model for extracting these tokens is that you hand strtok your main string and the set of delimiters. Then you call it repeatedly, and each time strtok returns the next token it finds, until it reaches the end of the main string, at which point it returns NULL. Another rule is that you pass the string in only the first time, and NULL on subsequent calls. This is how you tell strtok whether you are starting a new tokenizing session with a new string or retrieving tokens from the previous session. Note that strtok remembers its state for the tokenizing session, and for this reason it is not reentrant or thread-safe (you should use strtok_r instead). Another thing to know is that it actually modifies the original string: it writes '\0' over the delimiter that ends each token it finds.

One way to invoke strtok, succinctly, is as follows:

char str[] = "this, is the string - I want to parse";
char delim[] = " ,-";
char* token;

for (token = strtok(str, delim); token; token = strtok(NULL, delim))
{
    printf("token=%s\n", token);
}

Result:

token=this
token=is
token=the
token=string
token=I
token=want
token=to
token=parse

The first time you call it, you provide the string to tokenize to strtok. Then, to get the following tokens, you just pass NULL to the function, for as long as it returns a non-NULL pointer.

The strtok function records the string you first provided when you call it. (This is really dangerous for multi-threaded applications.)

strtok modifies its input string. It places null characters ('\0') in it so that it can return pieces of the original string as tokens. In fact, strtok does not allocate memory. You may understand it better if you draw the string as a sequence of boxes.

To understand how strtok() works, one first needs to know what a static variable is. This link explains it quite well.

The key to the operation of strtok() is preserving the location of the last separator between successive calls (that's why strtok() continues to parse the original string passed to it when it is invoked with a null pointer in successive calls).

Have a look at my own strtok() implementation, called zStrtok(), which has slightly different functionality than the one provided by strtok():

char *zStrtok(char *str, const char *delim) {
    static char *static_str=0;      /* var to store last address */
    int index=0, strlength=0;           /* integers for indexes */
    int found = 0;                  /* check if delim is found */

    /* delimiter cannot be NULL
    * if no more char left, return NULL as well
    */
    if (delim==0 || (str == 0 && static_str == 0))
        return 0;

    if (str == 0)
        str = static_str;

    /* get length of string */
    while(str[strlength])
        strlength++;

    /* find the first occurrence of delim */
    for (index=0;index<strlength;index++)
        if (str[index]==delim[0]) {
            found=1;
            break;
        }

    /* if delim is not contained in str, return str */
    if (!found) {
        static_str = 0;
        return str;
    }

    /* check for consecutive delimiters
    *if first char is delim, return delim
    */
    if (str[0]==delim[0]) {
        static_str = (str + 1);
        return (char *)delim;
    }

    /* terminate the string
    * this assignment requires char[], so str has to
    * be char[] rather than *char
    */
    str[index] = '\0';

    /* save the rest of the string */
    static_str = (str + index + 1);

    return str;
}

And here is an example usage:

  Example Usage
      char str[] = "A,B,,,C";
      printf("1 %s\n",zStrtok(str,","));
      printf("2 %s\n",zStrtok(NULL,","));
      printf("3 %s\n",zStrtok(NULL,","));
      printf("4 %s\n",zStrtok(NULL,","));
      printf("5 %s\n",zStrtok(NULL,","));
      printf("6 %s\n",zStrtok(NULL,","));

  Example Output
      1 A
      2 B
      3 ,
      4 ,
      5 C
      6 (null)

The code is from a string processing library I maintain on GitHub, called zString. Have a look at the code, or even contribute :) https://github.com/fnoyanisi/zString

This is how I implemented strtok. It is not that great, but after working on it for 2 hours I finally got it to work. It does support multiple delimiters.

#include <iostream>
using namespace std;

char* mystrtok(char str[], const char filter[])
{
    static char *ptr = NULL;    // where the next search starts
    if(filter == NULL) {
        return str;
    }
    if(str != NULL) {
        ptr = str;              // a non-NULL str starts a new session
    }
    if(ptr == NULL) {
        return NULL;            // no string given yet, or all tokens returned
    }
    char* ptrReturn = ptr;
    for(int j = 0; ptr[j] != '\0'; j++) {
        for(int i=0 ; filter[i] != '\0' ; i++) {
            if( ptr[j] == filter[i]) {
                ptr[j] = '\0';  // end the token at the delimiter
                ptr+=j+1;       // resume just past it on the next call
                return ptrReturn;
            }
        }
    }
    ptr = NULL;                 // reached the end: this is the last token
    return ptrReturn;
}

int main()
{
    char str[200] = "This,is my,string.test";
    char *ppt = mystrtok(str,", .");
    while(ppt != NULL ) {
        cout<< ppt << endl;
        ppt = mystrtok(NULL,", .");
    }
    return 0;
}

For those who are still having a hard time understanding this strtok() function, take a look at this pythontutor example; it is a great tool for visualizing your C (or C++, Python ...) code.

In case the link gets broken, paste in:

#include <stdio.h>
#include <string.h>

int main()
{
    char s[] = "Hello, my name is? Matthew! Hey.";
    for (char *p = strtok(s," ,?!."); p != NULL; p = strtok(NULL, " ,?!.")) {
      puts(p);
    }
    return 0;
}

Credits go to Anders K.

Here is my implementation, which uses a hash table (a 256-entry lookup table indexed by character) for the delimiters, which means it is O(n) instead of O(n^2) (here is a link to the code):

#include<stdio.h>
#include<stdlib.h>
#include<string.h>

#define DICT_LEN 256

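int *create_delim_dict(const char *delim)
{
    /* calloc zero-fills the table; report failure to the caller */
    int *d = calloc(DICT_LEN, sizeof(int));
    if (!d) {
        return NULL;
    }

    /* index by unsigned char so bytes above 127 cannot go negative */
    for (; *delim != '\0'; delim++) {
        d[(unsigned char)*delim] = 1;
    }
    return d;
}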



char *my_strtok(char *str, char *delim)
{

    static char *last, *to_free;
    int *deli_dict = create_delim_dict(delim);

    if(!deli_dict) {
        /*this check if we allocate and fail the second time with entering this function */
        if(to_free) {
            free(to_free);
        }
        return NULL;
    }

    if(str) {
        last = (char*)malloc(strlen(str)+1);
        if(!last) {
            free(deli_dict);
            return NULL;
        }
        to_free = last;
        strcpy(last, str);
    }

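    /* skip leading delimiters */
    while(*last != '\0' && deli_dict[(unsigned char)*last]) {
        last++;
    }
    str = last;
    if(*last == '\0') {
        free(deli_dict);
        free(to_free);
        deli_dict = NULL;
        to_free = NULL;
        return NULL;
    }
    /* advance to the end of the token */
    while (*last != '\0' && !deli_dict[(unsigned char)*last]) {
        last++;
    }

    /* terminate the token; step past the delimiter only if one was found,
       so that last never moves beyond the end of the buffer */
    if (*last != '\0') {
        *last = '\0';
        last++;
    }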

    free(deli_dict);
    return str;
}

int main()
{
    char * str = "- This, a sample string.";
    char *del = " ,.-";
    char *s = my_strtok(str, del);
    while(s) {
        printf("%s\n", s);
        s = my_strtok(NULL, del);
    }
    return 0;
}

strtok replaces the delimiter that follows each token (the delimiters are the characters in the second argument) with a null character ('\0'), and a null character also marks the end of a string.

http://www.cplusplus.com/reference/clibrary/cstring/strtok/

strtok() stores a pointer to where it last left off in a static variable, so on the second call, when we pass NULL, strtok() gets the pointer from that static variable.

If you pass the same string again instead of NULL, it starts again from the beginning.

Moreover, strtok() is destructive, i.e. it makes changes to the original string, so make sure you always have a copy of the original.

One more problem with strtok() is that, because it stores the address in a static variable, calls made from more than one thread in a multithreaded program interfere with each other. Use strtok_r() for this.

The link below shows how strtok works in C; it is the same program that Dipak explained in C++:

https://stackoverflow.com/a/58674383/7802396

You can scan the char array looking for the delimiter: if you find it, print a newline; otherwise print the character.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
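    char *s;
    s = malloc(1024 * sizeof(char));
    if (s == NULL) {
        return 1;
    }
    s[0] = '\0';                /* stays empty if scanf reads nothing */
    scanf("%1023[^\n]", s);     /* width limit keeps the input inside the buffer */
    s = realloc(s, strlen(s) + 1);
    int len = strlen(s);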
    char delim =' ';
    for(int i = 0; i < len; i++) {
        if(s[i] == delim) {
            printf("\n");
        }
        else {
            printf("%c", s[i]);
        }
    }
    free(s);
    return 0;
}

So, this is a code snippet to help better understand this topic.

Printing Tokens

Task: Given a sentence, s, print each word of the sentence on a new line.

char *s;
s = malloc(1024 * sizeof(char));
scanf("%1023[^\n]", s);   /* width limit keeps the input inside the buffer */
s = realloc(s, strlen(s) + 1);
//logic to print the tokens of the sentence.
for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
    printf("%s\n",p);
}

Input: How is that

Result:

How
is
that

Explanation: Here the strtok() function is used, and a for loop iterates over it to print the tokens on separate lines.

The function takes the string and the break-points (delimiters) as parameters, breaks the string at those break-points, and forms tokens. Each token is returned through p, which is then used for printing.

strtok replaces the delimiter with the '\0' (null) character in the given string.

CODE

#include<iostream>
#include<cstring>

int main()
{
    char s[]="30/4/2021";
    std::cout<<(void*)s<<"\n";    // address of s, e.g. 0x70fdf0 (varies per run)

    char *p1=s;                   // second pointer to the start of s
    std::cout<<p1<<"\n";

    char *p2=strtok(s,"/");
    std::cout<<(void*)p2<<"\n";
    std::cout<<p2<<"\n";

    char *p3=s;                   // point at the start of s again
    std::cout<<p3<<"\n";
    
    for(int i=0;i<=9;i++)
    {
        std::cout<<*p1;
        p1++;
    }
    
}

OUTPUT

0x70fdf0       // 1. address of string s
30/4/2021      // 2. print string s through ptr p1 
0x70fdf0       // 3. this address is return by strtok to ptr p2
30             // 4. print string which pointed by p2
30             // 5. again assign address of string s to ptr p3 try to print string
30 4/2021      // 6. print characters of string s one by one using loop

Before tokenizing the string

I assigned the address of string s to a pointer (p1) and printed the string through that pointer; the whole string is printed.

After tokenizing

strtok returns the address of string s to the pointer p2, but when I print the string through that pointer it prints only "30", not the whole string. So strtok is not just returning an address; it is also placing a '\0' character where the delimiter was.

Cross check

1.

Again I assign the address of string s to a pointer (p3) and print the string. It prints "30", because tokenizing updated the string with '\0' at the delimiter.

2.

Printing string s character by character in a loop shows that the first delimiter has been replaced by '\0', so a null character (displayed as a blank) is printed where the '/' used to be.


 