
Using strtok to split a string in C

In the code below I try to split a string to get the names separated by "," from it.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


int main(void)
{
    char *st = "Addison,Jayden,Sofia,Michael,Andrew,Lily,Benjamin";
    char *name[7];
    char *separate = ",";

    char *token = strtok(st, separate);
    int i = 0;
    while(token != NULL)
    {
        strcpy(name[i], token);
        token = strtok(NULL, ",");
        i++;
    }
    for (int j = 0; j < 7; j++)
    {
        printf("%s\n", name[j]);
    }
}

However, I run into a segmentation fault when I try to run the code. I tried to debug it, and the error seems to come from this specific line:

char *token = strtok(st, separate);

Can anyone please tell me what I did wrong?

You declared a pointer to a string literal:

char *st = "Addison,Jayden,Sofia,Michael,Andrew,Lily,Benjamin";

You may not change string literals. Any attempt to change a string literal results in undefined behavior.

From the C Standard (6.4.5 String literals):

7 It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

On the other hand, the standard C function strtok modifies the string passed to it.

From the C Standard (7.23.5.8 The strtok function):

4 The strtok function then searches from there for a character that is contained in the current separator string. If no such character is found, the current token extends to the end of the string pointed to by s1, and subsequent searches for a token will return a null pointer. If such a character is found, it is overwritten by a null character, which terminates the current token. The strtok function saves a pointer to the following character, from which the next search for a token will start.

So replace the declaration with

char st[] = "Addison,Jayden,Sofia,Michael,Andrew,Lily,Benjamin";
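To make the overwriting described in the quoted paragraph visible, here is a minimal sketch. The helper name count_overwritten is hypothetical and not part of the original code; it tokenizes a writable buffer and then counts how many bytes of the original text strtok replaced with null characters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tokenizes buf in place and returns how many
   bytes of the original text were overwritten with '\0'.
   buf must be writable (an array, not a string literal). */
size_t count_overwritten(char *buf, const char *delims)
{
    size_t len = strlen(buf);   /* measure before strtok alters buf */
    size_t zeros = 0;

    /* Walk through every token; only the side effect on buf matters here. */
    for (char *tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims))
    {
        (void)tok;
    }

    /* Each delimiter that ended a token is now a null character. */
    for (size_t k = 0; k < len; k++)
    {
        if (buf[k] == '\0')
            zeros++;
    }
    return zeros;
}
```

Called on an array holding "Addison,Jayden,Sofia", the buffer afterwards reads as just "Addison" when treated as a string, because the first comma has become a terminator. This is why the buffer has to be a writable array.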

You also declared an array of pointers that is not initialized:

char *name[7];

So this statement

strcpy(name[i], token);

also invokes undefined behavior.

Instead you could just write

name[i] = token;

And since the variable i contains the number of tokens, in general this loop

for (int j = 0; j < 7; j++)
{
    printf("%s\n", name[j]);
}

should be rewritten at least like this (without using the magic number 7):

for (int j = 0; j < i; j++)
{
    puts( name[j] );
}
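Putting these corrections together, one possible sketch is shown below. The function name split_names and its max parameter are assumptions added for illustration; the bound simply guards against writing past the end of the pointer array when the input holds more tokens than expected.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits buf in place on any character in delims,
   stores at most max token pointers in names, and returns the count.
   The stored pointers refer into buf, so buf must be writable and
   must stay alive for as long as names is used. */
size_t split_names(char *buf, const char *delims, char *names[], size_t max)
{
    size_t count = 0;
    char *token = strtok(buf, delims);

    while (token != NULL && count < max)
    {
        names[count++] = token;         /* store the pointer, no copy needed */
        token = strtok(NULL, delims);
    }
    return count;
}
```

A caller would declare char st[] = "Addison,Jayden,..." and char *name[7], call split_names(st, ",", name, 7), and then loop from 0 up to the returned count when printing.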

"However, I run into segmentation fault when I try to run the code"

Two problems:

  • Attempting to modify a non-editable string (a string literal).
  • Attempting to write to a location not owned by the process (through uninitialized pointers).

Either of these can result in a crash (segmentation fault).

Memory allocation problem:
The declaration:

  char *name[7];

Creates an array of 7 pointers. Each of them requires memory to be allocated before it can be used in this way:

strcpy(name[i], token);

Memory allocation example:

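//maxNameLen is the longest name expected; it must be defined before this loop
for(int i=0;i<7;i++)
{
    //sizeof(*name[i]) is the size of one char; sizeof(name[i]) would be the size of a pointer
    name[i] = malloc((maxNameLen+1)*sizeof(*name[i]));
    if(!name[i]) 
    {
        //handle error
    }
}
//now each pointer has space for up to maxNameLen characters plus the terminating null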
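As an alternative that does not depend on a maxNameLen bound, each slot can be sized from the token itself. The sketch below is only an illustration; copy_names and free_names are hypothetical names, and the choice to return -1 on allocation failure is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies each token of buf into its own heap block.
   Returns the number of names stored, or -1 if an allocation fails
   (in which case everything allocated so far is released).
   buf must be writable because strtok modifies it. */
int copy_names(char *buf, const char *delims, char *names[], int max)
{
    int count = 0;

    for (char *token = strtok(buf, delims);
         token != NULL && count < max;
         token = strtok(NULL, delims))
    {
        names[count] = malloc(strlen(token) + 1);   /* +1 for the terminator */
        if (names[count] == NULL)
        {
            while (count > 0)
                free(names[--count]);
            return -1;
        }
        strcpy(names[count], token);    /* safe: the block is large enough */
        count++;
    }
    return count;
}

/* Releases the blocks allocated by copy_names. */
void free_names(char *names[], int count)
{
    for (int k = 0; k < count; k++)
        free(names[k]);
}
```

Because every name lives in its own block, the copies remain valid after the source buffer goes out of scope, at the cost of one free per name.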

Non-editable string:

char *st = "Addison,Jayden,Sofia,Michael,Andrew,Lily,Benjamin";

A string to be edited cannot be in this form (i.e. a pointer to a string literal). It must be in an editable form so that strtok() can change it, as it does when parsing. There are many editable string forms. Here are two examples:

//Create an editable form:
char st[] = {"Addison,Jayden,Sofia,Michael,Andrew,Lily,Benjamin"};

//Create an editable copy of the original (requires freeing when done using).
//strdup is POSIX (and C23) and is declared in <string.h>.
char *duplicate = strdup(st);
if(duplicate)
{
    //success duplicating, now parse duplicate.
    free(duplicate);
}

There is another discussion here explaining editable/non-editable strings in C.
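The copy-then-parse idea can be sketched as follows. The helper count_tokens is hypothetical, and it uses malloc with memcpy in place of strdup so that it also compiles under strict ISO C modes that predate C23; that substitution is a choice made for this example, not something the answer prescribes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in text without modifying it.
   Returns (size_t)-1 if the temporary copy cannot be allocated. */
size_t count_tokens(const char *text, const char *delims)
{
    size_t size = strlen(text) + 1;
    char *copy = malloc(size);          /* stands in for strdup(text) */
    if (copy == NULL)
        return (size_t)-1;
    memcpy(copy, text, size);

    /* strtok edits only the copy; the caller's string is untouched. */
    size_t count = 0;
    for (char *tok = strtok(copy, delims); tok != NULL; tok = strtok(NULL, delims))
        count++;

    free(copy);
    return count;
}
```

Since only the copy is modified, the argument may safely be a string literal here.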
