
Function to split string sometimes gives segmentation fault

I have the following function to split a string. Most of the time it works fine, but sometimes it randomly causes a segmentation fault.

char** splitString(char* string, char* delim){
    int count = 0;
    char** split = NULL;
    char* temp = strtok(string, delim);

    while(temp){
        split = realloc(split, sizeof(char*) * ++count);

        split[count - 1] = temp;
        temp = strtok(NULL, " ");
    }

    int i = 0;
    while(split[i]){
        printf("%s\n", split[i]);
        i++;
    }

    split[count - 1][strlen(split[count - 1]) - 1] = '\0';
    return split;
}
split[count - 1][strlen(split[count - 1]) - 1] = '\0';

should look like

split[count - 1] = NULL;

You don't have anything allocated there that you can access and write '\0' into.

After that, put that line before while(split[i]) so that the loop can stop when it reaches NULL.
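One caveat with the suggestion above: split[count - 1] holds the last token, so writing NULL there would discard it. A minimal sketch of a sentinel-terminated version, assuming the array is always allocated one slot larger than the token count, could look like this (the name split_with_sentinel is hypothetical, not from the question):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of the question's splitString(): the array always
 * has one slot more than the number of tokens, and that last slot holds
 * the NULL sentinel. The tokens point into `string`, which strtok modifies,
 * so `string` must be writable and must outlive the returned array. */
char **split_with_sentinel(char *string, const char *delim)
{
    size_t count = 0;
    char **split = malloc(sizeof *split);   /* room for the sentinel alone */

    if (!split)
        return NULL;
    split[0] = NULL;

    for (char *tok = strtok(string, delim); tok; tok = strtok(NULL, delim)) {
        /* existing tokens + 1 new token + 1 sentinel */
        char **tmp = realloc(split, sizeof *split * (count + 2));
        if (!tmp) {
            free(split);
            return NULL;
        }
        split = tmp;
        split[count++] = tok;
        split[count] = NULL;    /* keep the array terminated at every step */
    }
    return split;
}
```

With this layout a loop such as while(split[i]) stops at the sentinel, and the caller only needs a single free(split) because the tokens themselves are not separately allocated.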

The function strtok is not reentrant. Use strtok_r() instead, which is the reentrant version of strtok().
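To illustrate why reentrancy matters, here is a small sketch that nests two tokenizing loops, something plain strtok cannot do because it keeps a single hidden state. The function name count_fields and the ';' and ',' separators are assumptions chosen for the example. strtok_r is a POSIX function, not ISO C, hence the feature-test macro.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strtok_r (POSIX, not ISO C) */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: `buf` holds records separated by ';', each record
 * holds fields separated by ','. Returns the total number of fields.
 * Each loop keeps its own state pointer, so the inner tokenizer does not
 * disturb the outer one. `buf` is modified in place. */
size_t count_fields(char *buf)
{
    char *rec_state = NULL;
    size_t total = 0;

    for (char *rec = strtok_r(buf, ";", &rec_state); rec;
         rec = strtok_r(NULL, ";", &rec_state)) {
        char *fld_state = NULL;

        for (char *fld = strtok_r(rec, ",", &fld_state); fld;
             fld = strtok_r(NULL, ",", &fld_state))
            total++;
    }
    return total;
}
```

Called on a writable buffer containing "a,b;c,d,e;f", this counts 6 fields across 3 records.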

You have a number of subtle issues, not the least of which is that your function will segfault if you pass a string literal. You need to make a copy of the string you will be splitting, because strtok modifies the string. If you pass a string literal (typically stored in read-only memory), your compiler has no way of warning you unless you have declared string as const char *string.

To avoid these problems, simply make a copy of the string you will tokenize. That way, regardless of how the string you pass to the function was declared, you avoid the problem altogether.
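As a small illustration of the copy-first idea, the sketch below tokenizes a copy held in a caller-supplied buffer and leaves the source untouched. The name tokenize_copy and its parameters are hypothetical, and using a fixed buffer instead of heap memory is just one possible choice.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `src` into the writable buffer `buf`
 * (capacity `cap`) and tokenize the copy. Returns the number of tokens,
 * or (size_t)-1 if `src` does not fit. `src` itself is never modified,
 * so passing a string literal is safe. */
size_t tokenize_copy(const char *src, const char *delim, char *buf, size_t cap)
{
    size_t len = strlen(src), n = 0;

    if (len >= cap)
        return (size_t)-1;
    memcpy(buf, src, len + 1);      /* copy including the terminator */

    /* strtok overwrites the delimiter after each token with '\0' */
    for (char *tok = strtok(buf, delim); tok; tok = strtok(NULL, delim))
        n++;
    return n;
}
```

After a call with the literal "My dog has fleas." and the delimiters " .", the literal is unchanged, while buf now reads as just "My" because strtok replaced the first space with a terminator.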

You should also pass a pointer to size_t as a parameter to your function in order to make the number of tokens available back in the calling function. That way you do not have to leave a sentinel NULL as the final pointer in the pointer to pointer to char you return. Just pass a pointer and update it to reflect the number of tokens parsed in your function.

Putting those pieces together, and cleaning things up a bit, you could use the following to do what you are attempting to do:

char **splitstr (const char *str, char *delim, size_t *n)
{
    char *cpy = strdup (str), *p = cpy; /* copy of str & pointer */
    char **split = NULL;                /* pointer to pointer to char */
    *n = 0;                             /* zero 'n' */

    if (!cpy)                           /* validate strdup succeeded */
        return NULL;

    for (p = strtok (p, delim); p; p = strtok (NULL, delim)) {
        void *tmp = realloc (split, sizeof *split * (*n + 1));
        if (!tmp) { /* validate realloc succeeded */
            fprintf (stderr, "splitstr() error: memory exhausted.\n");
            break;
        }
        split = tmp;                /* assign tmp to split */
        if (!(split[*n] = strdup (p)))  /* allocate/copy to split[n] */
            break;                  /* stop on failure, n counts valid strings */
        (*n)++;
    }
    free (cpy);     /* free cpy */
    return split;   /* return split */
}

Adding a short example program, you could do the following:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char **splitstr (const char *str, char *delim, size_t *n)
{
    char *cpy = strdup (str), *p = cpy; /* copy of str & pointer */
    char **split = NULL;                /* pointer to pointer to char */
    *n = 0;                             /* zero 'n' */

    if (!cpy)                           /* validate strdup succeeded */
        return NULL;

    for (p = strtok (p, delim); p; p = strtok (NULL, delim)) {
        void *tmp = realloc (split, sizeof *split * (*n + 1));
        if (!tmp) { /* validate realloc succeeded */
            fprintf (stderr, "splitstr() error: memory exhausted.\n");
            break;
        }
        split = tmp;                /* assign tmp to split */
        if (!(split[*n] = strdup (p)))  /* allocate/copy to split[n] */
            break;                  /* stop on failure, n counts valid strings */
        (*n)++;
    }
    free (cpy);     /* free cpy */
    return split;   /* return split */
}

int main (void) {

    size_t n = 0;                   /* number of strings */
    char *s = "My dog has fleas.",  /* string to split */
        *delim = " .\n",            /* delims */
        **strings = splitstr (s, delim, &n);    /* split s */

    for (size_t i = 0; i < n; i++) {    /* output results */
        printf ("strings[%zu] : %s\n", i, strings[i]);
        free (strings[i]);          /* free string */
    }
    free (strings);     /* free pointers */

    return 0;
}

Example Use/Output

$ ./bin/splitstrtok
strings[0] : My
strings[1] : dog
strings[2] : has
strings[3] : fleas

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
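For the array of strings returned by a function like splitstr above, both responsibilities can be wrapped in one small cleanup routine. This is only a sketch; the name freestrings is hypothetical and assumes every element and the array itself came from malloc, realloc or strdup.

```c
#include <stdlib.h>

/* Hypothetical companion to splitstr(): frees each of the `n` strings and
 * then the pointer array itself. The caller must pass the pointer exactly
 * as it was returned, since free() needs the starting address of each
 * block. Passing NULL is allowed and does nothing. */
void freestrings(char **strings, size_t n)
{
    if (!strings)
        return;
    for (size_t i = 0; i < n; i++)
        free(strings[i]);   /* free each string */
    free(strings);          /* free the array of pointers */
}
```

In the example program this would replace the free calls in main with a single freestrings (strings, n) after the output loop.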

It is imperative that you use a memory error checking program to ensure you do not attempt to write beyond/outside the bounds of your allocated block of memory, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use; just run your program through it.

$ valgrind ./bin/splitstrtok
==14471== Memcheck, a memory error detector
==14471== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==14471== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==14471== Command: ./bin/splitstrtok
==14471==
strings[0] : My
strings[1] : dog
strings[2] : has
strings[3] : fleas
==14471==
==14471== HEAP SUMMARY:
==14471==     in use at exit: 0 bytes in 0 blocks
==14471==   total heap usage: 9 allocs, 9 frees, 115 bytes allocated
==14471==
==14471== All heap blocks were freed -- no leaks are possible
==14471==
==14471== For counts of detected and suppressed errors, rerun with: -v
==14471== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all memory you have allocated and that there are no memory errors.

Look things over and let me know if you have further questions.
