
Seg fault on a program splitting a char array into a 2D array

I wrote a program to split a char array into a 2D array, as described by the comments under the function definition. However, I am getting a segmentation fault on this piece of code. Can someone help me find out why?

The my_strlen(str) function works the same way as the standard strlen(str) function, and it works perfectly. The length of the char array is limited, so I am not really worried about the efficiency of the memory allocation.

char **my_str2vect(char *str) {
    // Takes a string 
    // Allocates a new vector (array of string ended by a NULL), 
    // Splits apart the input string x at each space character 
    // Returns the newly allocated array of strings
    // Any number of ' ','\t', and '\n's can separate words.
    // I.e. "hello \t\t\n class,\nhow are you?" -> {"hello", "class,", "how", "are","you?", NULL}
    int str_len = my_strlen(str);
    char **output = malloc(sizeof(char *) * str_len); // Allocate a 2D array first.
    if (**output) {
        for (int a = 0; a < str_len; ++a) {
            output[a] = malloc(sizeof(char) * str_len);
        }
    } else {
        return NULL;
    }
    int i = 0;
    int j = 0;
    while (i < str_len) { // Put the characters into the 2D array.
        int k = 0;
        while ((str[i] == ' ' || str[i] == '\t' || str[i] == '\n') && i < str_len) {
            ++i;
        }
        while ((!(str[i] == ' ' || str[i] == '\t' || str[i] == '\n')) && i < str_len) {
            output[j][k] = str[i];
            ++i;
            ++k;
        }
        output[j][k] = '\0';
        ++j;
    }
    output[j] = NULL;
    return output;
}

To correct your code, change if (**output) to if (output).
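For reference, here is a minimal sketch of the question's function with that one-line fix applied. It also makes a few related changes that are illustrative assumptions, not part of the original answer: strlen stands in for the question's my_strlen, the pointer array is sized to (str_len + 1) / 2 + 1 so the terminating NULL always fits, a row is allocated only when a word is actually found (so no rows are leaked), trailing separators no longer produce an empty string, and is_sep and free_vect are hypothetical helpers.

```c
#include <stdlib.h>
#include <string.h>

/* The three separators named in the question's comments. */
static int is_sep(char c) {
    return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical helper: free every string, then the vector itself. */
void free_vect(char **v) {
    size_t n;
    for (n = 0; v[n] != NULL; ++n)
        free(v[n]);
    free(v);
}

char **my_str2vect(const char *str) {
    size_t str_len = strlen(str);      /* stands in for the question's my_strlen */
    size_t i = 0, j = 0;
    /* At most (str_len + 1) / 2 words, plus one slot for the final NULL. */
    char **output = malloc(sizeof(char *) * ((str_len + 1) / 2 + 1));

    if (output == NULL)                /* the fix: test the pointer, not **output */
        return NULL;
    while (i < str_len) {
        size_t k = 0;
        while (i < str_len && is_sep(str[i]))   /* check the bound first */
            ++i;
        if (i == str_len)              /* only separators were left */
            break;
        /* str_len + 1 bytes always fit one word plus its '\0'. */
        output[j] = malloc(str_len + 1);
        if (output[j] == NULL) {       /* output[j] is NULL, so the vector is terminated */
            free_vect(output);
            return NULL;
        }
        while (i < str_len && !is_sep(str[i]))
            output[j][k++] = str[i++];
        output[j][k] = '\0';
        ++j;
    }
    output[j] = NULL;
    return output;
}
```

A caller would use it as char **v = my_str2vect("hello \t\t\n class,\nhow are you?"); and release the result with free_vect(v).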

I think your implementation is not memory efficient and could be more elegant.
You are allocating too much memory. In the code I tried to explain the upper bound on the number of output char pointers. If you want the exact size, you'll have to count the words in the string. That is probably the better approach, but for this exercise I think we can take the easier route.
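Counting the words first gives the exact size. A minimal sketch using the standard strspn and strcspn functions from string.h; count_words is a hypothetical helper, and the delimiter set " \t\n" is taken from the question's comments.

```c
#include <stddef.h>
#include <string.h>

/* Count maximal runs of characters that are not in the delimiter set. */
size_t count_words(const char *s, const char *delims) {
    size_t n = 0;
    s += strspn(s, delims);            /* skip leading delimiters */
    while (*s != '\0') {
        ++n;                           /* s now points at the start of a word */
        s += strcspn(s, delims);       /* skip the word itself */
        s += strspn(s, delims);        /* skip the delimiters after it */
    }
    return n;
}
```

The vector could then be allocated with malloc((count_words(s, " \t\n") + 1) * sizeof(char *)), the extra slot holding the terminating NULL.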

As to your code, I can only say:

  • I did not see the end-of-string character '\0' anywhere, which is a bad sign
  • I did not see any string copy, which is also a bad sign
  • I did not see you make use of the standard library, which often makes you reinvent the wheel

Please see an improved implementation below (I use standard C89):

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char **my_str2vect(const char *s) {
    /*
     * Takes a string.
     * Allocates a new vector (array of strings ended by a NULL).
     * Splits apart the input string s at each whitespace character.
     * Returns the newly allocated array of strings.
     * Any number of ' ', '\t', and '\n' characters can separate words.
     * I.e. "hello \t\t\n class,\nhow are you?" -> {"hello", "class,", "how", "are", "you?", NULL}
     */
    size_t s_size = strlen(s);
    /*
     * Upper bound on the number of words: with alternating non-delimiter and
     * delimiter characters, a string of length n holds (n + 1) / 2 words,
     * and that is the maximum. Add one slot for the terminating NULL.
     */
    size_t max_output_size = (s_size + 1) / 2 + 1;
    char delimiter[] = "\n\t ";
    char **output;
    char *str;
    char *ptr;
    size_t i = 0;
    size_t k;

    output = malloc(sizeof (char *) * max_output_size);
    if (output == NULL)
        return NULL;

    /* initialize to NULL so the vector is always NULL-terminated */
    for (k = 0; k < max_output_size; k++)
        output[k] = NULL;

    /* work on a copy of s, because strtok modifies the string it scans */
    str = malloc(s_size + 1);
    if (str == NULL) {
        free(output);
        return NULL;
    }
    strcpy(str, s);

    /* get the first token, then the rest */
    ptr = strtok(str, delimiter);
    while (ptr != NULL) {
        /* allocate memory and copy the token */
        output[i] = malloc(strlen(ptr) + 1);
        if (output[i] == NULL) {
            for (k = 0; k < i; k++)
                free(output[k]);
            free(output);
            free(str);
            return NULL;
        }
        strcpy(output[i], ptr);
        ptr = strtok(NULL, delimiter);
        i++;
    }

    free(str); /* the tokens were copied, so the working copy can go */
    return output;
}

int main(void) {
    char **result = my_str2vect("hello \t\t\n class,\nhow are you?");
    int i;

    if (result == NULL)
        return 1;
    for (i = 0; result[i] != NULL; i++)
        printf("%s\n", result[i]);

    /* free each string, then the vector itself */
    for (i = 0; result[i] != NULL; i++)
        free(result[i]);
    free(result);
    return 0;
}

I tried using gdb to find the problem.

[gdb screenshot]

The problem is the **output check: you should test output itself (the address of *output), not what the pointer-to-pointer points to. You also allocate a separate row in a for loop for every character of the string, which can cause memory fragmentation. Moreover, the input char array should be passed as const so that it cannot be modified. Instead, you could use this snippet, which allocates all rows as one contiguous block:

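// allocation (in the function)
// prototype: char **my_str2vect(char const *str)
// n = str_len + 1 leaves room for the terminating NULL pointer and for the
// '\0' after a word as long as the whole string.
int a;
int n = str_len + 1;
char **output = malloc(n * sizeof(char *));
output[0] = malloc(n * n * sizeof(char)); // one block holds every row
for (a = 1; a < n; a++)
    output[a] = output[0] + a * n;        // row a starts a * n chars into the block

// freeing (in main())
char **x;
char const *str = "hello \t\t\n class,\nhow are you?";
x = my_str2vect(str);

free(x[0]); // frees the block that holds every row
free(x);    // frees the array of row pointers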
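The contiguous allocation pattern in that snippet can be wrapped in a pair of helpers. This is a minimal sketch; alloc_2d and free_2d are hypothetical names, and rows and columns are passed separately here, whereas the snippet uses the same size for both.

```c
#include <stdlib.h>

/* Allocate rows x cols chars as one block, plus an array of row pointers. */
char **alloc_2d(size_t rows, size_t cols) {
    size_t r;
    char **m;
    if (rows == 0 || cols == 0)
        return NULL;                   /* nothing sensible to allocate */
    m = malloc(rows * sizeof *m);
    if (m == NULL)
        return NULL;
    m[0] = malloc(rows * cols);        /* every row lives in this one block */
    if (m[0] == NULL) {
        free(m);
        return NULL;
    }
    for (r = 1; r < rows; r++)
        m[r] = m[0] + r * cols;        /* row r starts r * cols chars in */
    return m;
}

/* Two calls release everything: the block, then the row pointers. */
void free_2d(char **m) {
    if (m != NULL) {
        free(m[0]);
        free(m);
    }
}
```

One caveat of this design: free_2d relies on m[0] still pointing at the block, so a caller must not overwrite the first row pointer (for example with a terminating NULL when there are no words).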

En passant, the source is a good place to learn more about allocation.

As the debugger is telling you, the if (**output) is broken. It tries to dereference the pointer stored in the first location of the output array, and since malloc does not initialize the memory it returns, that pointer is junk at the point of the if. Hence the seg fault. You want if (output). When I fix this and use strlen in place of your rewrite, it seems to work fine.

It's considerably simpler to make one copy of the input string and use it for all the strings in the returned vector. You can also use strtok to find the words, but that's not thread safe.
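As an aside on thread safety: strtok keeps its position in hidden static state between calls. A reentrant alternative can be built from the standard strspn and strcspn functions, keeping the position in a caller-owned cursor instead. A minimal sketch; next_word is a hypothetical name. (POSIX also offers strtok_r, but it is not part of ISO C.)

```c
#include <stddef.h>
#include <string.h>

/*
 * Find the next word at or after *cursor. Returns a pointer to its first
 * character and stores its length in *len, or returns NULL when no words
 * remain. All state lives in the caller's cursor, and the input string is
 * never modified.
 */
const char *next_word(const char **cursor, const char *delims, size_t *len) {
    const char *start = *cursor + strspn(*cursor, delims);  /* skip delimiters */
    if (*start == '\0') {
        *cursor = start;
        return NULL;
    }
    *len = strcspn(start, delims);     /* length of the word */
    *cursor = start + *len;            /* resume after the word next time */
    return start;
}
```

Because the word is not NUL-terminated in place, a caller copies it with its length, for example via memcpy into a buffer of *len + 1 bytes.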

Here's a suggestion:

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>

char **split(const char *s_org) {
  size_t i;
  // Skip initial whitespace, then copy everything else.
  // (strdup is POSIX; it only became part of ISO C in C23.)
  for (i = 0; s_org[i] && isspace((unsigned char)s_org[i]); ++i) /* skip */;
  char *s = strdup(s_org + i);
  size_t n_rtn = 0, size = 0;
  char **rtn = malloc(sizeof *rtn);
  for (i = 0;;) {
    // Grow first, so there is always room for the next word or the final NULL.
    if (n_rtn == size) {
      size = 2 * size + 1;
      rtn = realloc(rtn, size * sizeof *rtn);
    }
    if (!s[i]) {
      rtn[n_rtn] = NULL;
      if (n_rtn == 0) free(s);  // no words: nothing points into s, so release it now
      return realloc(rtn, (n_rtn + 1) * sizeof *rtn);
    }
    rtn[n_rtn++] = s + i;
    while (s[i] && !isspace((unsigned char)s[i])) ++i;
    if (s[i]) {
      s[i++] = '\0';
      while (isspace((unsigned char)s[i])) ++i;
    }
  }
}

int main(void) {
  char **rtn = split("  hello \t\t\n class,\nhow are you?");
  for (char **p = rtn; *p; ++p)
    printf("%s\n", *p);
  // Freeing the first element frees all strings (or does nothing if none)
  free(rtn[0]);
  free(rtn);
  return 0;
}

This omits checks for NULL returns from malloc and realloc. But they're easy to add.
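A sketch of how the realloc check could be added. The key point is to store realloc's result in a temporary: assigning it straight back, as in rtn = realloc(rtn, ...), loses the only pointer to the old block if realloc fails and returns NULL. grow_ptr_array is a hypothetical helper, not part of the answer; it uses the same 2 * size + 1 growth rule as split().

```c
#include <stdlib.h>

/*
 * Grow *arr (currently *cap elements) so it can hold at least `need`
 * char pointers. Returns 0 on success, -1 on allocation failure. On
 * failure *arr and *cap are left untouched, so the old block can still
 * be freed by the caller.
 */
int grow_ptr_array(char ***arr, size_t *cap, size_t need) {
    size_t new_cap = *cap;
    char **tmp;
    if (need <= *cap)
        return 0;                      /* already big enough */
    while (new_cap < need)
        new_cap = 2 * new_cap + 1;     /* same growth rule as split() */
    tmp = realloc(*arr, new_cap * sizeof *tmp);
    if (tmp == NULL)
        return -1;                     /* *arr is still valid here */
    *arr = tmp;
    *cap = new_cap;
    return 0;
}
```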

You asked about the "other problems" with your code. I've fixed some here:

  • Use size_t to index arrays.
  • Grow the output array as needed. It's really not that hard...
  • Avoid unnecessary calls to malloc.
  • Avoid strlen when simple checks for the terminating null character ('\0') are easier.
  • Use the idiom FOO *p = malloc(sizeof *p); to allocate a FOO. It's less error-prone than sizeof(FOO).
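To illustrate the last point, a small sketch of the sizeof *p idiom. struct word_list, word_list_new and word_list_free are hypothetical names used only for this example; the point is that each sizeof follows the type of the pointer being assigned, so the code stays correct if those types change later.

```c
#include <stdlib.h>

/* Hypothetical type, used only to show the idiom. */
struct word_list {
    char **words;
    size_t count;
};

struct word_list *word_list_new(size_t n) {
    struct word_list *wl = malloc(sizeof *wl);        /* size follows wl's type */
    if (wl == NULL)
        return NULL;
    /* n + 1 slots: room for n words plus a terminating NULL */
    wl->words = malloc((n + 1) * sizeof *wl->words);  /* size of one char * */
    if (wl->words == NULL) {
        free(wl);
        return NULL;
    }
    wl->words[0] = NULL;                              /* start out empty */
    wl->count = 0;
    return wl;
}

/* Frees the list itself, not the strings it points to. */
void word_list_free(struct word_list *wl) {
    if (wl != NULL) {
        free(wl->words);
        free(wl);
    }
}
```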

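Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM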