简体   繁体   中英

Why does my string_split implementation not work?

My str_split function returns (or at least I think it does) a char** - so a list of strings essentially. It takes a string parameter, a char delimiter to split the string on, and a pointer to an int to place the number of strings detected.

The way I did it, which may be highly inefficient, is to make a buffer of x length (x = length of string), then copy element of string until we reach delimiter, or '\\0' character. Then it copies the buffer to the char** , which is what we are returning (and has been malloc ed earlier, and can be freed from main() ), then clears the buffer and repeats.

Although the algorithm may be iffy, the logic is definitely sound as my debug code (the _D) shows it's being copied correctly. The part I'm stuck on is when I make a char** in main , set it equal to my function. It doesn't return null, crash the program, or throw any errors, but it doesn't quite seem to work either. I'm assuming this is what is meant be the term Undefined Behavior.

Anyhow, after a lot of thinking (I'm new to all this) I tried something else, which you will see in the code, currently commented out. When I use malloc to copy the buffer to a new string, and pass that copy to aforementioned char**, it seems to work perfectly. HOWEVER, this creates an obvious memory leak as I can't free it later... so I'm lost.

When I did some research I found this post , which follows the idea of my code almost exactly and works, meaning there isn't an inherent problem with the format (return value, parameters, etc) of my str_split function. YET his only has 1 malloc, for the char**, and works just fine.

Below is my code. I've been trying to figure this out and it's scrambling my brain, so I'd really appreciate help!! Sorry in advance for the 'i', 'b', 'c' it's a bit convoluted I know.

Edit : should mention that with the following code,

ret[c] = buffer;
printf("Content of ret[%i] = \"%s\" \n", c, ret[c]);

it does indeed print correctly. It's only when I call the function from main that it gets weird. I'm guessing it's because it's out of scope ?

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define DEBUG

#ifdef DEBUG
    #define _D if (1)
#else
    #define _D if (0)
#endif

char **str_split(char[], char, int*);
int count_char(char[], char);

int main(void) {
    int num_strings = 0;
    char **result = str_split("Helo_World_poopy_pants", '_', &num_strings);

    if (result == NULL) {
        printf("result is NULL\n");
        return 0;
    }

    if (num_strings > 0) {
        for (int i = 0; i < num_strings; i++) {
            printf("\"%s\" \n", result[i]);
        }
    }

    free(result);

    return 0;
}

char **str_split(char string[], char delim, int *num_strings) {

    int num_delim = count_char(string, delim);
    *num_strings = num_delim + 1;

    if (*num_strings < 2) {
        return NULL;
    }

    //return value
    char **ret = malloc((*num_strings) * sizeof(char*));

    if (ret == NULL) {
        _D printf("ret is null.\n");
        return NULL;
    }

    int slen = strlen(string);
    char buffer[slen];

    /* b is the buffer index, c is the index for **ret */
    int b = 0, c = 0;
    for (int i = 0; i < slen + 1; i++) { 

        char cur = string[i];

        if (cur == delim || cur == '\0') {

            _D printf("Copying content of buffer to ret[%i]\n", c); 
            //char *tmp = malloc(sizeof(char) * slen  + 1);
            //strcpy(tmp, buffer);

            //ret[c] = tmp;
            ret[c] = buffer;
            _D printf("Content of ret[%i] = \"%s\" \n", c, ret[c]);
            //free(tmp);

            c++;
            b = 0;
            continue;
        }

        //otherwise

        _D printf("{%i} Copying char[%c] to index [%i] of buffer\n", c, cur, b);

        buffer[b] = cur;
        buffer[b+1] = '\0'; /* extend the null char */
        b++;

        _D printf("Buffer is now equal to: \"%s\"\n", buffer);
    }

    return ret;
}

int count_char(char base[], char c) {
    int count = 0;
    int i = 0;

    while (base[i] != '\0') {
        if (base[i++] == c) {
            count++;
        }
    }
    _D printf("Found %i occurence(s) of '%c'\n", count, c);
    return count;
}

The string pointers you store into the res with ret[c] = buffer; array point to an automatic array that goes out of scope when the function returns. The code subsequently has undefined behavior. You should allocate these strings with strdup() .

Note also that it might not be appropriate to return NULL when the string does not contain a separator. Why not return an array with a single string?

Here is a simpler implementation:

#include <stdlib.h>

char **str_split(const char *string, char delim, int *num_strings) {
    int i, n, from, to;
    char **res;

    for (n = 1, i = 0; string[i]; i++)
        n += (string[i] == delim);

    *num_strings = 0;
    res = malloc(sizeof(*res) * n);
    if (res == NULL)
        return NULL;

    for (i = from = to = 0;; from = to + 1) {
        for (to = from; string[to] != delim && string[to] != '\0'; to++)
            continue;
        res[i] = malloc(to - from + 1);
        if (res[i] == NULL) {
            /* allocation failure: free memory allocated so far */
            while (i > 0)
                free(res[--i]);
            free(res);
            return NULL;
        }
        memcpy(res[i], string + from, to - from);
        res[i][to - from] = '\0';
        i++;
        if (string[to] == '\0')
            break;
    }
    *num_strings = n;
    return res;
}

You are storing pointers to a buffer that exists on the stack. Using those pointers after returning from the function results in undefined behavior.

To get around this requires one of the following:

  • Allow the function to modify the input string (ie replace delimiters with null-terminator characters) and return pointers into it. The caller must be aware that this can happen. Note that supplying a string literal as you are doing here is illegal in C, so you would instead need to do:

     char my_string[] = "Helo_World_poopy_pants"; char **result = str_split(my_string, '_', &num_strings); 

    In this case, the function should also make it clear that a string literal is not acceptable input, and define its first parameter as const char* string (instead of char string[] ).

  • Allow the function to make a copy of the string and then modify the copy. You have expressed concerns about leaking this memory, but that concern is mostly to do with your program's design rather than a necessity.

    It's perfectly valid to duplicate each string individually and then clean them all up later. The main issue is that it's inconvenient, and also slightly pointless.

Let's address the second point. You have several options, but if you insist that the result be easily cleaned-up with a call to free , then try this strategy:

  1. When you allocate the pointer array, also make it large enough to hold a copy of the string:

     // Allocate storage for `num_strings` pointers, plus a copy of the original string, // then copy the string into memory immediately following the pointer storage. char **ret = malloc((*num_strings) * sizeof(char*) + strlen(string) + 1); char *buffer = (char*)&ret[*num_strings]; strcpy(buffer, string); 
  2. Now, do all your string operations on buffer . For example:

     // Extract all delimited substrings. Here, buffer will always point at the // current substring, and p will search for the delimiter. Once found, // the substring is terminated, its pointer appended to the substring array, // and then buffer is pointed at the next substring, if any. int c = 0; for(char *p = buffer; *buffer; ++p) { if (*p == delim || !*p) { char *next = p; if (*p) { *p = '\\0'; ++next; } ret[c++] = buffer; buffer = next; } } 
  3. When you need to clean up, it's just a single call to free , because everything was stored together.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM