
Function to split a string into an array by multiple delimiters in C

EDIT: Thanks to @R Sahu for finding the bug in my routine. For interested readers, here is the corrected code:

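    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    /* Split a copy of buf at any character in sep. Stores at most max-1 tokens
       plus a terminating NULL in array and returns the number of tokens stored.
       max is the capacity of array and should be at least 2.
       The caller releases the copy with free(array[0]). */
    int str_split(char **array, char *buf, char *sep, int max){
        char *token;
        int i = 0;
        char *bp = strdup(buf);
        if (bp == NULL) {  // allocation failed: return an empty list
            array[0] = NULL;
            return 0;
        }
        while ( ( i < max - 1 ) && ((token = strsep(&bp,sep)) != NULL ) ) {
            array[i++] = token;
        }
        array[i] = NULL;  // terminate the list
        return i;
    }

    int main(void){
        char buf[100];
        strcpy(buf,  "$GPGSA,A,3,19,28,14,18,27,22,31,39,,,,,1.7,1.0,1.3*35");
        char *array[50];
        char sep[] = "*,";
        int number =  str_split(array, buf+1, sep, 50);  // number is number of elements in array
        int i;
        for (i = 0; i < number; i++) printf("%s\n",array[i]);
        free (array[0]);  // array[0] is the start of the strdup'd copy
        return 0;
    }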

---------------------End Edit--------------------------

I find it amazing that there isn't a standard function to split a string into an array in C like there is in the other languages I use. Thus I needed to write one for my project. There were two requirements that made this problem a bit more difficult than most of the solutions posted to SO and online. The strings are NMEA strings which means:

1) They have multiple delimiters, specifically , and *.
2) They contain empty tokens that must get their own array entry and cannot be skipped.

That meant that strtok would not work, because it treats consecutive delimiters as one and so skips empty tokens, and most of the examples were based on it. There are many examples that use strsep, but 90% of them either do not compile or produce segmentation faults. My code runs fine, but there is a feature I would like to add that I can't get to work. Here is the code (note that I skip the first character, which is always a $ and is verified by another routine):

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    int str_split(char **array, char *buf, char *sep){
        char *token;
        int i = 0;
        int size = 0;
        char *bp = strdup(buf);
        while (  ((token = strsep(&bp,sep))!= NULL ) ) {
            array[i++] = token;
        }
        array[i] = NULL;  // set to null
        size = i;
        free(bp);

        return size;
    }

    main(){
        char buf[100];
        strcpy(buf,  "$GPGSA,A,3,19,28,14,18,27,22,31,39,,,,,1.7,1.0,1.3*35");
        char *array[50];
        char sep[] = "*,";
        int number =  str_split(array, buf+1, sep);  // number is number of elements in array
        int i;
        for (i = 0; array[i] != NULL; i++) printf("%s\n",array[i]);
        return 0;
    }

There is a potential problem in that if the string has more tokens than the size of the array passed to it, I get a segmentation fault. I wanted to pass a maximum value to the function and stop the parsing when I got to that value. Passing an int to the function is easy so I modified the while loop:

    while ( ( i < max ) && ((token = strsep(&bp,sep))!= NULL ) ) {

It compiles, but when it runs it aborts with a memory dump and an error message about a double free. I have narrowed it down to some interaction between strsep and free, but I have no clue what to do other than just make sure array is large.

Here is my output (which is correct):

GPGSA
A
3
19
28
14
18
27
22
31
39




1.7
1.0
1.3
35

When I change the while to this

    while ( (i< 5) &&  ((token = strsep(&bp,sep))!= NULL ) ) {

I get this output

*** Error in `./test': double free or corruption (out): 0x08200018 ***
    ======= Backtrace: =========
/lib/libc.so.6[0x41681f2d]
/lib/libc.so.6[0x4168cad9]
/lib/libc.so.6[0x4168d710]
./test[0x8048558]
./test[0x80485bf]
/lib/libc.so.6(__libc_start_main+0xe7)[0x41631687]
./test[0x80483f1]
======= Memory map: ========
08048000-08049000 r-xp 00000000 b3:0a 48         /home/root/test
08049000-0804a000 rw-p 00000000 b3:0a 48         /home/root/test
08200000-08221000 rw-p 00000000 00:00 0          [heap]
415e9000-41609000 r-xp 00000000 b3:08 14919      /lib/ld-2.19.so
41609000-4160a000 r--p 0001f000 b3:08 14919      /lib/ld-2.19.so
4160a000-4160b000 rw-p 00020000 b3:08 14919      /lib/ld-2.19.so
41618000-41787000 r-xp 00000000 b3:08 15236      /lib/libc-2.19.so
41787000-41788000 ---p 0016f000 b3:08 15236      /lib/libc-2.19.so
41788000-4178a000 r--p 0016f000 b3:08 15236      /lib/libc-2.19.so
4178a000-4178b000 rw-p 00171000 b3:08 15236      /lib/libc-2.19.so
4178b000-4178e000 rw-p 00000000 00:00 0
41a3f000-41a52000 r-xp 00000000 b3:08 14923      /lib/libgcc_s.so.1
41a52000-41a53000 rw-p 00013000 b3:08 14923      /lib/libgcc_s.so.1
b77b4000-b77b5000 rw-p 00000000 00:00 0
b77b8000-b77ba000 rw-p 00000000 00:00 0
b77ba000-b77bb000 r-xp 00000000 00:00 0          [vdso]
bf7e3000-bf804000 rw-p 00000000 00:00 0          [stack]
Aborted

While I was writing this it occurred to me that I could let strsep run until it returns NULL and simply skip the assignment to the array once i reaches max. That worked.

My modified code is:

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    int str_split(char **array, char *buf, char *sep, int max){
        char *token;
        int i = 0;
        int size = 0;
        char *bp = strdup(buf);
        while (  ((token = strsep(&bp,sep))!= NULL ) ) {
            if (i < max ) array[i++] = token;
        }
        i = (i > max)? max : i;
        array[i] = NULL;  // set to null
        size = i;
        free(bp);

        return size;
    }

    main(){
        char buf[100];
        strcpy(buf,  "$GPGSA,A,3,19,28,14,18,27,22,31,39,,,,,1.7,1.0,1.3*35");
        char *array[50];
        char sep[] = "*,";
        int number =  str_split(array, buf+1, sep, 5);  // number is number of elements in array
        int i;
        for (i = 0; array[i] != NULL; i++) printf("%s\n",array[i]);
        return 0;
    }

What I am wondering is

1) What caused the memory dump?
2) Is there a better way to do this?

Because I could not find working code to do this, I thought I would leave this post both to see if there is a better way, and in the hope it might help others.

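Your code is subject to undefined behavior. strsep advances bp as it consumes tokens, so by the time str_split calls free(bp), bp no longer holds the pointer that strdup returned. When the loop runs until strsep returns NULL, bp itself is NULL, so free(bp) does nothing: the copy is never released, which is why the elements of array still print correctly in main. When the loop stops early because of the i < max test, bp points into the middle of the copy, and passing that pointer to free is what triggers the "double free or corruption" abort.

Anything can happen when code has undefined behavior, so the exact symptom may differ on another system. In either case the elements of array point into the copy, so it must stay allocated while main uses them and then be freed through the pointer that strdup returned.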

One way to fix this:

  1. Remove the line

     free(bp); 

    from str_split.

  2. Free the memory in main. The first token points to the start of the memory that strdup allocated, so it can be passed to free.

     if ( array[0] != NULL ) { free(array[0]); } 
