Splitting string into words in array without using any pre-made functions in C

I am trying to create a function that takes a string, splits it into words, and returns an array containing those words. I am not allowed to use any pre-made functions other than malloc inside the splitting function. Finally, the function must have this form: char **ft_split_whitespaces(char *str). My current output looks like this:


    d this is me
    s is me
    s me
    r

Expected output:


    Hello
    World
    This
    Is
    Me

My full code is below:


    #include <stdio.h>
    #include <stdlib.h>
    
    int     count_words(char *str)
    {
        int i; 
        int word;
        
        i = 0;
        word = 1;
        while(str[i]!='\0')
        {
            if(str[i]==' ' || str[i]=='\n' || str[i]=='\t' 
            || str[i]=='\f' || str[i]=='\r' || str[i]=='\v')
                word++;
            i++;
        }
        return (word);
    }
    
    char    **ft_split_whitespaces(char *str)
    {
        int index;
        int size;
        int index2;
        char **arr;
        
        index = 0;
        index2 = 0;
        size = count_words(str);
        arr = (char **)malloc(size * sizeof(char));
        if (arr == NULL)
            return ((char **)NULL);
        while (str[index])
        {
            if(str[index] == ' ')
            {
                index++;
                value++;
                index2++;
            }
            else
                *(arr+index2) = (char*) malloc(index * sizeof(char));
                *(arr+index2) = &str[index];    
            index++;
        }
        **arr = '\0';
        return (arr);
    }
    
    int main()
    {
        char a[] = "Hello World This Is Me";
        char **arr;
        int i;
        int ctr = count_words(a);
        arr = ft_split_whitespaces(a);
        
        for(i=0;i < ctr;i++)
            printf("%s\n",arr[i]);
        return 0;
    }

You have quite a few errors in your program:

  1. arr = (char **)malloc(size * sizeof(char)); is not right, since arr is of type char **. You should use sizeof(char *) or, better, sizeof(*arr), because sizeof(char) is 1 by definition and is usually smaller than sizeof(char *) on modern systems.

  2. You don't have braces {} around the body of the else in ft_split_whitespaces, which you probably intended. Only the first statement after else belongs to it, so your conditional logic breaks.

  3. You are allocating a new char[] for every non-whitespace character in the while loop. You should allocate only one per new word and then fill in the characters of that array.

  4. *(arr+index2) = &str[index]; This doesn't do what you think it does. It just makes the pointer at *(arr+index2) point into str at offset index. You need to copy each character individually or use memcpy() (which you probably can't use here). This explains why your output consists of offsets into the main string rather than the actual tokens.

  5. **arr = '\0'; You will lose whatever you store at index 0 of arr. You need to append a '\0' to each string in arr individually.

  6. *(arr+index2) = (char*) malloc(index * sizeof(char)); You will end up allocating progressively larger char arrays, because you are using index as the character count and it keeps increasing. You need to work out the correct length of each token in the string and allocate accordingly.

Also, why *(arr + index2)? Why not use the much easier to read arr[index2]?
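
Point 1 can be sketched as a small allocation helper. This is only an illustration: the name alloc_word_array and its count parameter are hypothetical and do not appear in the question's code.

```c
#include <stdlib.h> /* malloc */

/* Hypothetical helper: allocate `count` pointer slots and clear them. */
char **alloc_word_array(int count)
{
    char **arr;
    int i;

    /* sizeof(*arr) is sizeof(char *), the element type of arr,
     * so the request stays correct even if arr's type changes. */
    arr = malloc(count * sizeof(*arr));
    if (arr == NULL)
        return NULL;
    for (i = 0; i < count; i++)
        arr[i] = NULL; /* no word stored in this slot yet */
    return arr;
}
```

For five words, the original call requests 5 bytes, while five pointers need 5 * sizeof(char *) bytes, which is 40 on a platform with 8-byte pointers.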


Further clarifications:

Consider str = "abc de".

You'll start with

*(arr + 0) = (char*) malloc(0 * sizeof(char));
//ptr from malloc(0) shouldn't be dereferenced and is mostly pointless (no pun), probably NULL
*(arr + 0) = &str[0]; 

Here str[0] = 'a' lives at some location in memory, so &str[0] yields that address, and you store it in *(arr + 0). The pointer returned by malloc just before is overwritten and lost.

Now in the next iteration, you'll have

*(arr + 0) = (char*) malloc(1 * sizeof(char)); 
*(arr + 0) = &str[1]; 

This time you replace the pointer stored at the same index2 with yet another address, and the array malloc'd a moment earlier is lost again. In the next iteration you get *(arr + 0) = (char*) malloc(2 * sizeof(char)); and so on. You keep resetting the same *(arr + index2) position until you encounter whitespace, after which you do the same thing again for the next word. So don't allocate an array for every value of index, but only if and when required. This also shows that the size passed to malloc keeps growing with index, which is what #6 pointed out.
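
One way to allocate only when required is to find where each word starts and how long it is before calling malloc. The following is a minimal sketch under that assumption; the helper names is_space, word_length, and count_words are hypothetical (this count_words is not the one from the question, which counts separators rather than words).

```c
/* Hypothetical helper: 1 if c is one of the whitespace characters
 * the question checks for, 0 otherwise. */
int is_space(char c)
{
    return (c == ' ' || c == '\n' || c == '\t'
         || c == '\f' || c == '\r' || c == '\v');
}

/* Hypothetical helper: number of characters in the word starting at str,
 * i.e. up to the next whitespace character or the end of the string. */
int word_length(const char *str)
{
    int len = 0;

    while (str[len] != '\0' && !is_space(str[len]))
        len++;
    return len;
}

/* Hypothetical helper: a word starts at every non-whitespace character
 * that is either first in the string or preceded by whitespace. */
int count_words(const char *str)
{
    int count = 0;
    int i = 0;

    while (str[i] != '\0')
    {
        if (!is_space(str[i]) && (i == 0 || is_space(str[i - 1])))
            count++;
        i++;
    }
    return count;
}
```

For str = "abc de", count_words(str) is 2, word_length(str) is 3, and word_length(&str[4]) is 2, so the two allocations would be 4 and 3 bytes including the terminators.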

Coming to &str[index]:

You are setting *(arr + index2), i.e. a char * (pointer to char), to another char *. In C, assigning one pointer to another doesn't copy the data the second pointer refers to; it only makes both of them point to the same memory location. So when you write something like *(arr + 1) = &str[4], it's just a pointer into the original string at index = 4. If you print *(arr + 1), you get the substring from index = 4 to the end of the string, not the word you're trying to obtain.
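
The difference between pointing into the original string and copying out of it can be sketched as follows. Both helper names, alias_from and copy_range, are hypothetical and used only for illustration.

```c
#include <stdlib.h> /* malloc */

/* Hypothetical helper: returns a pointer INTO str. Nothing is copied,
 * so the result runs on to the end of str and changes when str changes. */
char *alias_from(char *str, int index)
{
    return &str[index];
}

/* Hypothetical helper: copies len characters starting at str[index]
 * into a new, independently allocated and terminated string. */
char *copy_range(const char *str, int index, int len)
{
    char *word = malloc((len + 1) * sizeof(*word));

    if (word == NULL)
        return NULL;
    for (int i = 0; i < len; i++)
        word[i] = str[index + i];
    word[len] = '\0';
    return word;
}
```

With str = "abc de fg", alias_from(str, 4) reads as "de fg", whereas copy_range(str, 4, 2) produces the separate string "de", which the caller would later release with free.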

**arr = '\0' just dereferences the pointer at *arr and sets the first character it points to to '\0'. So if you had *(arr + 0) = "hello\0", you would turn it into "\0ello\0". Any code iterating over this string stops at the first '\0' character, so you lose whatever *arr was pointing to.
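
The effect can be illustrated with a short sketch. All three function names below are hypothetical; clobber_first reproduces the statement from the question, and terminate_word shows what was probably intended, assuming the word's length is known.

```c
/* Hypothetical helper: number of characters before the first '\0'. */
int visible_length(const char *s)
{
    int n = 0;

    while (s[n] != '\0')
        n++;
    return n;
}

/* What `**arr = '\0';` does: overwrite the FIRST character
 * of the FIRST string in the array. */
void clobber_first(char **arr)
{
    **arr = '\0';
}

/* Probable intent: put the terminator right after the
 * last character of one particular word. */
void terminate_word(char *word, int len)
{
    word[len] = '\0';
}
```

After clobber_first, a buffer that held "hello" still contains 'e', 'l', 'l', 'o' in memory, but visible_length reports 0 because every string function stops at the first terminator.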

Also, *(arr + i) and arr[i] are exactly equivalent, and the latter is much more readable. It better conveys that arr is an array and that arr[i] accesses the i-th element.

Here is how I would do it:

#include <stdio.h>  // printf
#include <stdlib.h> // malloc

// this returns an array of pointers to strings
// that is one longer than the number of strings
// the last item in the array is always NULL
// so that the caller can tell when they get to the end
// we have to do this because we have no way
// to return the size of the finished array
char **ft_split_whitespaces(char *str)
{
    /*
     * First count the number of pieces in the string
     */

    // there will always be a NULL at the end of the array
    int size = 1;

    // if the string isn't empty there is one piece after the last space
    if (*str != '\0')
        size++;
        
    // there will be one piece for the bit before each space
    // so loop through the string looking for a space
    for (char *pointer = str; *pointer != '\0'; pointer++)
        if (*pointer == ' ')
            size++;

    /*
     * Now allocate the array of items that will be returned
     */
     
    char **array = malloc(size * sizeof(char*));
    if (array == NULL) return NULL; // ERROR: return "Something really bad happened!"

    /*
     * Then split the string into items and store them into the array
     */

    // index is where the piece will be stored
    int index=0;

    // we need two pointers:
    // - one for where we currently are
    // - and one for what we are looking for
    char *current=str, *next=str;

    // we are done if we are at the end of the string    
    while (*current != '\0')
    {
        // find a space character
        // but stop looking if we find the end of the string instead
        while (*next!='\0' && *next!=' ')
            next++;
        
        // now allocate enough space for this piece
        char *piece = malloc(next - current + 1);
        if (piece == NULL) break; // ERROR: exit the loop and return array as it is
        
        // and copy the piece into the memory
        for (int i=0; i<next-current; i++)
            piece[i] = current[i];
        
        // then terminate the string
        piece[next-current] = '\0';
        
        // store the new piece and increase the index
        array[index++] = piece;
        
        // now we are done with that piece
        // so start looking for the next one:
        // step over the space, but never past the terminating '\0'
        if (*next != '\0')
            next++;
        current = next;
    }

    // make sure the array ends with a NULL;
    array[index] = NULL;
    
    // return the new array
    return array;
}

int main()
{
    char **items = ft_split_whitespaces("Hello World This Is Me");

    if (items == NULL)
        printf("Something really bad happened!");
    else // loop through the array until we find a NULL
        for (char **pointer = items; *pointer != NULL; pointer++)
            printf("%s\n", *pointer);

    return 0;
}

Try it at https://onlinegdb.com/MhXIUQdo0
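
The code above splits on the space character only and produces an empty string for each extra space in a run. As a variation, here is a minimal sketch that treats all six whitespace characters checked in the question as separators and skips runs of them. It keeps the required signature and the NULL-terminated result, and calls nothing but malloc inside the splitting function. The helpers is_space and free_words are hypothetical names, and free_words assumes the caller is allowed to use free.

```c
#include <stdlib.h> /* malloc, free */

/* Hypothetical helper: the six whitespace characters from the question. */
static int is_space(char c)
{
    return (c == ' ' || c == '\n' || c == '\t'
         || c == '\f' || c == '\r' || c == '\v');
}

/* Hypothetical helper for the CALLER: release a NULL-terminated array. */
void free_words(char **words)
{
    if (words == NULL)
        return;
    for (int i = 0; words[i] != NULL; i++)
        free(words[i]);
    free(words);
}

char **ft_split_whitespaces(char *str)
{
    int count = 0;

    /* Pass 1: a word starts at each non-whitespace character that is
     * first in the string or preceded by whitespace. */
    for (int i = 0; str[i] != '\0'; i++)
        if (!is_space(str[i]) && (i == 0 || is_space(str[i - 1])))
            count++;

    /* One extra slot for the NULL that marks the end of the array. */
    char **words = malloc((count + 1) * sizeof(*words));
    if (words == NULL)
        return NULL;

    int w = 0; /* index of the next free slot in words */
    int i = 0; /* current position in str */
    while (str[i] != '\0')
    {
        if (is_space(str[i]))
        {
            i++; /* skip separators, including runs of them */
            continue;
        }

        /* Measure the word that starts at str[i]. */
        int len = 0;
        while (str[i + len] != '\0' && !is_space(str[i + len]))
            len++;

        /* Allocate exactly len characters plus the terminator. */
        char *word = malloc((len + 1) * sizeof(*word));
        if (word == NULL)
            break; /* return the words collected so far */

        for (int k = 0; k < len; k++)
            word[k] = str[i + k];
        word[len] = '\0';

        words[w++] = word;
        i += len;
    }
    words[w] = NULL;
    return words;
}
```

A caller can loop until it reaches the NULL entry, exactly as in the main above, and then pass the array to free_words. For an empty or all-whitespace input, the result is an array whose first entry is NULL.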
