Splitting a string into tokens and putting tokens into array - strtok

char** token_arr(char* str, int n_tokens)
{
   char**arr = malloc((n_tokens+1)*sizeof(char*));
   char str2[n_tokens + 1];
   strcpy(str2,str);
   int i = 0;
   char *p = strtok (str2, " ");

   while (p != NULL)
   {
       arr[i] = p;
       //printf("%s\n", arr[i]);
       p = strtok (NULL, " ");
       i++;
   }
 return arr;
}

The purpose of token_arr is to take a string and a number of tokens, then put the tokens into an array. The array of tokens is returned.

int main(void) {
  char*str1 = "( 8 + ( 41 - 12 ) )";
  char**expression = token_arr(str1, 9);
  for(int i = 0; i < 9; i++)
    printf("expression[%d] = %c\n", i, *expression[i]);
 return 0;
}

Output:

expression[0] = (
expression[1] = 
expression[2] = 
expression[3] = 
expression[4] = 
expression[5] = 
expression[6] = 
expression[7] = 
expression[8] =

Why is only the first value being printed? What's wrong with my code?

While you have probably sorted out most of the issues based on the comments, let's look at a way to address both the validation and return of expression, and a way to return the number of tokens, to protect against a tokenization error resulting in fewer than n_tokens being found.

As you have learned, when you declare str2 local to token_arr, it has automatic storage duration and is only valid within the scope where it is declared. When token_arr returns, the memory holding str2 is released for re-use, and any attempt to reference that memory back in main() invokes Undefined Behavior. Note also that str2 is declared with n_tokens + 1 (10) bytes, while the 19-character input needs 20 bytes including the terminating '\0', so strcpy writes past the end of str2 before strtok is ever called.

What are your options? (1) Use strdup to dynamically allocate storage for each token, copy the token to the newly allocated memory, and assign the starting address of that new block to arr[i], e.g.

        arr[i] = strdup (p);

or (2) do the same thing manually using strlen, malloc and memcpy, e.g.

        size_t len = strlen(p);
        arr[i] = malloc (len + 1);
        /* validate - here */
        memcpy (arr[i], p, len + 1);

Now each arr[i] points to a block of memory with allocated storage duration, which remains valid until free is called on that block, or the program ends.
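Option (2) can be wrapped in a small function so the validation marked "validate - here" has an obvious place to live. The following is only a sketch; the name dup_token is hypothetical and does not appear in the original code.

```c
#include <stdlib.h>
#include <string.h>

/* dup_token is a hypothetical helper name, not from the original answer.
 * It copies the NUL-terminated string p into newly allocated storage and
 * returns the copy, or NULL if malloc fails. The caller must free() it. */
char *dup_token (const char *p)
{
    size_t len = strlen (p);
    char *copy = malloc (len + 1);   /* +1 for the terminating '\0' */

    if (!copy)                       /* validate the allocation */
        return NULL;

    memcpy (copy, p, len + 1);       /* copies the '\0' as well */

    return copy;
}
```

Inside the tokenizing loop, arr[i] = dup_token (p); would then be followed by a check for NULL, exactly as with strdup.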

What If Fewer Than n_tokens Are Found?

If fewer than n_tokens tokens are found within token_arr and you attempt to use n_tokens elements of expression back in main(), you will likely invoke Undefined Behavior again. To ensure you only use the tokens found in token_arr and made available in main() by the assignment to expression, pass a pointer to n_tokens as the second parameter and update it with the value of i before you return arr; e.g.

char **token_arr (const char *str, int *n_tokens)
{
    char **arr = malloc(*n_tokens * sizeof *arr);
    ...
        i++;
    }
    *n_tokens = i;  /* assign i to make the number of tokens stored available */

    return arr;
}

Now n_tokens back in main() contains only the number of tokens actually found, allocated for, and assigned to arr[i] in token_arr.
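With this interface, n_tokens is the capacity on entry and the count on return. If the caller does not know a sensible capacity in advance, one possible approach is to count the tokens first. The sketch below assumes a hypothetical helper named count_tokens, which is not part of the original answer; it uses strspn and strcspn so the input string is not modified.

```c
#include <string.h>

/* count_tokens is a hypothetical helper, not part of the original answer.
 * It counts the tokens strtok would find in str for the given delimiter
 * set, without modifying str. */
int count_tokens (const char *str, const char *delims)
{
    int n = 0;

    str += strspn (str, delims);            /* skip leading delimiters */
    while (*str) {
        n++;                                /* found the start of a token */
        str += strcspn (str, delims);       /* skip over the token */
        str += strspn (str, delims);        /* skip delimiters after it */
    }

    return n;
}
```

The result could be stored in n_tokens before its address is passed to token_arr, instead of hard-coding the value.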

Validate Every Allocation

It is critical that you validate every call to malloc, calloc, realloc, strdup or any other function that allocates memory for you. Allocation can, and does, fail. When it does, it lets you know by returning NULL instead of a pointer to the beginning of the new block of memory. Check every allocation.
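realloc deserves particular care, because assigning its result directly to the original pointer loses the only reference to the block if the call fails. A minimal sketch of the usual temporary-pointer check follows; the helper name grow_ptrs is hypothetical and is not used anywhere in the original code.

```c
#include <stdlib.h>

/* grow_ptrs is a hypothetical helper, not from the original answer.
 * It resizes the pointer array *arr to hold new_count elements.
 * Returns 1 on success and 0 on failure; on failure *arr is left
 * unchanged and still valid, so the caller can still free it. */
int grow_ptrs (char ***arr, size_t new_count)
{
    if (new_count == 0)     /* avoid a zero-size realloc request */
        return 0;

    /* assign to a temporary: on failure realloc returns NULL and
     * leaves the original block allocated */
    char **tmp = realloc (*arr, new_count * sizeof **arr);

    if (!tmp)               /* validate the reallocation */
        return 0;

    *arr = tmp;             /* only now replace the original pointer */
    return 1;
}
```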

Putting it all together, you could do something like:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char **token_arr (const char *str, int *n_tokens)
{
    char **arr = malloc(*n_tokens * sizeof *arr);
    char str2 [strlen(str) + 1];
    int i = 0;

    if (!arr) { /* validate every allocation */
        perror ("malloc-n_tokens");
        return NULL;
    }

    strcpy (str2, str);

    char *p = strtok (str2, " ");

    while (i < *n_tokens && p != NULL) {    /* check used pointers */
        arr[i] = strdup (p);
        if (!arr[i]) {  /* strdup allocates -> you must validate */
            perror ("strdup-arr[i]");
            if (i)          /* if tokens stored, break and return */
                break;
            else {          /* if no tokens stored, free pointers */
                free (arr);
                return NULL;
            }
        }
        p = strtok (NULL, " ");
        i++;
    }
    *n_tokens = i;  /* assign i to make the number of tokens stored available */

    return arr;
}

int main (void) {

    const char *str1 = "( 8 + ( 41 - 12 ) )";
    int n_tokens = 9;
    char **expression = token_arr (str1, &n_tokens);

    if (expression) {       /* validate token_arr succeeded */
        for (int i = 0; i < n_tokens; i++) { /* n_tokens times */
            printf ("expression[%d] = %s\n", i, expression[i]);
            free (expression[i]);   /* free mem allocated by strdup */
        }
        free (expression);
    }

    return 0;
}

(note: likewise, check the return of token_arr before making use of it)

Example Use/Output

$ ./bin/token_arr
expression[0] = (
expression[1] = 8
expression[2] = +
expression[3] = (
expression[4] = 41
expression[5] = -
expression[6] = 12
expression[7] = )
expression[8] = )

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
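For the array returned by token_arr, that means freeing each token and then the array of pointers itself. The cleanup could be collected in one place, as in this sketch; free_tokens is a hypothetical helper name, not something the original code defines.

```c
#include <stdlib.h>

/* free_tokens is a hypothetical helper, not from the original answer.
 * It frees the first n strings stored in arr, then arr itself.
 * Passing NULL for arr is a no-op. */
void free_tokens (char **arr, int n)
{
    if (!arr)
        return;

    for (int i = 0; i < n; i++)
        free (arr[i]);      /* free each token copy */

    free (arr);             /* then the array of pointers */
}
```

The count passed in should be the value token_arr stored through n_tokens, so that only pointers that were actually assigned are freed.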

It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use; just run your program through them.

$ valgrind ./bin/token_arr
==8420== Memcheck, a memory error detector
==8420== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==8420== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==8420== Command: ./bin/token_arr
==8420==
expression[0] = (
expression[1] = 8
expression[2] = +
expression[3] = (
expression[4] = 41
expression[5] = -
expression[6] = 12
expression[7] = )
expression[8] = )
==8420==
==8420== HEAP SUMMARY:
==8420==     in use at exit: 0 bytes in 0 blocks
==8420==   total heap usage: 10 allocs, 10 frees, 92 bytes allocated
==8420==
==8420== All heap blocks were freed -- no leaks are possible
==8420==
==8420== For counts of detected and suppressed errors, rerun with: -v
==8420== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all memory you have allocated and that there are no memory errors.

Look things over and let me know if you have further questions.
