
strtok/strtok_r quit parsing in the middle

strtok_r writes null characters into various places of the input string while parsing. The original string is restored only after strtok_r returns NULL.

What if I need to extract a token somewhere near the beginning of a long string? If I leave the loop early, the input string remains broken. I could try to restore the delimiter manually, but I don't know whether the token is the last one. The problem is that the value of saveptr is not documented.

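#include <stdio.h>
#include <string.h>

void extract_nth_token(char *res, size_t reslen, char *str, const char *delim, int n) {
  int i;
  char *token;
  char *save_ptr;

  token = strtok_r(str, delim, &save_ptr);
  for (i = 0; token != NULL; i++) {
    /* Test the current token before fetching the next one; otherwise
       token n+1 is copied, and token may be NULL when it reaches snprintf. */
    if (i == n) {
      snprintf(res, reslen, "%s", token);
      /* token[strlen(token)] = delim[0]; */
      /* break; */
    }
    token = strtok_r(NULL, delim, &save_ptr);
  }
}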

strtok() and strtok_r() are terrible functions:

  • They modify the input string.
  • They treat consecutive delimiters as one, which may be intended when skipping whitespace, but is not intended when parsing CSV (or tab-separated) input.
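Both points can be checked with a small sketch. The helper names count_tokens_strtok_r and count_fields_strcspn are made up for this illustration and are not part of any library. The first helper runs strtok_r() until it returns NULL; the second steps over exactly one delimiter at a time without touching its input.

```c
#define _POSIX_C_SOURCE 200809L  /* asks strict compilers to declare strtok_r() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the tokens strtok_r() reports in buf.
   buf is modified in place, because delimiters are overwritten with NUL. */
size_t count_tokens_strtok_r(char *buf, const char *delim)
{
    char *save_ptr;
    char *tok;
    size_t count = 0;

    assert(buf != NULL && delim != NULL);
    for (tok = strtok_r(buf, delim, &save_ptr); tok != NULL;
         tok = strtok_r(NULL, delim, &save_ptr))
        count++;
    return count;
}

/* Hypothetical helper: counts fields by skipping exactly one delimiter
   per step, so an empty field between two adjacent delimiters is counted.
   str is only read, never written. */
size_t count_fields_strcspn(const char *str, const char *delim)
{
    size_t count = 0, pos = 0;

    assert(str != NULL && delim != NULL);
    for (;;) {
        pos += strcspn(str + pos, delim);  /* advance to the next delimiter or the end */
        count++;
        if (str[pos] == '\0') break;
        pos++;                             /* step over that single delimiter */
    }
    return count;
}
```

For the input "a,,c" with delimiter ",", the strtok_r() loop reports two tokens and the strcspn() loop reports three fields. After the strtok_r() loop has run to completion, the buffer reads as just "a": the overwritten delimiter is not put back.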

It is best to avoid strtok() and strtok_r() completely and use strspn() and strcspn() instead. The function below does that. Its return value is similar to that of snprintf(): the number of characters in the found token (not counting the terminating NUL byte):

  • If there is no token #n, '\0' is written to the buffer and 0 is returned.
  • If the buffer is too small for the found token plus the terminating NUL byte, '\0' is written to the buffer and the token length is returned.
  • If the buffer is large enough, the token plus '\0' is written to it, and strlen(token) is returned.

#include <stdio.h>
#include <string.h>

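size_t extract_nth_token_ohne_strtok_r(char *res, size_t maxlen, const char *str, const char *delim, int n)
{
size_t pos, len;
int itok;

/* Walk the string one token at a time; str is never modified. */
for (itok=0,pos=0; str[pos]; ) {
        len = strcspn(str+pos, delim); /* length of the token starting at pos */
        if (itok++ == n) {
                /* Copy only if the token and its terminating NUL both fit. */
                if (len < maxlen) memcpy(res, str+pos, len), res[len] = 0;
                else res[0] = 0;
                return len;
                }
        pos += len;
        if (str[pos]) pos++; /* skip exactly one delimiter, so empty fields count as tokens */
        }
/* The string ended before token n was reached. */
res[0] = 0;
return 0;
}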

int main(void)
{
const char * omg = "zero one\ttwo \tfour\nfive" ;
char token[80];
size_t toklen;
int ii;

printf("\n## With a large enough buffer:\n" );
for (ii=0; ii < 7; ii++) {
        toklen = extract_nth_token_ohne_strtok_r(token, sizeof token
                , omg, " \t\n", ii);
        printf("%d: res=%zu \"%s\"\n" , ii, toklen, token );
        }

printf("\n## With 4-character buffer:\n" );
for (ii=0; ii < 7; ii++) {
        toklen = extract_nth_token_ohne_strtok_r(token, 4
                , omg, " \t\n", ii);
        printf("%d: res=%zu \"%s\"\n" , ii, toklen, token );
        }

return 0;
}

Note: if you do want to treat consecutive whitespace as one, you could replace the line if (str[pos]) pos++; with:

pos += strspn(str+pos, delim);
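Put together, the collapsing variant might look like the sketch below. The name extract_nth_token_collapsing is hypothetical. Skipping leading delimiters before the first token is an extra assumption made here to mirror what strtok_r() does; it is not part of the one-line change above. The return-value convention is the same as in the function above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant: a run of delimiters counts as a single separator. */
size_t extract_nth_token_collapsing(char *res, size_t maxlen,
                                    const char *str, const char *delim, int n)
{
    size_t pos, len;
    int itok = 0;

    assert(res != NULL && maxlen > 0);   /* res[0] must be writable */

    /* Assumption: leading delimiters are skipped too, as strtok_r() does. */
    pos = strspn(str, delim);
    while (str[pos]) {
        len = strcspn(str + pos, delim); /* length of the current token */
        if (itok++ == n) {
            if (len < maxlen) {
                memcpy(res, str + pos, len);
                res[len] = '\0';
            } else {
                res[0] = '\0';           /* token plus NUL does not fit */
            }
            return len;
        }
        pos += len;
        pos += strspn(str + pos, delim); /* skip the whole run of delimiters */
    }
    res[0] = '\0';                       /* fewer than n+1 tokens */
    return 0;
}
```

With the test string from main() and the delimiters " \t\n", the empty token between the space and the tab disappears, so token 3 becomes "four" and there are five tokens in total instead of six.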
