strtok/strtok_r quit parsing in the middle

Question

strtok_r writes null characters into various places of the input string while parsing. The original string is restored only after strtok_r returns NULL.

What if I need to extract a token somewhere near the beginning of a long string? If I leave the loop early, the input string remains broken. I could try to restore the delimiter manually, but I don't know whether the token is the last one. The problem is that the value of saveptr is not documented.

#include <stdio.h>
#include <string.h>

void extract_nth_token(char *res, size_t reslen, char *str, const char *delim, int n) {
  int i;
  char *token;
  char *save_ptr;

  token = strtok_r(str, delim, &save_ptr);
  for (i = 0; token != NULL; i++) {
    if (i == n) {
      snprintf(res, reslen, "%s", token);
      /* token[strlen(token)] = delim[0]; */
      /* break; */
    }
    /* advance only after the check, so that token is never NULL in snprintf */
    token = strtok_r(NULL, delim, &save_ptr);
  }
}

Answer 1

strtok() and strtok_r() are terrible functions:

They modify the input string.
They treat consecutive delimiters as one, which may be intended when skipping whitespace, but which is not intended when parsing CSV (or tab-separated) input.
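
Both points are easy to check. The following is a minimal sketch; the helper name count_tokens_strtok_r is made up for illustration and is not part of any library. It simply counts the tokens that strtok_r() reports for a writable buffer:

```c
#define _POSIX_C_SOURCE 200809L  /* asks strict compilers to declare strtok_r() */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the tokens strtok_r() finds in buf.
 * buf must be writable; it is left modified when the function returns. */
size_t count_tokens_strtok_r(char *buf, const char *delim)
{
    char *save;
    size_t n = 0;

    for (char *tok = strtok_r(buf, delim, &save); tok != NULL;
         tok = strtok_r(NULL, delim, &save))
        n++;
    return n;
}
```

For a buffer holding "a,,b" and the delimiter ",", this returns 2: the empty field between the two commas is silently dropped. Afterwards strlen(buf) is 1, because the first comma has been overwritten with a NUL byte and is not put back.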

It is best to avoid strtok() and strtok_r() completely and to use strspn() and strcspn() instead. The function below does that. Its return value is similar to that of snprintf(): the number of characters in the found token (not counting the terminating NUL byte):

If there is no token #n, '\0' is written to the buffer and 0 is returned.
If the buffer is too small for the found token plus the terminating NUL byte, '\0' is written to the buffer and the token length is returned.
If the buffer is large enough, the token plus '\0' is written to it, and strlen(token) is returned.
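
On the caller's side, these three cases can be told apart from the return value and the buffer size alone. A small sketch of that check follows; the enum and the function name are hypothetical and only serve as illustration:

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum token_status { TOKEN_OK, TOKEN_TRUNCATED, TOKEN_EMPTY_OR_MISSING };

/* Interpret a return value toklen for a result buffer of maxlen bytes. */
enum token_status classify_token_result(size_t toklen, size_t maxlen)
{
    if (toklen == 0)
        return TOKEN_EMPTY_OR_MISSING;  /* no token #n, or the token is empty */
    if (toklen >= maxlen)
        return TOKEN_TRUNCATED;         /* token + NUL did not fit; buffer holds "" */
    return TOKEN_OK;                    /* buffer holds the complete token */
}
```

Note that a return value of 0 is ambiguous: it is produced both for a missing token and for an empty token between two adjacent delimiters.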

#include <stdio.h>
#include <string.h>

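/* Copy token number n (counting from 0) of str into res.
 * Every single character from delim separates two tokens.
 * Returns the token length, or 0 if there is no such token. */
size_t extract_nth_token_ohne_strtok_r(char *res, size_t maxlen, const char *str, const char *delim, int n)
{
    size_t pos, len;
    int itok;

    for (itok = 0, pos = 0; str[pos]; ) {
        len = strcspn(str + pos, delim);    /* length of the current token */
        if (itok++ == n) {
            if (len < maxlen) {             /* token and NUL byte fit */
                memcpy(res, str + pos, len);
                res[len] = 0;
            } else {
                res[0] = 0;                 /* too small: return an empty string */
            }
            return len;
        }
        pos += len;
        if (str[pos]) pos++;                /* skip exactly one delimiter */
    }
    res[0] = 0;
    return 0;
}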

int main(void)
{
    /* const: the string literal is only read, never modified */
    const char *omg = "zero one\ttwo \tfour\nfive";
    char token[80];
    size_t toklen;
    int ii;

    printf("\n## With a large enough buffer:\n");
    for (ii = 0; ii < 7; ii++) {
        toklen = extract_nth_token_ohne_strtok_r(token, sizeof token,
                omg, " \t\n", ii);
        printf("%d: res=%zu \"%s\"\n", ii, toklen, token);
    }

    printf("\n## With 4-character buffer:\n");
    for (ii = 0; ii < 7; ii++) {
        toklen = extract_nth_token_ohne_strtok_r(token, 4,
                omg, " \t\n", ii);
        printf("%d: res=%zu \"%s\"\n", ii, toklen, token);
    }

    return 0;
}

Note: if you do want to treat consecutive whitespace as a single delimiter, replace the statement if (str[pos]) pos++; with:

pos += strspn(str+pos, delim);
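
Written out in full, that variant might look as follows. This is only a sketch of the function above with the one statement swapped; the name extract_nth_token_collapsing is made up here, and like the original it assumes that maxlen is at least 1:

```c
#include <stddef.h>
#include <string.h>

/* Variant sketch (hypothetical name): a run of delimiter characters
 * counts as a single separator. Assumes maxlen >= 1. */
size_t extract_nth_token_collapsing(char *res, size_t maxlen, const char *str,
                                    const char *delim, int n)
{
    size_t pos = 0, len;
    int itok = 0;

    while (str[pos]) {
        len = strcspn(str + pos, delim);    /* length of the current token */
        if (itok++ == n) {
            if (len < maxlen) {
                memcpy(res, str + pos, len);
                res[len] = 0;
            } else {
                res[0] = 0;                 /* token does not fit */
            }
            return len;
        }
        pos += len;
        pos += strspn(str + pos, delim);    /* skip the whole delimiter run */
    }
    res[0] = 0;
    return 0;
}
```

With the test string from main(), token 3 is now "four" instead of the empty string between the space and the tab. Unlike strtok_r(), this sketch still reports an empty token 0 when the input starts with a delimiter.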


solution1
2 ACCEPTED 2015-09-04 11:00:16
