I'm parsing three strings in parallel, each made up of values separated by a specific separator.
token1 = strtok_s(str1, separator, &nextToken1);
token2 = strtok_s(str2, separator, &nextToken2);
token3 = strtok_s(str3, separator, &nextToken3);
while ((token1 != NULL) && (token2 != NULL) && (token3 != NULL))
{
    //...
    token1 = strtok_s(NULL, separator, &nextToken1);
    token2 = strtok_s(NULL, separator, &nextToken2);
    token3 = strtok_s(NULL, separator, &nextToken3);
}
Suppose '-' is my separator. The behaviour is that a string with no consecutive separators:
1-2-3-45
would effectively result in each of these parts:
1
2
3
45
However, a string with two consecutive separators:
1-2--3-45
does not yield a 0-length string; the empty part is skipped, so the result is:
1
2
3
45
and not
1
2
(0-length string)
3
45
What workaround or strategy would be better suited to obtain all the actual parts, including the 0-length ones? I'd like to avoid re-implementing strtok_s, if possible.
Unfortunately, strtok() ignores empty tokens. Even though you said you wish to avoid re-implementing it, there is no other way but to parse the string yourself, for example by using strchr() to find the next delimiter and then copying the token to a temporary variable for processing. This way you can handle empty tokens whichever way you please.
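To illustrate the strchr() idea, here is a minimal sketch. It assumes a single-character separator and, instead of copying each token to a temporary variable, terminates it in place the way strtok_s does. The name next_field is made up for this example; it is not a standard library function.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the C library.
 * Returns the start of the next field and advances *cursor past the
 * delimiter. Two adjacent delimiters yield an empty string "" instead
 * of being skipped. Returns NULL once the input is exhausted.
 * The buffer is modified in place (delimiters are overwritten with '\0').
 */
static char *next_field(char **cursor, char delim)
{
    char *start = *cursor;
    char *end;

    if (start == NULL)
        return NULL;            /* the previous call returned the last field */

    end = strchr(start, delim);
    if (end != NULL) {
        *end = '\0';            /* terminate the current field */
        *cursor = end + 1;      /* the next field starts after the delimiter */
    } else {
        *cursor = NULL;         /* no delimiter left: this is the last field */
    }
    return start;
}
```

In the loop from the question, each strtok_s call would be replaced by a call such as next_field(&cursor1, '-'), with cursor1 initialised to str1 before the loop. For "1-2--3-45" the successive results should be "1", "2", "", "3", "45" and then NULL.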
Yes, that's the way this function works. It's more appropriate for tasks like parsing words where multiple whitespace characters should not be treated as empty words.
I've done a lot of parsing. I would simply write my own parser here, one where the code examines one character at a time. It's not that difficult and you can make it behave exactly how you need. As an example, I've posted some C++ code to parse a CSV file in my article "Reading and Writing CSV Files in MFC".
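A character-at-a-time splitter along those lines can be quite small. The following is only a rough sketch in C and is not the code from the article mentioned above. It assumes a single-character separator, leaves the input string untouched, and records each field as a pointer plus a length; struct field and split_fields are names invented for this example.

```c
#include <stddef.h>

/* Hypothetical type: a field is a view into the original string. */
struct field {
    const char *start;  /* first character of the field */
    size_t len;         /* number of characters, 0 for an empty field */
};

/*
 * Hypothetical helper: walks s one character at a time and stores up to
 * max fields in out. Empty fields (adjacent delimiters, or a leading or
 * trailing delimiter) are kept. Returns the total number of fields found,
 * which may be larger than max; only the first max are stored.
 */
static size_t split_fields(const char *s, char delim,
                           struct field *out, size_t max)
{
    size_t count = 0;
    const char *start = s;
    const char *p;

    for (p = s; ; ++p) {
        if (*p == delim || *p == '\0') {
            if (count < max) {
                out[count].start = start;
                out[count].len = (size_t)(p - start);
            }
            ++count;
            if (*p == '\0')
                break;          /* end of input: stop after the last field */
            start = p + 1;      /* next field begins after the delimiter */
        }
    }
    return count;
}
```

Because the fields are not NUL-terminated, they need to be printed with a precision, for example printf("%.*s", (int)f.len, f.start), or copied to a temporary buffer before being passed to functions that expect a C string.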