
C - How to get the string of the outer token in a nested strtok_s

Text block:

00000001,otherPerson,0134333334,anotherDepartment
00000002,anotherPerson,01287665478,newDepartment
00000003,someoneElse,0139487632,otherDepartment
00000004,wholeNewPerson,01786666317,aDeparment
00000005,aPerson,013293842,otherDepartment
00000006,oldPerson,0133937333,anotherDepartment

I am trying to process a block of text data by checking whether a column in a row equals a value and then getting the complete row. I split the block of text into rows on \n and then split each row into columns on , . But in the inner loop of the splitting, the outer token is no longer the complete row. How can I keep the token complete?

char *sav1 = NULL;
char *token = strtok_s(copyOfRecords, "\n", &sav1);
int counter = 0;

while (token != NULL) {
    char *sav2 = NULL;
    char *innerToken = strtok_s(token, ",", &sav2);
    int counter = 1;

    while (innerToken != NULL) {
        // the variable, "token" is not complete anymore in this block
        // How to keep the outer token complete?

        innerToken = strtok_s(NULL, ",", &sav2);
    }

    token = strtok_s(NULL, "\n", &sav1);
}

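The strtok family of functions modifies the string it seems to merely extract tokens from: every separator it finds is overwritten with a null byte ('\0'). In your loop, strtok_s(token, ",", &sav2) replaces the commas of the current row with '\0', so from then on token ends at the first comma. This is confusing and often counter-productive, as you have experienced.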

Further confusion comes from the semantics of this tokenisation process, which are also often misunderstood: for example, strtok(token, ",") treats any run of consecutive commas as a single separator, which means it cannot handle empty comma-separated fields. strtok_s(), which in this three-argument form is a Microsoft extension not always available on non-Microsoft systems (C11's optional Annex K defines a different, four-argument strtok_s()), behaves the same way. Consider not using these functions at all.

You should instead use strcspn() to skip rows and columns until you reach the target cell, test its length and contents, and return a pointer to the start of the row if there is a match, or NULL if there is none. Because the data is never modified, you can restart the search from the row after the last match.

#include <stdio.h>
#include <string.h>

char *select_row(const char *data, int col, const char *value) {
    size_t value_len = strlen(value);
    while (*data != '\0') {
        const char *p = data;
        size_t row_len = strcspn(data, "\n"); // number of characters before the newline (or the end of the string)
        for (int i = 0; i < col; i++) {
            p += strcspn(p, ",\n");  // skip the cell contents
            if (*p == ',') {
                p++;  // skip the comma to point to the next cell
            }
        }
        // p points to the column data (or to the end of the row if the row has fewer columns)
        size_t cell_len = strcspn(p, ",\n");  // compute the cell contents' length
        if (cell_len == value_len && memcmp(p, value, value_len) == 0) {
            // if there is a match, return a pointer to the beginning of the row.
            // beware that this is not a token: the data was not modified, so
            // the row stops at the newline but the string goes on to the end of
            // the database.
            // you can return an allocated copy of the row instead with
            // return strndup(data, row_len);  // strndup() is POSIX/C23, not available in MSVC
            return (char *)data;
        }
        data += row_len;  // skip the row contents
        if (*data == '\n') {
            data++;   // skip the newline to point to the next row
        }
    }
    return NULL;
}

int main(void) {
    const char *data = "00000001,otherPerson,0134333334,anotherDepartment\n"
                       "00000002,anotherPerson,01287665478,newDepartment\n"
                       "00000003,someoneElse,0139487632,otherDepartment\n"
                       "00000004,wholeNewPerson,01786666317,aDeparment\n"
                       "00000005,aPerson,013293842,otherDepartment\n"
                       "00000006,oldPerson,0133937333,anotherDepartment\n";

    const char *found = select_row(data, 1, "anotherPerson");
    if (found) {
        int length = (int)strcspn(found, "\n");  // length of the matching row only
        printf("%.*s\n", length, found);        // print the matching row, not the start of data
    }
    return 0;
}

The simplest way to handle this is to make a copy of the outer token before splitting it.

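while (token != NULL) {
    // Copy the row before strtok_s() overwrites its commas with '\0'.
    // _strdup() is the Microsoft spelling; POSIX and C23 call it strdup().
    char *token_orig = _strdup(token);
    if (token_orig == NULL) {
        break;  // out of memory
    }
    char *sav2 = NULL;
    char *innerToken = strtok_s(token, ",", &sav2);
    int counter = 1;

    while (innerToken != NULL) {
        // token now ends at the first comma, but token_orig still holds the complete row

        innerToken = strtok_s(NULL, ",", &sav2);
    }

    token = strtok_s(NULL, "\n", &sav1);
    free(token_orig);  // free() is declared in <stdlib.h>
}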

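Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.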

© 2020-2024 STACKOOM.COM