C - Segmentation fault using strtok

I have this code that reads multiple files and prints certain values from each. After reading several files, at some point my while loop stops and the program reports a segmentation fault...

Here is my code:

int main () {

    const char s[2] = ",";
    const char s2[2] = ":";

    char var1[] = "fiftyTwoWeekHigh\"";
    char *fiftyhigh;
    char *fiftyhigh2;
    char *fiftyhigh_token;
    char *fiftyhigh2_token;
   
    char var2[] = "fiftyTwoWeekLow\"";
    char *fiftylow;
    char *fiftylow2;
    char *fiftylow_token;
    char *fiftylow2_token;

    char var3[] = "regularMarketPrice\"";
    char *price;
    char *price2;
    char *price_token;
    char *price2_token;
   
    FILE *fp;
    char* data = "./data/";
    char* json = ".json";
    char line[MAX_LINES];
    char line2[MAX_LINES];
    int len;
    char* fichier = "./data/indices.txt";

    fp = fopen(fichier, "r");

    if (fp == NULL){
        printf("Impossible d'ouvrir le fichier %s", fichier);
        return 1;
    }

    while (fgets(line, sizeof(line), fp) != NULL) {
        char fname[10000];
        len = strlen(line);
        if (line[len-1] == '\n') {
            line[len-1] = 0;
        }
        
        int ret = snprintf(fname, sizeof(fname), "%s%s%s", data, line, json);
        if (ret < 0) {
            abort();
        }
        printf("%s\n", fname);
        
        FILE* f = fopen(fname, "r");

        while ( fgets( line2, MAX_LINES, f ) != NULL ) {
            fiftyhigh = strstr(line2, var1);
            fiftyhigh_token = strtok(fiftyhigh, s);
            fiftyhigh2 = strstr(fiftyhigh_token, s2);
            fiftyhigh2_token = strtok(fiftyhigh2, s2);
            printf("%s\n", fiftyhigh2_token);

            fiftylow = strstr(line2, var2);
            fiftylow_token = strtok(fiftylow, s);
            fiftylow2 = strstr(fiftylow_token, s2);
            fiftylow2_token = strtok(fiftylow2, s2);
            printf("%s\n", fiftylow2_token);

            price = strstr(line2, var3);
            price_token = strtok(price, s);
            price2 = strstr(price_token, s2);
            price2_token = strtok(price2, s2);
            printf("%s\n", price2_token);
        
            //printf("\n%s\t%s\t%s\t%s\t%s", line, calculcx(fiftyhigh2_token, price2_token, fiftylow2_token), "DIV-1", price2_token, "test");
            
        }
        fclose(f);
    }
    fclose(fp);
    return 0;
}

And the output is:

./data/k.json
13.59
5.31
8.7
./data/BCE.json
60.14
46.03
56.74
./data/BNS.json
80.16
46.38
78.73
./data/BLU.json
16.68
2.7
Segmentation fault

It looks like my program stops because it can't reach certain data in a certain file... Is there a way to allocate more memory? My MAX_LINES is already set to 6000.

  • Did you mean '\0'?
if (line[len-1] == '\n') {
  line[len-1] = 0;
}

I advise you to use gdb to see where the segfault occurs and why. I don't think you need to allocate more memory. The segfault may happen because the data you are looking for is not there and you still print the result.

Use if (price2_token != NULL) printf("%s\n", price2_token); for example.
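The same kind of check can guard every step of the lookup, not only the final printf. Here is a minimal sketch, assuming the lookup is moved into a helper; print_field is a hypothetical name that does not appear in the question, and like the original code it modifies the line through strtok.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the question): prints the value that
 * follows `key` in `line`. Returns 0 on success, -1 if any step fails.
 * Like the original code, it modifies `line` through strtok. */
static int print_field(char *line, const char *key)
{
    char *field = strstr(line, key);      /* find the field name */
    if (field == NULL)
        return -1;                        /* field missing or hidden by an earlier \0 */

    char *token = strtok(field, ",");     /* cut the line at the next comma */
    if (token == NULL)
        return -1;

    char *colon = strstr(token, ":");     /* find the name/value separator */
    if (colon == NULL)
        return -1;

    char *value = strtok(colon, ":");     /* skip the colon, keep the rest */
    if (value == NULL)
        return -1;

    printf("%s\n", value);
    return 0;
}
```

With a helper like this, a line that lacks a field produces a -1 return value that the caller can report, instead of a NULL pointer reaching strstr or printf.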

I'm assuming that the lines in your file look something like this:

{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100, ... }

In other words, it's some kind of JSON format. I'm assuming that the line starts with '{', so each line is a JSON object.

You read that line into line2, which now contains:

{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100, ... }\0

Note the \0 at the end that terminates the string. Note also that "fiftyTwoWeekLow" comes first, which turns out to be really important.

Now let's trace through the code here:

fiftyhigh = strstr(line2, var1);
fiftyhigh_token = strtok(fiftyhigh, s);

First you call strstr to find the position of "fiftyTwoWeekHigh". This returns a pointer to the position of that field name in the line. Then you call strtok to find the comma that separates this value from the next. I think that this is where things start to go wrong. After the call to strtok, line2 looks like this:

{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100\0 ... }\0

Note that strtok has modified the string: the comma has been replaced with \0. That's so you can use the returned pointer fiftyhigh_token as a string without seeing all the stuff that came after the comma.

fiftyhigh2 = strstr(fiftyhigh_token, s2);
fiftyhigh2_token = strtok(fiftyhigh2, s2);
printf("%s\n", fiftyhigh2_token);

Next you look for the colon and then call strtok with a pointer to the colon. Since the delimiter you're passing to strtok is the colon, strtok skips the colon and returns the next token. Because the string we're looking at now ends after "100" and has no more colons, that token is the rest of the string, in other words, the number.

So you've gotten your number, but probably not in the way you expected. There was really no point in the second call to strtok, since (assuming the JSON was well-formed) the position of "100" was just fiftyhigh2+1.
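As a rough illustration of that last point, the value can be reached with plain pointer arithmetic once the colon has been located. The helper name value_after_colon is hypothetical, and the sketch assumes the token has the form name:value.

```c
#include <string.h>

/* Hypothetical helper: given a token such as  fiftyTwoWeekHigh":100
 * return a pointer to the text right after the first colon ("100"),
 * or NULL if the token contains no colon. The token is not modified. */
static const char *value_after_colon(const char *token)
{
    const char *colon = strchr(token, ':');
    return colon != NULL ? colon + 1 : NULL;
}
```

Because nothing is written into the string, this step cannot hide later fields or disturb a strtok scan that is still in progress.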

Now we try to find "fiftyTwoWeekLow":

fiftylow = strstr(line2, var2);
fiftylow_token = strtok(fiftylow, s);
fiftylow2 = strstr(fiftylow_token, s2);
fiftylow2_token = strtok(fiftylow2, s2);
printf("%s\n", fiftylow2_token);

This is basically the same process, and after you call strtok, line2 looks like this:

{"fiftyTwoWeekLow":32\0"fiftyTwoWeekHigh":100\0 ... }\0

Note that you're only able to find "fiftyTwoWeekLow" because it comes before "fiftyTwoWeekHigh" in the line. If it had come after, you'd have been unable to find it because of the \0 added after "fiftyTwoWeekHigh" earlier. In that case, strstr would have returned NULL. Passing that NULL to strtok makes it resume the previous scan, which is already exhausted, so it returns NULL as well, and then you'd almost certainly have gotten a seg fault after passing NULL to strstr.

So the code is really sensitive to the order in which the fields appear in the line, and it's probably failing because some of your lines have the fields in a different order. Or maybe some fields are just missing from some lines, which would have the same effect.

If you're parsing JSON, you should really use a library designed for that purpose. But if you really want to use strtok, then you should:

  1. Read line2.
  2. Call strtok(line2, ",") once, then repeatedly call strtok(NULL, ",") in a loop until it returns NULL. This will break up the line into tokens that each look like "someField":100.
  3. Isolate the field name and value from each of these tokens (just call strchr(token, ':') to find the value). Do not call strtok here, because it will change the internal state of strtok and you won't be able to use strtok(NULL, ",") to continue processing the line.
  4. Test the field name, and depending on its value, set an appropriate variable. In other words, if it's the "fiftyTwoWeekLow" field, set a variable called fiftyTwoWeekLow. You don't have to bother to strip off the quotes, just include them in the string you're comparing with.
  5. Once you've processed all the tokens (strtok returns NULL), do something with the variables you set.

You may need to pass ",{}" as the delimiter to strtok in order to get rid of any opening and closing curly braces that surround the line. Or you could look for them in each token and ignore them if they appear.
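The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: struct quote and parse_quote are hypothetical names, the line is assumed to be a flat JSON object with no whitespace around field names, and the values are assumed to be plain numbers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the three values the question prints. */
struct quote {
    double high, low, price;
    int have_high, have_low, have_price;   /* 1 if the field was seen */
};

/* Splits `line` in place on ",{}" and fills `q`.
 * Returns how many of the three fields were found (0 to 3). */
static int parse_quote(char *line, struct quote *q)
{
    *q = (struct quote){0};

    for (char *tok = strtok(line, ",{}"); tok != NULL;
         tok = strtok(NULL, ",{}")) {
        char *colon = strchr(tok, ':');    /* step 3: strchr, not strtok */
        if (colon == NULL)
            continue;                      /* e.g. the trailing "\n" */
        *colon = '\0';                     /* tok is now the quoted name */
        double v = strtod(colon + 1, NULL);

        /* step 4: compare with the quotes included */
        if (strcmp(tok, "\"fiftyTwoWeekHigh\"") == 0) {
            q->high = v;  q->have_high = 1;
        } else if (strcmp(tok, "\"fiftyTwoWeekLow\"") == 0) {
            q->low = v;   q->have_low = 1;
        } else if (strcmp(tok, "\"regularMarketPrice\"") == 0) {
            q->price = v; q->have_price = 1;
        }
    }
    return q->have_high + q->have_low + q->have_price;
}
```

Because each field is matched by name as the tokens go by, the order of the fields no longer matters, and a missing field shows up as a have_ flag left at 0 rather than as a NULL pointer.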

You could also pass "\"{},:" as the delimiter to strtok. This would cause strtok to emit an alternating sequence of field names and values. You could call strtok once to get the field name, again to get the value, then test the field name and do something with the value.
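A minimal sketch of this alternating name/value approach, assuming a flat object in which every name is followed by a non-empty value; parse_pairs is a hypothetical name introduced here for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: walks `line` as name, value, name, value, ...
 * using "\"{},:" as the delimiter set. Stores the three values of
 * interest through the out-pointers and returns how many were found. */
static int parse_pairs(char *line, double *high, double *low, double *price)
{
    const char *delims = "\"{},:";
    int found = 0;

    char *name = strtok(line, delims);
    while (name != NULL) {
        char *value = strtok(NULL, delims);
        if (value == NULL)
            break;                         /* name without a value: stop */

        if (strcmp(name, "fiftyTwoWeekHigh") == 0) {
            *high = strtod(value, NULL);
            found++;
        } else if (strcmp(name, "fiftyTwoWeekLow") == 0) {
            *low = strtod(value, NULL);
            found++;
        } else if (strcmp(name, "regularMarketPrice") == 0) {
            *price = strtod(value, NULL);
            found++;
        }
        name = strtok(NULL, delims);
    }
    return found;
}
```

Here the quotes are consumed as delimiters, so the names are compared without them. An empty string value such as "x":"" would break the name/value alternation, which is one reason this only suits very simple input.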

Using strtok is a pretty primitive way of parsing JSON, but it will work as long as your JSON only contains simple field names and numbers and doesn't include any strings that themselves contain delimiter characters.
