C: Reading a file of digits separated by commas

I am trying to read in a file that contains digits separated by commas and store them in an array without the commas.

For example, processes.txt contains:

0,1,3
1,0,5
2,9,8
3,10,6

and an array called numbers should look like:

0 1 3 1 0 5 2 9 8 3 10 6

The code I have so far is:

FILE *fp1;
char c; //declaration of characters

fp1=fopen(argv[1],"r"); //opening the file



int list[300];


c=fgetc(fp1); //taking character from fp1 pointer or file
int i=0,number,num=0;

while(c!=EOF){ //iterate until end of file
    if (isdigit(c)){ //if it is digit
        sscanf(&c,"%d",&number); //changing character to number (c)
        num=(num*10)+number;

    }
    else if (c==',' || c=='\n') { //if it is new line or ,then it will store the number in list
        list[i]=num;
        num=0;
        i++;

    }

    c=fgetc(fp1);

}

But this has problems when a number has two digits. Does anyone have a better solution? Thank you!

For the data shown, with no space before the commas, you could simply use:

while (i < 300 && fscanf(fp1, "%d,", &num) == 1)
    list[i++] = num;

This reads the comma after a number if there is one, and silently ignores its absence (for example at the end of a line). The %d conversion skips leading white space, including newlines, so the line breaks need no special handling. If there might be white space before the commas in the data, add a blank before the comma in the format string ("%d ,"). The test on i prevents you from writing outside the bounds of the list array. The ++ operator comes into its own here.
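A minimal self-contained sketch of this approach follows, wrapped in a hypothetical helper read_numbers_fscanf (the name and the max parameter are illustrative, not from the answer). It uses the "%d ," variant so blanks before a comma are tolerated, and it tests the bound before calling fscanf so that no extra number is consumed once the array is full:

```c
#include <stdio.h>

/* Hypothetical helper: read comma/newline-separated ints from fp into
   list, storing at most max values. Returns the number of values stored. */
size_t read_numbers_fscanf(FILE *fp, int *list, size_t max)
{
    size_t n = 0;

    /* "%d" skips leading white space (including '\n'); " ," then skips
       optional blanks and consumes a comma if one follows. A missing comma
       is a failure after one successful conversion, so fscanf still
       returns 1 and the loop continues with the next number. */
    while (n < max && fscanf(fp, "%d ,", &list[n]) == 1)
        n++;
    return n;
}
```

On the processes.txt data from the question, this should store the twelve values 0 1 3 1 0 5 2 9 8 3 10 6.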

First, fgetc returns an int, so c needs to be an int. EOF is a negative int that must stay distinguishable from every valid character, and a plain char cannot guarantee that.
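For comparison, here is a sketch of the question's character-by-character loop with that fix applied. It also replaces sscanf(&c, "%d", &number), which passes the address of a single char where sscanf expects a null-terminated string, with the usual c - '0' conversion, and it stores the last number even when the file does not end with a newline. Unlike the question's code, it treats any non-digit as a separator. The function name read_numbers_fgetc and the max parameter are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: accumulate digits into a number; any non-digit
   ends the current number. Returns the count stored (at most max). */
size_t read_numbers_fgetc(FILE *fp, int *list, size_t max)
{
    size_t n = 0;
    int num = 0;
    int in_number = 0;   /* seen at least one digit of the current number? */
    int c;               /* int, not char, so EOF can be detected */

    while (n < max && (c = fgetc(fp)) != EOF) {
        if (isdigit(c)) {
            num = num * 10 + (c - '0');   /* '0'..'9' are contiguous in C */
            in_number = 1;
        } else if (in_number) {           /* ',' or '\n' (or anything else) */
            list[n++] = num;
            num = 0;
            in_number = 0;
        }
    }
    /* Store the final number if the file lacks a trailing newline. */
    if (in_number && n < max)
        list[n++] = num;
    return n;
}
```

The in_number flag also means blank lines or doubled separators do not produce spurious zeros.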

Other than that, I would use a slightly different approach. I admit that it is somewhat overcomplicated for this task. However, it can be useful if you have several kinds of fields that require different actions, as in a parser. For your specific problem, I recommend Jonathan Leffler's answer.

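int c = fgetc(f);  // int, not char, so EOF can be detected

while (c != EOF && i < 300) {
    if (isdigit(c)) {
        // Step back over the digit so fscanf reads the whole number.
        // ISO C only guarantees fseek on a text stream for offsets obtained
        // from ftell, so ungetc(c, f) is the more portable way to do this.
        fseek(f, -1, SEEK_CUR);
        if (fscanf(f, "%d", &list[i++]) != 1) {
            // Handle error
        }
    }
    c = fgetc(f);
}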

Here I don't care about commas and newlines. I take ANYTHING other than a digit as a separator. What I do is basically this:

read next byte
if byte is digit:
     back one byte in the file
     read number, regardless of its length
else continue
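The steps above can be sketched with ungetc in place of fseek: ISO C guarantees one character of push-back on any stream, whereas it does not guarantee fseek(f, -1, SEEK_CUR) on a text stream. The helper name read_numbers_ungetc is hypothetical, and it checks the return values the answer recommends checking.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: any non-digit is a separator; each run of digits
   is read as one number with fscanf. Returns the count stored (<= max). */
size_t read_numbers_ungetc(FILE *f, int *list, size_t max)
{
    size_t i = 0;
    int c;

    while (i < max && (c = fgetc(f)) != EOF) {
        if (isdigit(c)) {
            /* Put the digit back so fscanf sees the whole number. */
            if (ungetc(c, f) == EOF)
                break;
            if (fscanf(f, "%d", &list[i]) != 1)
                break;
            i++;
        }
        /* Anything else (',', '\n', ...) is simply skipped. */
    }
    return i;
}
```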

The added condition i<300 keeps the loop from writing past the end of list. If you really want to check that the file contains nothing other than digits, commas and newlines (I did not get the impression that you found that important), you could easily add an else if (c == ... branch to handle the error.
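If you do want that check, a sketch might look like the following. Digits are read as in the loop above (using ungetc rather than fseek to step back), commas and newlines are accepted as separators, and any other character makes the hypothetical helper read_numbers_strict return -1. Which characters count as valid separators is an assumption you would adjust to your data.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: only ',' and '\n' are allowed between numbers.
   Returns the number of values stored, or -1 on unexpected input. */
long read_numbers_strict(FILE *f, int *list, size_t max)
{
    size_t i = 0;
    int c;

    while (i < max && (c = fgetc(f)) != EOF) {
        if (isdigit(c)) {
            if (ungetc(c, f) == EOF || fscanf(f, "%d", &list[i]) != 1)
                return -1;
            i++;
        } else if (c != ',' && c != '\n') {
            return -1;               /* the "else if (c == ..." error case */
        }
    }
    return (long)i;
}
```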

Note that you should always check the return value of functions like sscanf, fscanf and scanf. Actually, you should also do that for fseek. In this situation it is less important, since this code is very unlikely to fail for that reason, so I left it out for readability. But in production code you SHOULD check it.

My solution is to read the whole line first and then parse it with strtok_r, using a comma as the delimiter. strtok_r is defined by POSIX, not by ISO C, so if you want portable code you should use strtok instead.
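To illustrate the splitting step with the portable strtok, here is a sketch of a hypothetical parse_line helper. It validates each field through strtol's end pointer, which is how strtol reports "no digits" or trailing junk; errno is only guaranteed to report out-of-range values (ERANGE). Because strtok modifies its argument, line must be a writable array, not a string literal.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split one line on commas and append each value to
   list (at most max). Returns the count stored, or -1 on an invalid field.
   '\r' and '\n' are also delimiters, so a line read with its newline works. */
long parse_line(char *line, int *list, size_t max)
{
    size_t n = 0;

    for (char *tok = strtok(line, ",\r\n"); tok != NULL && n < max;
         tok = strtok(NULL, ",\r\n")) {
        char *end;
        errno = 0;                        /* library calls never reset errno */
        long v = strtol(tok, &end, 10);
        if (end == tok || *end != '\0')   /* no digits, or trailing junk */
            return -1;
        if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
            return -1;                    /* does not fit in an int */
        list[n++] = (int)v;
    }
    return (long)n;
}
```

Note that strtok treats consecutive commas as one delimiter, so an empty field such as the middle of "0,,3" is silently skipped rather than reported.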

A naive implementation of readline would look something like this:

static char *readline(FILE *file)
{
    // Start with room for just the terminating '\0'
    char *line = malloc(sizeof(char));
    if (line == NULL) {
        return NULL;
    }
    int index = 0;
    int c = fgetc(file);
    if (c == EOF) {
        free(line);
        return NULL;
    }
    while (c != EOF && c != '\n') {
        line[index++] = c;
        // Grow by one byte per character: simple, but slow for long lines
        char *l = realloc(line, (index + 1) * sizeof(char));
        if (l == NULL) {
            free(line);
            return NULL;
        }
        line = l;
        c = fgetc(file);
    }
    line[index] = '\0';
    return line;
}
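The naive version calls realloc once per character, so reading a long line can involve a lot of copying. A common refinement, sketched below under the hypothetical name readline_grow, doubles the buffer whenever it fills, making a line of n characters cost amortized O(n). It returns NULL at end of file just like readline, and also when memory runs out. On POSIX systems, getline already provides this kind of growing buffer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line (without its '\n') into a
   malloc'ed string, doubling the buffer as needed. The caller frees it.
   Returns NULL at end of file or if allocation fails. */
static char *readline_grow(FILE *file)
{
    size_t cap = 16, len = 0;
    char *line = malloc(cap);
    if (line == NULL)
        return NULL;

    int c;
    while ((c = fgetc(file)) != EOF && c != '\n') {
        if (len + 1 == cap) {              /* keep room for the '\0' */
            char *bigger = realloc(line, cap * 2);
            if (bigger == NULL) {
                free(line);
                return NULL;
            }
            line = bigger;
            cap *= 2;
        }
        line[len++] = (char)c;
    }
    if (c == EOF && len == 0) {            /* nothing left to read */
        free(line);
        return NULL;
    }
    line[len] = '\0';
    return line;
}
```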

Then you just need to parse each line with strtok_r, so you would end up with something like this:

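#define _POSIX_C_SOURCE 200809L   // for strtok_r
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// ... readline() from above goes here ...

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }
    // Plain "r": the "e" (close-on-exec) mode flag is not part of ISO C
    FILE *file = fopen(argv[1], "r");
    if (file == NULL) {
        perror(argv[1]);
        return 1;
    }
    int list[300];
    char *line;
    int numc = 0;
    while ((line = readline(file)) != NULL) {
        char *saveptr;
        // Get the first token
        char *tok = strtok_r(line, ",", &saveptr);
        // Now parse the rest of the line, stopping when list is full
        while (tok != NULL && numc < 300) {
            char *end;
            errno = 0;  // library functions never reset errno, so clear it first
            // Base 10, so a field like "010" is not read as octal
            long num = strtol(tok, &end, 10);
            if (end == tok || errno == ERANGE || num < INT_MIN || num > INT_MAX) {
                // Handle a token that is not a valid int
                // ...
            } else {
                list[numc++] = (int) num;
            }
            // Get next token
            tok = strtok_r(NULL, ",", &saveptr);
        }
        free(line);
    }
    fclose(file);
    return 0;
}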

And for printing the whole list, just use a for loop:

for (int i = 0; i < numc; i++) {
    printf("%d ", list[i]);
}
printf("\n");

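Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.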

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM