
C Programme to Parse Files

I have a CSV file in the format of:

name,num-one,num-two,num-three

I have a parsing script that I use (below), but I would like to edit the script to check a whole file before moving on.

Pseudocode of Script:

Read through the whole file.
Find a line where token one matches a set value AND
token two matches another set value, THEN
set the next two tokens as new variables.
Otherwise move on to the next line.

If token one (name) and token two (num-one) are equal to the values currently being processed by my programme, then set tokens three and four as value1 and value2.

char    line[32];
int     count;
FILE    *read_file;

read_file = fopen ("/location/of/file.csv", "r");

fgets (line,32,read_file);

pch = strtok (line,",");

while (pch != NULL  )
{
    if (count == 1)
    {
        if ( (strcmp(pch,name) == 0) )
        {
            count++;
        }
    }
    else if (count == 2)
    {
        if ( (strcmp(pch,num-one) == 0) )
        {
            count++;
        }
    }
    else if (count == 3)
    {
        value1 = atoi(pch);
        count++;
    }
    else if (count == 4)
    {
        value2 = atoi(pch);
        count = 1;
    }
    pch = strtok (NULL, ",");
}

You really shouldn't use strtok() for something like this.

Instead, do the far simpler:

  1. Read a line using fgets(). You already do this.
  2. Parse the line using sscanf().

With sscanf(), parsing out the four fields is a single function call:

char name[16];
int num1, num2, num3;

if(sscanf(line, "%15s,%d,%d,%d", name, &num1, &num2, &num3) == 4)
{
  printf("got '%s' with values %d, %d and %d\n", name, num1, num2, num3);
}

I'm not 100% sure I have understood which fields you expect; I found your description (and code) slightly hard to understand. I assumed a single string followed by three integers.

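Note that %s reads up to the next whitespace, not up to the next comma. On a line such as name,1,2,3 the first conversion therefore swallows the commas and digits as well, and the call above returns 1 rather than 4. To rely on the comma to separate fields instead, use a scanset: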

if(sscanf(line, "%15[^,],%d,%d,%d", name, &num1, &num2, &num3) == 4)
                     ^
                     |
                changed this

This will treat the first part as a string of non-comma characters, allowing embedded spaces.
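
To check the whole file rather than a single line, the same call can be placed in an fgets() loop. The following is only a minimal sketch under some assumptions: find_record, want_name and want_num are hypothetical names that do not appear in the question, lines are assumed to fit in 128 bytes, and the first matching line is assumed to be the one wanted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan fp for the first line whose first field equals
 * want_name and whose second field equals want_num. On a match, store the
 * third and fourth fields in *value1 and *value2 and return 1.
 * Return 0 if no line in the file matches. */
int find_record(FILE *fp, const char *want_name, int want_num,
                int *value1, int *value2)
{
    char line[128];   /* assumed maximum line length */
    char name[16];
    int num1, num2, num3;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* Skip malformed lines: all four fields must convert. */
        if (sscanf(line, "%15[^,],%d,%d,%d", name, &num1, &num2, &num3) != 4)
            continue;
        /* Skip lines that do not match the values being looked for. */
        if (strcmp(name, want_name) != 0 || num1 != want_num)
            continue;
        *value1 = num2;
        *value2 = num3;
        return 1;
    }
    return 0;
}
```

A caller would open the file with fopen(), call something like find_record(read_file, name, num, &value1, &value2), and treat a return value of 0 as "no such line in the file".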

Have you considered writing your checker (if I understand the purpose of your post) in awk? awk loops over the lines of your .csv for you. If you run gawk -v FS=',' the fields are split on commas, and you can easily add any test on the fields or rearrange them. Whatever your goal is, once the awk program has run you can assume the file fulfils all your preconditions.

#!/usr/bin/gawk -f
BEGIN { OFS = FS = "," }   # awk string literals need double quotes

{
    print $1+$3, $2, $3, $4;
}

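This code copies the input file and, at the same time, replaces the first field with the sum of the first and third fields. With the name,num-one,num-two,num-three layout from the question the first field is a string, which awk treats as 0 in arithmetic, so adapt the field numbers to your data. The awk program will be quicker to write than the C one and easier to maintain and modify, and I bet it will run faster than your unpolished C version.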

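Your question has already been answered. Nonetheless, I'd like to show how you can use strtok for your purpose. Your count approach doesn't work: count is never initialised, and when a token fails to match, count is not advanced, so the following tokens are compared against the wrong field. Instead you can analyse the fields one at a time and continue the loop if a condition is not met.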

(You seem to be missing the outer loop in your example. I don't know whether that is what you mean by "checking the whole file". This loop reads the file line by line; continue means read the next line and break means stop reading.)

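for (;;) {
    char line[32];
    char *pch;
    int x;

    /* Stop at end of file or on a read error. */
    if (fgets(line, sizeof line, read_file) == NULL) break;

    /* Field 1: the name. strtok() returns NULL when no token is left. */
    pch = strtok(line, ",");
    if (pch == NULL || strcmp(pch, name) != 0) continue;

    /* Field 2: must parse as an integer equal to num. */
    pch = strtok(NULL, ",");
    if (pch == NULL || sscanf(pch, "%d", &x) < 1 || x != num) continue;

    /* Fields 3 and 4: the values to extract. */
    pch = strtok(NULL, ",");
    if (pch == NULL || sscanf(pch, "%d", &value1) < 1) continue;

    pch = strtok(NULL, ",");
    if (pch == NULL || sscanf(pch, "%d", &value2) < 1) continue;
}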

But I have to admit that the sscanf approach is simpler and probably enough for your purposes. If you have more complicated input, for example if the meaning and data type of the fields depend on the first field, the strtok approach might be more flexible.
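
One caveat when weighing the two approaches: strtok() treats a run of delimiters as a single separator, so an empty field disappears and the later fields shift left. A minimal sketch of the effect, where count_fields is a hypothetical helper written only for this illustration:

```c
#include <string.h>

/* Hypothetical helper: count the tokens strtok() finds in a
 * comma-separated line. The line is modified in place. */
int count_fields(char *line)
{
    int n = 0;
    char *tok;

    /* The newline is listed as a delimiter so a trailing "\n" is not
     * counted as part of the last field. */
    for (tok = strtok(line, ",\n"); tok != NULL; tok = strtok(NULL, ",\n"))
        n++;
    return n;
}
```

Given the line bob,,50,60 this finds three tokens, not four, so a strtok loop would read 50 as the second field. The sscanf call with the scanset stops at the empty field and returns 1 for the same line, which makes the malformed input easier to detect.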

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact yoyou2525@163.com.

 