
Parsing a text file using strtok_r()

I am trying to parse a text file that's formatted like this:

XB0136;4310136;28;10
XB0136;4310136;29;C
XB0139;4310188;30;5
XB0145;4254875;31;20

As you can see there's a pattern: every line holds values that belong to a serial number (the first value; the values are separated by ";").

I want to search for a given serial number and collect the data on every matching line. A serial number can appear more than once in my file: the first two lines above share a serial number but carry different data, and I want both.

My attempt was to open the file, pass everything into an array, then tokenize the array using "\n" as the first delimiter and ";" as the second delimiter.

int main()
{
    char matricola[50];  
    printf("insert serial number: \n");
    scanf("%s", matricola);
   
    FILE *fp=fopen("prova.txt","r");
    if (!fp){
        printf("file doesnt exist\n");    
        return -1;        
    }

    fseek(fp, 0, SEEK_END);
    unsigned int size=(ftell(fp));
    rewind(fp);
    if(size==-1){
        printf("file is empty\n");  
        return -1;          
    }

    if(size!=0)      // if file not empty
    {
        printf("file exists and it is %u bytes\n", size);

        char *delim = "\n", *delim2 = ";";
        char buffer[size];
        int rows = 25;      // approx 
        int lines = ((size*sizeof(char)/rows)+100);   // approx
        char matrice[lines][rows];
       
        fread(buffer,sizeof(buffer),1,fp);
        fclose(fp);
        
        char *svptr1, *svptr2;
        char *token = strtok_r(buffer, delim, &svptr1);

        int k=0;
        while (token!=NULL)
        {
            strcpy(matrice[k],token);
            token = strtok_r(NULL, delim, &svptr1);
            k++;
           
        }

    }
   return 1;
}

With this I have an array of arrays in which each element holds one line of my text file. From here I don't know how to proceed: I tried calling strtok again on each line, but I get strange behaviour. I want to check every line, see whether its serial number is the one I'm searching for, and if so save the corresponding data elsewhere before moving on to the next line.

fgets can be used to read each line of the file.
Use strncmp to compare the first characters of the line to the serial number. strncmp will return 0 for a match.
Upon a match, sscanf can parse the fields from the line. The conversion %19[^;]; scans up to 19 characters that are not a semicolon, then consumes the semicolon.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main ( void)
{
    char matricola[50] = "";
    char line[100] = "";

    printf("insert serial number: \n");
    fgets ( matricola, sizeof matricola, stdin);
    size_t length = strcspn ( matricola, "\n");
    matricola[length] = 0; // remove newline

    FILE *fp=fopen("prova.txt","r");
    if (!fp){
        printf("file doesnt exist\n");
        return -1;
    }

    char (*matrice)[4][20] = NULL;
    size_t rows = 0;

    while ( fgets ( line, sizeof line, fp)) {
        // the whole first field must match, not just a prefix of it
        if ( ! strncmp ( line, matricola, length) && ';' == line[length]) {
            char (*temp)[4][20] = NULL;
            if ( NULL == ( temp = realloc ( matrice, sizeof *matrice * ( rows + 1)))) {
                fprintf ( stderr, "realloc problem\n");
                free ( matrice);
                return 1;
            }
            matrice = temp;
            if ( 4 == sscanf ( line, "%19[^;];%19[^;];%19[^;];%19[^\n]"
            , matrice[rows][0]
            , matrice[rows][1]
            , matrice[rows][2]
            , matrice[rows][3])) {
                ++rows;
            }
        }

    }

    for ( size_t each = 0; each < rows; ++each) {
        printf ( "%s\n", matrice[each][0]);
        printf ( "%s\n", matrice[each][1]);
        printf ( "%s\n", matrice[each][2]);
        printf ( "%s\n\n", matrice[each][3]);
    }

    fclose ( fp);
    free ( matrice);
    return 0;
}
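
For comparison, the nested tokenizing that the question attempted can also be made to work with strtok_r, as long as each nesting level gets its own save pointer and the buffer is a writable, null-terminated string. The following is only a sketch under those assumptions: find_serial, FIELDS and FIELD_LEN are hypothetical names chosen for illustration, and the four-field layout is taken from the sample data.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* strtok_r is POSIX, not ISO C */
#endif
#include <stdio.h>
#include <string.h>

#define FIELDS 4        /* fields per line, as in the sample data */
#define FIELD_LEN 20    /* assumed maximum field size, including '\0' */

/* Hypothetical helper: splits text (which is modified in place) into lines,
   then each line into ';'-separated fields. Lines whose first field equals
   serial and that have at least FIELDS fields are copied into out.
   Returns the number of rows stored. */
size_t find_serial(char *text, const char *serial,
                   char out[][FIELDS][FIELD_LEN], size_t max_rows)
{
    size_t rows = 0;
    char *save_line = NULL;           /* state of the outer (line) tokenizer */

    for (char *line = strtok_r(text, "\r\n", &save_line);
         line != NULL && rows < max_rows;
         line = strtok_r(NULL, "\r\n", &save_line)) {

        char *save_field = NULL;      /* separate state for the inner tokenizer */
        char *field = strtok_r(line, ";", &save_field);

        if (field == NULL || strcmp(field, serial) != 0)
            continue;                 /* different serial number: next line */

        size_t n = 0;
        for (; field != NULL && n < FIELDS;
             field = strtok_r(NULL, ";", &save_field)) {
            /* snprintf truncates over-long fields instead of overflowing */
            snprintf(out[rows][n], FIELD_LEN, "%s", field);
            ++n;
        }
        if (n == FIELDS)
            ++rows;                   /* keep only complete rows */
    }
    return rows;
}
```

The text passed in must be terminated with '\0' (a buffer filled by fread is not, unless one extra byte is reserved and set to 0). Note also that strtok_r treats a run of delimiters as a single delimiter, so empty fields such as "a;;b" are silently skipped.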

https://en.cppreference.com/w/c/string/byte/strtok :

This function is destructive: it writes the '\0' characters in the elements of the string str. In particular, a string literal cannot be used as the first argument of strtok.
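
A small sketch of what that means in practice: if the original text must survive, or is a string literal, tokenize a writable copy instead. count_fields is a hypothetical helper written for this illustration, not part of any library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the ';'-separated fields in line without
   modifying it, by running strtok on a heap copy.
   Returns 0 if the copy cannot be allocated (and for an empty string). */
size_t count_fields(const char *line)
{
    size_t len = strlen(line) + 1;    /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy == NULL)
        return 0;
    memcpy(copy, line, len);

    size_t count = 0;
    /* strtok overwrites each ';' in copy with '\0' as it goes */
    for (char *tok = strtok(copy, ";"); tok != NULL; tok = strtok(NULL, ";"))
        ++count;

    free(copy);
    return count;
}
```

Calling strtok directly on a writable array such as char line[] = "XB0136;4310136;28;10"; leaves line holding only "XB0136" afterwards, because the first ';' has been replaced by '\0'. strtok also keeps its position in hidden static state, which is why nested tokenizing needs strtok_r.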

I'm pretty sure you'll be happier just reading lines using fscanf.
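
One way that could look, as a rough sketch rather than a definitive implementation: read four fields per line with bounded scansets and keep the rows whose first field matches. read_matches is a hypothetical name, and the limits of 4 fields and 19 characters per field are assumptions based on the sample data.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads "a;b;c;d" lines from fp and copies the rows
   whose first field equals serial into out.
   Returns the number of rows stored. */
size_t read_matches(FILE *fp, const char *serial,
                    char out[][4][20], size_t max_rows)
{
    size_t rows = 0;
    char f[4][20];
    int got;

    /* The leading space skips blank lines; each %19[^;\r\n] reads at most
       19 characters and stops at a semicolon or at the end of the line. */
    while (rows < max_rows &&
           (got = fscanf(fp, " %19[^;\r\n];%19[^;\r\n];%19[^;\r\n];%19[^;\r\n]",
                         f[0], f[1], f[2], f[3])) != EOF) {
        if (got == 4 && strcmp(f[0], serial) == 0) {
            memcpy(out[rows], f, sizeof f);
            ++rows;
        }
        /* Discard whatever is left of the line, so a malformed or
           over-long line cannot desynchronize the next read. */
        int c;
        while ((c = fgetc(fp)) != '\n' && c != EOF) {
        }
    }
    return rows;
}
```

A caller would open the file with fopen, pass the FILE pointer together with an array such as char out[100][4][20], and close the file afterwards. Checking the return value of fscanf against the expected field count is what keeps partially parsed lines out of the result.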

On another note, this is a semicolon-separated values file, and there are many libraries that read such files reliably. Don't do this to yourself: C is not a very good (or safe) language for string processing, and built-in utilities like strtok are not that great (they're also 50 years old!). Many people switch to a language other than C for text-processing-heavy tasks simply because, in all honesty, C is not as well equipped for them as other languages are.
