reading from multiple lines of a file - scanf in c using Linux cat

Question

I want to read a log file as a stream on the Linux command line. The log file looks like this:

=== Start ===
I 322334bbaff, 4
I 322334bba0a, 4
 S ff233400ff, 8
I 000004bbaff, 4
L 322334bba0a, 4
=== End ===

and I have a C program that reads every line of the log file and stores the memory address and its size (e.g., 322334bba0a and 4) from each eligible line.

// my_c.c
#include <stdio.h>
#include <unistd.h>

int main(int argc, char* argv[]) {

    if (!isatty(fileno(stdin))) {

    int long long addr; 
    int size; 
    char func;

    while(scanf("%c %llx, %d\n",&func, &addr, &size))
    {
         if(func=='I')
           {
    fprintf(stdout, "%llx ---- %d\n", addr,size);
           }
    }
  }
    return 0;
}

Since it should work as a stream, I have to use a pipe:

$ cat log 2>&1 | ./my_c

2>&1 is used because the principal process that will replace cat log is a tracing program from the valgrind tools, which writes to stderr.

./my_c only reads the first line of the log file. I was hoping to read each line as it comes through the pipe and store the memory address and the size from the line.

I am very new to C programming and have searched a lot for a way to solve this. The current code is what I have come up with so far.

Any help would really be appreciated.

Answer 1

I would recommend reading each line with getline(), then parsing it with either sscanf() or a custom parsing function.

// SPDX-License-Identifier: CC0-1.0
#define  _POSIX_C_SOURCE  200809L
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    char   *linebuf = NULL;
    size_t  linemax = 0;
    ssize_t linelen;

    while (1) {
        char            type[2];
        unsigned long   addr;
        size_t          len;
        char            dummy;

        linelen = getline(&linebuf, &linemax, stdin);
        if (linelen == -1)
            break;

        if (sscanf(linebuf, "%1s %lx, %zu %c", type, &addr, &len, &dummy) == 3) {
            printf("type[0]=='%c', addr==0x%lx, len==%zu\n", type[0], addr, len);
        }
    }

    /* Optional: Discard used line buffer. Note: free(NULL) is safe. */
    free(linebuf);
    linebuf = NULL;
    linemax = 0;

    /* Check if getline() failed due to end-of-file, or due to an error. */
    if (!feof(stdin) || ferror(stdin)) {
        fprintf(stderr, "Error reading from standard input.\n");
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

Above, linebuf is a dynamically allocated buffer, and linemax the amount of memory allocated for it. getline() has no line length limitations, other than available memory.

Because %1s (a one-character token) is used to parse the first identifier letter, any whitespace before it is ignored. (All conversions except %c, %[ and %n silently skip leading whitespace.)

The %lx converts the next token as a hexadecimal number into an unsigned long (which is exactly the same as unsigned long int).

The %zu converts the next token as a nonnegative decimal number into a size_t.

The final %c (converting to a char) is a dummy catcher; it isn't supposed to convert anything, but if it does, it means there was extra stuff on the line. It is preceded by a space, because we intentionally want to skip whitespace after the size conversion (including the newline that getline() leaves at the end of the line).

(A space in a scanf()/sscanf() conversion pattern means to skip any amount of whitespace at that point, including none.)

The result from the scanf family of functions is the number of successful conversions, or EOF if the input ended before the first conversion. So, if the line had the expected format, we get 3. (If there was extra stuff on the line, it will be 4, since the dummy char converted something.)
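The return-value check described above can be wrapped in a small helper. This is only a minimal sketch: the name parse_ref_line() and its 0 / -1 return convention are made up for this illustration and are not part of the programs in this answer.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from the answer's programs): parse one log line
   of the form "T hexaddr, size".  Returns 0 if the line matched exactly,
   -1 otherwise.  The output parameters are only written on success. */
int parse_ref_line(const char *line, char *type, unsigned long *addr, size_t *len)
{
    char           t[2];    /* %1s stores one character plus the '\0' */
    unsigned long  a;
    size_t         n;
    char           dummy;   /* Only converted if something follows the size */

    if (!line || !type || !addr || !len)
        return -1;

    /* Exactly 3 conversions: type, address, and size were parsed, and only
       whitespace followed.  Fewer (or EOF) means the line did not match;
       4 means the final %c caught trailing garbage. */
    if (sscanf(line, "%1s %lx, %zu %c", t, &a, &n, &dummy) != 3)
        return -1;

    *type = t[0];
    *addr = a;
    *len  = n;
    return 0;
}
```

With the sample log, the "=== Start ===" and "=== End ===" lines fail at the %lx conversion, so they are rejected, while " S ff233400ff, 8" is accepted despite its leading space.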

This example program just prints out the values of type[0], addr, and len as parsed, so you can easily replace the printf() call with whatever if (type[0] == ...) or switch (type[0]) { ... } logic you need.

Since the line buffer was allocated dynamically, it is good practice to discard it when it is no longer needed. We do need to initialize the buffer pointer to NULL and its size to 0, so that getline() will allocate the initial buffer, but we don't strictly need to free the buffer before exiting, since the OS automatically releases all memory used by the process. That is why the comment calls discarding the line buffer optional. (Fortunately, free(NULL) is safe and does nothing, so all we need to do is free(linebuf), then set linebuf to NULL and linemax to 0; after that we can even reuse the same variables for another getline(). We can do this at any point, even just before a getline() call, totally safely. In that way, this is a very good example of dynamic memory management: no line length limits!)
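As a sketch of that discard-and-reuse pattern, the two helpers below share one getline() buffer across any number of streams. The names count_lines() and linebuf_discard() are hypothetical, chosen only for this illustration.

```c
#define  _POSIX_C_SOURCE  200809L
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>

/* Hypothetical helper: count the lines in a stream, reusing the caller's
   getline() buffer.  *buf may be NULL with *max == 0; getline() then
   allocates the buffer itself, and grows it as needed. */
size_t count_lines(FILE *in, char **buf, size_t *max)
{
    size_t  n = 0;

    assert(in && buf && max);

    while (getline(buf, max, in) != -1)
        n++;

    return n;
}

/* Hypothetical helper: release the buffer and reset it to the
   "nothing allocated" state, so it can be handed to getline() again. */
void linebuf_discard(char **buf, size_t *max)
{
    assert(buf && max);

    free(*buf);     /* free(NULL) is safe and does nothing */
    *buf = NULL;
    *max = 0;
}
```

A caller would start with char *buf = NULL; size_t max = 0; and pass &buf and &max to both helpers. Calling linebuf_discard() twice in a row is harmless, because the second call only does free(NULL).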

To remember each memory reference for some kind of processing, we really don't have to do much additional work:

// SPDX-License-Identifier: CC0-1.0
#define  _POSIX_C_SOURCE  200809L
#include <stdlib.h>
#include <stdio.h>

struct memref {
    size_t          addr;
    size_t          len;
    int             type;
};

struct memref_array {
    size_t          max;    /* Number of memory references allocated */
    size_t          num;    /* Number of memory references used */
    struct memref  *ref;    /* Dynamically allocated array of memory references */
};
#define MEMREF_ARRAY_INIT  { 0, 0, NULL }

static inline int  memref_array_add(struct memref_array *mra, size_t addr, size_t len, int type)
{
    /* Make sure we have a non-NULL pointer to a memref_array structure. */
    if (!mra)
        return -1;

    /* Make sure we have room for at least one more memref structure. */
    if (mra->num >= mra->max) {
        size_t          new_max;
        struct memref  *new_ref;

        /* Growth policy.  We need new_max to be at least mra->num + 1.
           Reallocation is "slow", so we want to allocate extra entries;
           but we don't want to allocate so much we waste oodles of memory.
           There are many possible allocation strategies, and which one is "best"
           -- really, most suited for a task at hand --, varies!
           This one uses a simple "always allocate 3999 extra entries" policy. */
        new_max = mra->num + 4000;

        new_ref = realloc(mra->ref, new_max * sizeof mra->ref[0]);
        if (!new_ref) {
            /* Reallocation failed.  Old data still exists, we just didn't get
               more memory for the new data.  This function just returns -2 to
               the caller; other options would be to print an error message and
               exit()/abort() the program. */
            return -2;
        }

        mra->max = new_max;
        mra->ref = new_ref;
    }

    /* Fill in the fields, */
    mra->ref[mra->num].addr = addr;
    mra->ref[mra->num].len  = len;
    mra->ref[mra->num].type = type;
    /* and update the number of memory references in the table. */
    mra->num++;

    /* This function returns 0 for success. */
    return 0;
}

int main(void)
{
    struct memref_array  memrefs = MEMREF_ARRAY_INIT;

    char                *linebuf = NULL;
    size_t               linemax = 0;
    ssize_t              linelen;

    while (1) {
        char            type[2];
        unsigned long   addr;
        size_t          len;
        char            dummy;

        linelen = getline(&linebuf, &linemax, stdin);
        if (linelen == -1)
            break;

        if (sscanf(linebuf, "%1s %lx, %zu %c", type, &addr, &len, &dummy) == 3) {
            if (memref_array_add(&memrefs, (size_t)addr, len, type[0])) {
                fprintf(stderr, "Out of memory.\n");
                return EXIT_FAILURE;
            }
        }
    }

    /* Optional: Discard used line buffer. Note: free(NULL) is safe. */
    free(linebuf);
    linebuf = NULL;
    linemax = 0;

    /* Check if getline() failed due to end-of-file, or due to an error. */
    if (!feof(stdin) || ferror(stdin)) {
        fprintf(stderr, "Error reading from standard input.\n");
        return EXIT_FAILURE;
    }

    /* Print the number of entries stored. */
    printf("Read %zu memory references:\n", memrefs.num);
    for (size_t i = 0; i < memrefs.num; i++) {
        printf("    addr=0x%lx, len=%zu, type='%c'\n",
               (unsigned long)memrefs.ref[i].addr,
               memrefs.ref[i].len,
               memrefs.ref[i].type);
    }

    return EXIT_SUCCESS;
}

The new memref structure describes each memory reference we read, and the memref_array structure contains a dynamically allocated array of them. The num member is the number of references in the array, and the max member is the number of references we have allocated memory for.
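Once the references are stored, processing them is an ordinary loop over the array. The sketch below repeats the memref structure and adds a made-up helper, memref_total_len(), that sums the sizes of all references of one type. The helper is an illustration of "some kind of processing", not something the question asked for.

```c
#include <stddef.h>

/* Same layout as the memref structure in the program above. */
struct memref {
    size_t  addr;
    size_t  len;
    int     type;
};

/* Hypothetical helper: total number of bytes covered by the references of
   the given type ('I', 'L', 'S', ...) among the first num entries of ref. */
size_t memref_total_len(const struct memref *ref, size_t num, int type)
{
    size_t  total = 0;

    if (!ref)
        return 0;

    for (size_t i = 0; i < num; i++)
        if (ref[i].type == type)
            total += ref[i].len;

    return total;
}
```

In the program above, the call would be memref_total_len(memrefs.ref, memrefs.num, 'I'), placed after the reading loop.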

The memref_array_add() function takes a pointer to a memref_array, and the three values to fill in. Because C passes function parameters by value (that is, changing a parameter value in a function does not change the variable in the caller), we need to pass a pointer, so that changes made via the pointer are visible in the caller too. This is just how C works.
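A tiny sketch of the difference, using two made-up functions (bump_by_value() and bump_by_pointer() are not part of the program above):

```c
#include <stddef.h>

/* Receives a copy of the caller's value.  The increment only affects the
   copy; the caller sees it only through the return value. */
size_t bump_by_value(size_t n)
{
    n++;
    return n;
}

/* Receives the address of the caller's variable, so the increment is
   applied to the caller's variable itself. */
void bump_by_pointer(size_t *n)
{
    if (n)
        (*n)++;
}
```

After size_t count = 5;, calling bump_by_value(count) leaves count at 5, whereas bump_by_pointer(&count) changes it to 6. This is the reason memref_array_add() takes a struct memref_array * rather than a struct memref_array.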

In that function, we need to take care of the memory management ourselves. Because we use MEMREF_ARRAY_INIT to initialize the memory reference array to known safe values, we can use realloc() to resize the array whenever needed. (Essentially, realloc(NULL, size) works exactly the same as malloc(size).)
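The same pattern works for any element type. Below is a minimal sketch using a plain array of int. It starts from a NULL pointer, so the first realloc() acts as malloc(), and, as an alternative to the fixed "add 4000" policy used above, it doubles the capacity on each reallocation. The names int_array, INT_ARRAY_INIT, and int_array_push() are invented for this example.

```c
#include <stdint.h>
#include <stdlib.h>

struct int_array {
    size_t  max;    /* Number of items allocated */
    size_t  num;    /* Number of items used */
    int    *item;   /* Dynamically allocated array; NULL when max == 0 */
};
#define INT_ARRAY_INIT  { 0, 0, NULL }

/* Append one value.  Returns 0 on success, -1 if arr is NULL,
   -2 if more memory could not be obtained. */
int int_array_push(struct int_array *arr, int value)
{
    if (!arr)
        return -1;

    if (arr->num >= arr->max) {
        size_t  new_max;
        int    *new_item;

        /* Doubling policy: capacities 16, 32, 64, and so on. */
        new_max = arr->max ? 2 * arr->max : 16;

        /* Refuse if doubling wrapped around, or if the byte count
           would not fit in a size_t. */
        if (new_max <= arr->num || new_max > SIZE_MAX / sizeof arr->item[0])
            return -2;

        /* When arr->item is NULL, this is the same as malloc(). */
        new_item = realloc(arr->item, new_max * sizeof arr->item[0]);
        if (!new_item)
            return -2;  /* Old data is still intact in arr->item */

        arr->max  = new_max;
        arr->item = new_item;
    }

    arr->item[arr->num++] = value;
    return 0;
}
```

Doubling means fewer reallocations for very large inputs, at the cost of up to half the allocated entries being unused; which policy suits best depends on the task.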

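In the main program, we call that function in an if clause. if (x) is the same as if (x != 0), i.e. the body is executed if x is nonzero. Because memref_array_add() returns zero on success and nonzero on error, if (memref_array_add(...)) means "if the memref_array_add() call fails, then". (The only failure other than running out of memory is -1 for a NULL pointer, which cannot happen when we pass &memrefs, so reporting "Out of memory." is accurate here.)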
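The same "zero means success" convention can be sketched with another made-up pair of functions. Here parse_hex() and is_hex() are hypothetical and only illustrate how such a return value reads inside an if clause.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper following the same convention as memref_array_add():
   returns 0 on success, nonzero on failure.  Accepts whatever strtoul()
   accepts in base 16, with nothing left over after the number. */
int parse_hex(const char *text, unsigned long *out)
{
    char          *end;
    unsigned long  value;

    if (!text || !out)
        return -1;

    errno = 0;
    value = strtoul(text, &end, 16);
    if (end == text || *end != '\0' || errno == ERANGE)
        return -2;

    *out = value;
    return 0;
}

/* Returns 1 if parse_hex() accepts the text, 0 otherwise. */
int is_hex(const char *text)
{
    unsigned long  ignored;

    if (parse_hex(text, &ignored))  /* Same as: parse_hex(...) != 0 */
        return 0;                   /* The call failed */

    return 1;
}
```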

Note that we don't discard the memory reference array in the program at all. We don't need to, because the OS will free it for us. But, if the program did do further work after no longer needing the memory reference array, then it would make sense to discard it. I bet you guessed that this is just as simple as discarding the line buffer used by getline():

static inline void memref_array_free(struct memref_array *mra)
{
    if (mra) {
        free(mra->ref);
        mra->max = 0;
        mra->num = 0;
        mra->ref = NULL;
    }
}

so that in the main program, memref_array_free(&memrefs); would suffice.

The static in front of the function definitions gives the functions internal linkage, so no externally visible symbols are produced for them; inline is a hint that the compiler may expand them at their call sites. You can omit both if you like; I use them to indicate that these are helper functions usable only in this file (or compilation unit).

The reason we use memrefs.member in the main program, but mra->member in the helper functions, is that mra is a pointer to the structure, whereas memrefs is a variable of the structure type. Again, just a C quirk. (We could write (&memrefs)->member, or (*mra).member, just as well.)
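To see that the two spellings are interchangeable, here is a minimal sketch with a made-up one-member structure (struct counter and both accessor functions are invented for this example):

```c
#include <stddef.h>

struct counter {
    size_t  num;
};

/* c is a pointer to the structure, so the member is reached with -> */
size_t counter_get_arrow(const struct counter *c)
{
    return c->num;
}

/* Equivalent spelling: dereference the pointer first, then use . */
size_t counter_get_dot(const struct counter *c)
{
    return (*c).num;
}
```

Given struct counter memrefs = { 7 };, the expressions memrefs.num, (&memrefs)->num, counter_get_arrow(&memrefs), and counter_get_dot(&memrefs) all evaluate to the same value.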

This is probably much more than you wanted to read (not just OP, but you, dear reader), but I've always felt that dynamic memory management should be shown as early as possible to new C programmers, so that they grasp it with confidence, instead of seeing it as difficult or not worth it.
