I want to read in a stream-like manner from a log file in Linux command line, where the log file looks like this:
=== Start ===
I 322334bbaff, 4
I 322334bba0a, 4
S ff233400ff, 8
I 000004bbaff, 4
L 322334bba0a, 4
=== End ===
and I have a C file that reads every line of the log file and stores the memory address and its size (e.g., 322334bba0a and 4) from each eligible line.
// my_c.c
#include <stdio.h>
#include <unistd.h>

int main(int argc, char* argv[]) {
    if (!isatty(fileno(stdin))) {
        int long long addr;
        int size;
        char func;
        while(scanf("%c %llx, %d\n",&func, &addr, &size))
        {
            if(func=='I')
            {
                fprintf(stdout, "%llx ---- %d\n", addr,size);
            }
        }
    }
    return 0;
}
Since it should work as a stream, I have to use a pipe:
$ cat log 2>&1 | ./my_c
2>&1 is used because the principal process that will replace cat log is a Valgrind tracing tool that writes its output to stderr.
./my_c only reads the first line of the log file. I was hoping to read each line as it comes through the pipe and store the memory address and the size from that line.
I am very new to c programming and have searched a lot for a way to solve this. The current code is what I came up with so far.
Any help would really be appreciated.
I would recommend reading each line with getline(), then parsing it with either sscanf() or a custom parsing function.
// SPDX-License-Identifier: CC0-1.0
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    char *linebuf = NULL;
    size_t linemax = 0;
    ssize_t linelen;

    while (1) {
        char type[2];
        unsigned long addr;
        size_t len;
        char dummy;

        linelen = getline(&linebuf, &linemax, stdin);
        if (linelen == -1)
            break;

        if (sscanf(linebuf, "%1s %lx, %zu %c", type, &addr, &len, &dummy) == 3) {
            printf("type[0]=='%c', addr==0x%lx, len==%zu\n", type[0], addr, len);
        }
    }

    /* Optional: Discard used line buffer. Note: free(NULL) is safe. */
    free(linebuf);
    linebuf = NULL;
    linemax = 0;

    /* Check if getline() failed due to end-of-file, or due to an error. */
    if (!feof(stdin) || ferror(stdin)) {
        fprintf(stderr, "Error reading from standard input.\n");
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}
Above, linebuf is a dynamically allocated buffer, and linemax is the amount of memory allocated for it. getline() has no line length limitations, other than available memory.
Because %1s (a one-character token) is used to parse the first identifier letter, any whitespace before it is ignored. (All conversions except %c, %[, and %n silently skip leading whitespace.)
The %lx converts the next token as a hexadecimal number into an unsigned long (which is exactly the same as unsigned long int).
The %zu converts the next token as a nonnegative decimal number into a size_t.
The final %c (converting to a char) is a dummy catcher; it isn't supposed to convert anything, but if it does, it means there was extra stuff on the line. It is preceded by a space, because we intentionally want to skip whitespace (including the trailing newline) after the previous conversion.
(A space in a scanf()/sscanf() conversion pattern means to skip any amount of whitespace at that point, including none.)
The result from the scanf family of functions is the number of successful conversions. So, if the line had the expected format, we get 3. (If there was extra stuff on the line, it will be 4, since the dummy char converted something.)
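To make those return values concrete, here is a minimal sketch. The helper name count_conversions() is hypothetical (it is not part of the program above); it just applies the same sscanf() pattern to a single line and hands back the conversion count.

```c
#include <stdio.h>

/* Hypothetical helper, for illustration only: applies the same pattern
   as the program above to one line, and returns what sscanf() returns,
   that is, the number of successful conversions (or EOF if the input
   ran out before the first conversion). */
static int count_conversions(const char *line)
{
    char          type[2];
    unsigned long addr;
    size_t        len;
    char          dummy;

    return sscanf(line, "%1s %lx, %zu %c", type, &addr, &len, &dummy);
}
```

With the sample log, count_conversions("I 322334bbaff, 4\n") yields 3, a line with trailing junk such as "I 322334bbaff, 4 junk\n" yields 4, and the "=== Start ===\n" marker yields 1 (only the %1s converts), so comparing against exactly 3 filters out everything except well-formed reference lines.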
This example program just prints out the values of type[0], addr, and len as parsed, so you can easily replace the printf() with whatever if (type[0] == ...) or switch (type[0]) { ... } logic you need.
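As one possible shape for that logic, the sketch below tallies lines per type letter with a switch. The struct type_counts and the function tally_line() are hypothetical names made up for this illustration, and the letters handled are simply the ones that appear in the sample log.

```c
#include <stdio.h>

/* Hypothetical tally structure: one counter per type letter seen in
   the sample log, plus one for any other letter. */
struct type_counts {
    size_t i_lines;   /* lines starting with 'I' */
    size_t s_lines;   /* lines starting with 'S' */
    size_t l_lines;   /* lines starting with 'L' */
    size_t other;     /* well-formed lines with some other letter */
};

/* Parses one line with the same pattern as above and updates the tally.
   Returns 0 if the line was a well-formed reference, -1 otherwise. */
static int tally_line(struct type_counts *tc, const char *line)
{
    char          type[2];
    unsigned long addr;
    size_t        len;
    char          dummy;

    if (sscanf(line, "%1s %lx, %zu %c", type, &addr, &len, &dummy) != 3)
        return -1;   /* Not a reference line (for example a marker). */

    switch (type[0]) {
    case 'I': tc->i_lines++; break;
    case 'S': tc->s_lines++; break;
    case 'L': tc->l_lines++; break;
    default:  tc->other++;   break;
    }
    return 0;
}
```

In the main loop, the printf() call would be replaced by a call like tally_line(&counts, linebuf), with counts initialized to all zeros before the loop.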
Since the line buffer was allocated dynamically, it is good practice to discard it. We do need to initialize the buffer pointer to NULL and its size to 0, so that getline() will allocate the initial buffer, but we don't strictly need to free the buffer, since the OS automatically releases all memory used by the process when it exits. That is why the comment says discarding the line buffer is optional. (Fortunately, free(NULL) is safe and does nothing, so all we need to do is free(linebuf), then set linebuf to NULL and linemax to 0; after that we can even reuse the same variables for another getline() call, which will simply allocate a fresh buffer. This is safe to do at any point, even just before a getline(). In that sense, this is a very good example of dynamic memory management: no line length limits!)
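A minimal sketch of that discard-and-reuse pattern follows. The function name read_with_reset() is hypothetical and exists only for this illustration; it reads up to two lines from a stream, throwing the buffer away between the two getline() calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <stdio.h>

/* Hypothetical demonstration: reads up to two lines from 'in',
   discarding the line buffer between the reads.
   Returns the number of lines successfully read (0, 1, or 2). */
static int read_with_reset(FILE *in)
{
    char   *linebuf = NULL;
    size_t  linemax = 0;
    int     lines = 0;

    if (getline(&linebuf, &linemax, in) != -1)
        lines++;

    /* Discard the buffer and return to the initial state.
       This is safe even if getline() never allocated anything. */
    free(linebuf);
    linebuf = NULL;
    linemax = 0;

    /* Because linebuf is NULL and linemax is 0 again, this call
       allocates a fresh buffer, exactly like the first one did. */
    if (getline(&linebuf, &linemax, in) != -1)
        lines++;

    free(linebuf);
    return lines;
}
```

Resetting in the middle of a loop like this is rarely useful in practice, since getline() happily reuses and grows the existing buffer; the point is only that the reset leaves the variables in a state getline() accepts.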
To remember each memory reference for some kind of processing, we really don't have to do much additional work:
// SPDX-License-Identifier: CC0-1.0
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <stdio.h>

struct memref {
    size_t addr;
    size_t len;
    int type;
};

struct memref_array {
    size_t max;          /* Number of memory references allocated */
    size_t num;          /* Number of memory references used */
    struct memref *ref;  /* Dynamically allocated array of memory references */
};
#define MEMREF_ARRAY_INIT { 0, 0, NULL }

static inline int memref_array_add(struct memref_array *mra, size_t addr, size_t len, int type)
{
    /* Make sure we have a non-NULL pointer to a memref_array structure. */
    if (!mra)
        return -1;

    /* Make sure we have room for at least one more memref structure. */
    if (mra->num >= mra->max) {
        size_t new_max;
        struct memref *new_ref;

        /* Growth policy. We need new_max to be at least mra->num + 1.
           Reallocation is "slow", so we want to allocate extra entries;
           but we don't want to allocate so much we waste oodles of memory.
           There are many possible allocation strategies, and which one is "best"
           -- really, most suited for a task at hand --, varies!
           This one uses a simple "always allocate 3999 extra entries" policy. */
        new_max = mra->num + 4000;

        new_ref = realloc(mra->ref, new_max * sizeof mra->ref[0]);
        if (!new_ref) {
            /* Reallocation failed. Old data still exists, we just didn't get
               more memory for the new data. This function just returns -2 to
               the caller; other options would be to print an error message and
               exit()/abort() the program. */
            return -2;
        }

        mra->max = new_max;
        mra->ref = new_ref;
    }

    /* Fill in the fields, */
    mra->ref[mra->num].addr = addr;
    mra->ref[mra->num].len = len;
    mra->ref[mra->num].type = type;
    /* and update the number of memory references in the table. */
    mra->num++;

    /* This function returns 0 for success. */
    return 0;
}

int main(void)
{
    struct memref_array memrefs = MEMREF_ARRAY_INIT;
    char *linebuf = NULL;
    size_t linemax = 0;
    ssize_t linelen;

    while (1) {
        char type[2];
        unsigned long addr;
        size_t len;
        char dummy;

        linelen = getline(&linebuf, &linemax, stdin);
        if (linelen == -1)
            break;

        if (sscanf(linebuf, "%1s %lx, %zu %c", type, &addr, &len, &dummy) == 3) {
            if (memref_array_add(&memrefs, (size_t)addr, len, type[0])) {
                fprintf(stderr, "Out of memory.\n");
                return EXIT_FAILURE;
            }
        }
    }

    /* Optional: Discard used line buffer. Note: free(NULL) is safe. */
    free(linebuf);
    linebuf = NULL;
    linemax = 0;

    /* Check if getline() failed due to end-of-file, or due to an error. */
    if (!feof(stdin) || ferror(stdin)) {
        fprintf(stderr, "Error reading from standard input.\n");
        return EXIT_FAILURE;
    }

    /* Print the number of entries stored. */
    printf("Read %zu memory references:\n", memrefs.num);
    for (size_t i = 0; i < memrefs.num; i++) {
        printf(" addr=0x%lx, len=%zu, type='%c'\n",
               (unsigned long)memrefs.ref[i].addr,
               memrefs.ref[i].len,
               memrefs.ref[i].type);
    }

    return EXIT_SUCCESS;
}
The new memref structure describes each memory reference we read, and the memref_array structure contains a dynamically allocated array of them. The num member is the number of references in the array, and the max member is the number of references we have allocated memory for.
The memref_array_add() function takes a pointer to a memref_array, and the three values to fill in. Because C passes function parameters by value (that is, changing a parameter value inside a function does not change the variable in the caller), we need to pass a pointer, so that changes made via the pointer are visible in the caller too. This is just how C works.
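The difference is easy to see in a tiny sketch. The struct counter and the two functions below are hypothetical, invented only to contrast passing a structure by value with passing a pointer to it.

```c
#include <stddef.h>

/* Hypothetical structure, standing in for memref_array. */
struct counter {
    size_t num;
};

/* Receives a COPY of the caller's structure; the increment is applied
   to the copy and is lost when the function returns. */
static void bump_by_value(struct counter c)
{
    c.num++;
}

/* Receives a pointer to the caller's structure; the increment is
   applied to the caller's own variable. */
static void bump_by_pointer(struct counter *c)
{
    c->num++;
}
```

Starting from struct counter c = { 0 };, calling bump_by_value(c) leaves c.num at 0, whereas bump_by_pointer(&c) changes it to 1. That is the reason memref_array_add() takes a struct memref_array * rather than a struct memref_array.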
In that function, we need to take care of the memory management ourselves. Because we use MEMREF_ARRAY_INIT to initialize the memory reference array to known safe values, we can use realloc() to resize the array pointer whenever needed. (Essentially, realloc(NULL, size) works exactly the same as malloc(size).)
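Here is a stripped-down sketch of the same idea with a plain int array. The function append_int() is hypothetical, and for brevity it grows the array by one element per call instead of in chunks of 4000 like memref_array_add() does; the point is only that no separate malloc() call is needed when the pointer starts out as NULL.

```c
#include <stdlib.h>

/* Hypothetical helper: appends 'value' to a dynamically allocated
   int array. '*arr' may be NULL (with '*num' == 0) on the first call,
   in which case realloc() behaves like malloc().
   Returns 0 on success, -1 if memory allocation fails. */
static int append_int(int **arr, size_t *num, int value)
{
    int *tmp = realloc(*arr, (*num + 1) * sizeof **arr);
    if (!tmp)
        return -1;   /* The old array, if any, is still valid. */

    tmp[*num] = value;
    *arr = tmp;
    (*num)++;
    return 0;
}
```

A caller would start with int *a = NULL; size_t n = 0; and then call append_int(&a, &n, value) repeatedly, finishing with free(a). Note that the result of realloc() goes into a temporary first, so a failed call does not overwrite (and leak) the existing array.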
In the main program, we call that function in an if clause. if (x) is the same as if (x != 0), i.e. the body is executed if x is nonzero. Because memref_array_add() returns zero on success and nonzero on error, if (memref_array_add(...)) means "if the memref_array_add call fails, then".
Note that we don't discard the memory reference array in the program at all. We don't need to, because the OS will free it for us. But, if the program did do further work after no longer needing the memory reference array, then it would make sense to discard it. I bet you guessed that this is just as simple as discarding the line buffer used by getline():
static inline void memref_array_free(struct memref_array *mra)
{
    if (mra) {
        free(mra->ref);
        mra->max = 0;
        mra->num = 0;
        mra->ref = NULL;
    }
}
so that in the main program, memref_array_free(&memrefs); would suffice.
The static inline in front of the function definitions tells the compiler that these functions are local to this file (static gives them internal linkage, so no externally linkable symbols are produced), and hints that it may inline them at their call sites. You can omit them if you like; I use them to indicate that these are helper functions usable only in this file (or compilation unit).
The reason we use memrefs.member in the main program, but mra->member in the helper functions, is that mra is a pointer to the structure, whereas memrefs is a variable of the structure type. Again, just a C quirk. (We could write (&memrefs)->member, or (*mra).member, just as well.)
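If the equivalence looks suspicious, this small sketch may help. The struct pair and both functions are hypothetical names used only here; the two functions do exactly the same thing using the two spellings.

```c
/* Hypothetical structure, for illustration only. */
struct pair {
    int first;
    int second;
};

/* Member access through a pointer, using the arrow operator. */
static int sum_arrow(const struct pair *p)
{
    return p->first + p->second;
}

/* The same access, spelled as "dereference, then dot". */
static int sum_deref(const struct pair *p)
{
    return (*p).first + (*p).second;
}
```

Given struct pair v = { 3, 4 };, both sum_arrow(&v) and sum_deref(&v) return 7, and (&v)->first is the same object as v.first. The arrow is simply the more convenient notation when all you have is a pointer.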
This is probably much more than you wanted to read (not just OP, but you, dear reader), but I've always felt that dynamic memory management should be shown as early as possible to new C programmers, so that they grasp it with confidence, instead of seeing it as difficult or not worth the effort.