Read from multiple lines of a file: scanf in C using Linux cat

Question

I want to read from a log file on the Linux command line in a stream-like fashion, where the log file looks like this:

=== Start ===
I 322334bbaff, 4
I 322334bba0a, 4
 S ff233400ff, 8
I 000004bbaff, 4
L 322334bba0a, 4
=== End ===

I have a C file that reads each line of the log file and, for each eligible line, stores the memory address and its size (e.g. 322334bba0a and 4).

// my_c.c
#include <stdio.h>
#include <unistd.h>

int main(int argc, char* argv[]) {

    if (!isatty(fileno(stdin))) {

        int long long addr;
        int size;
        char func;

        while (scanf("%c %llx, %d\n", &func, &addr, &size))
        {
            if (func == 'I')
            {
                fprintf(stdout, "%llx ---- %d\n", addr, size);
            }
        }
    }
    return 0;
}

Since it is supposed to work as a stream, I have to use a pipe:

$ cat log 2>&1 | ./my_c

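2>&1 is used because the main process that will replace cat log is a program traced by the valgrind tool, which writes its trace to stderr.

./my_c reads only the first line of the log file. I would like it to read every line coming through the pipe and store the memory address and size from each line.

I am very new to C programming and have searched a lot for a solution to this problem. The current code is what I have come up with so far.

Any help would be appreciated.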

Answer 1

I recommend reading each line with getline(), and then parsing it with sscanf() or a custom parsing function.

// SPDX-License-Identifier: CC0-1.0
#define  _POSIX_C_SOURCE  200809L
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    char   *linebuf = NULL;
    size_t  linemax = 0;
    ssize_t linelen;

    while (1) {
        char            type[2];
        unsigned long   addr;
        size_t          len;
        char            dummy;

        linelen = getline(&linebuf, &linemax, stdin);
        if (linelen == -1)
            break;

        if (sscanf(linebuf, "%1s %lx, %zu %c", type, &addr, &len, &dummy) == 3) {
            printf("type[0]=='%c', addr==0x%lx, len==%zu\n", type[0], addr, len);
        }
    }

    /* Optional: Discard used line buffer. Note: free(NULL) is safe. */
    free(linebuf);
    linebuf = NULL;
    linemax = 0;

    /* Check if getline() failed due to end-of-file, or due to an error. */
    if (!feof(stdin) || ferror(stdin)) {
        fprintf(stderr, "Error reading from standard input.\n");
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

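Above, linebuf is a dynamically allocated buffer, and linemax is the amount of memory allocated for it. getline() has no line length limit other than available memory.

Because %1s (a one-character token) is used to parse the first identifier letter, any whitespace before it is ignored. (All conversions except %c, %[ and %n silently skip leading whitespace.)

%lx converts the next token as a hexadecimal number into an unsigned long (which is exactly the same type as unsigned long int).

%zu converts the next token as a non-negative decimal number into a size_t.

The final %c (converting into a char) is a dummy catcher; it is not supposed to convert anything, but if it does, there was extra stuff on the line. It is preceded by a space because we intentionally want to skip whitespace after the previous conversion.

(A space in a scanf()/sscanf() conversion pattern means "skip any amount of whitespace at this point, including none".)

The result of the scanf family of functions is the number of successful conversions. So, if the line had the expected format, we get 3. (If there was extra stuff on the line, it would be 4, because the dummy char converted something.)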

This example program just prints the parsed type[0], addr, and len values, so you can easily replace the printf() with whatever if (type[0] == ...) or switch (type[0]) { ... } logic you need.

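Because the line buffer is dynamically allocated, it is good practice to discard it. We do need to initialize the buffer pointer to NULL and its size to 0 so that getline() allocates the initial buffer, but we do not strictly need to free the buffer, because the operating system automatically releases all memory used by the process when it exits. That is why I added the comment about discarding the line buffer being optional. (Fortunately, free(NULL) is safe and does nothing, so all we need to do is free(linebuf), then set linebuf to NULL and linemax to 0, and we can even reuse the same variables for another getline() call. In fact, we can do this before any getline() call completely safely. So this is a good example of how to do dynamic memory management: no line length limits!)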

To remember each memory reference for some kind of processing, we really do not need to do much extra work:

// SPDX-License-Identifier: CC0-1.0
#define  _POSIX_C_SOURCE  200809L
#include <stdlib.h>
#include <stdio.h>

struct memref {
    size_t          addr;
    size_t          len;
    int             type;
};

struct memref_array {
    size_t          max;    /* Number of memory references allocated */
    size_t          num;    /* Number of memory references used */
    struct memref  *ref;    /* Dynamically allocated array of memory references */
};
#define MEMREF_ARRAY_INIT  { 0, 0, NULL }

static inline int  memref_array_add(struct memref_array *mra, size_t addr, size_t len, int type)
{
    /* Make sure we have a non-NULL pointer to a memref_array structure. */
    if (!mra)
        return -1;

    /* Make sure we have room for at least one more memref structure. */
    if (mra->num >= mra->max) {
        size_t          new_max;
        struct memref  *new_ref;

        /* Growth policy.  We need new_max to be at least mra->num + 1.
           Reallocation is "slow", so we want to allocate extra entries;
           but we don't want to allocate so much we waste oodles of memory.
           There are many possible allocation strategies, and which one is "best"
           -- really, most suited for a task at hand --, varies!
           This one uses a simple "always allocate 3999 extra entries" policy. */
        new_max = mra->num + 4000;

        new_ref = realloc(mra->ref, new_max * sizeof mra->ref[0]);
        if (!new_ref) {
            /* Reallocation failed.  Old data still exists, we just didn't get
               more memory for the new data.  This function just returns -2 to
               the caller; other options would be to print an error message and
               exit()/abort() the program. */
            return -2;
        }

        mra->max = new_max;
        mra->ref = new_ref;
    }

    /* Fill in the fields, */
    mra->ref[mra->num].addr = addr;
    mra->ref[mra->num].len  = len;
    mra->ref[mra->num].type = type;
    /* and update the number of memory references in the table. */
    mra->num++;

    /* This function returns 0 for success. */
    return 0;
}

int main(void)
{
    struct memref_array  memrefs = MEMREF_ARRAY_INIT;

    char                *linebuf = NULL;
    size_t               linemax = 0;
    ssize_t              linelen;

    while (1) {
        char            type[2];
        unsigned long   addr;
        size_t          len;
        char            dummy;

        linelen = getline(&linebuf, &linemax, stdin);
        if (linelen == -1)
            break;

        if (sscanf(linebuf, "%1s %lx, %zu %c", type, &addr, &len, &dummy) == 3) {
            if (memref_array_add(&memrefs, (size_t)addr, len, type[0])) {
                fprintf(stderr, "Out of memory.\n");
                return EXIT_FAILURE;
            }
        }
    }

    /* Optional: Discard used line buffer. Note: free(NULL) is safe. */
    free(linebuf);
    linebuf = NULL;
    linemax = 0;

    /* Check if getline() failed due to end-of-file, or due to an error. */
    if (!feof(stdin) || ferror(stdin)) {
        fprintf(stderr, "Error reading from standard input.\n");
        return EXIT_FAILURE;
    }

    /* Print the number of entries stored. */
    printf("Read %zu memory references:\n", memrefs.num);
    for (size_t i = 0; i < memrefs.num; i++) {
        printf("    addr=0x%lx, len=%zu, type='%c'\n",
               (unsigned long)memrefs.ref[i].addr,
               memrefs.ref[i].len,
               memrefs.ref[i].type);
    }

    return EXIT_SUCCESS;
}

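The new memref structure describes each memory reference we read, and the memref_array structure contains a dynamically allocated array of them. The num member is the number of references in the array, and the max member is the number of references we have allocated memory for.

The memref_array_add() function takes a pointer to a memref_array, and the three values to fill in. Because C passes function parameters by value (that is, changing a parameter value in a function does not change the variable in the caller), we need to pass a pointer, so that the changes are made through the pointer and are therefore visible in the caller too. That is just how C works.

In that function, we need to handle the memory management ourselves. Because we use MEMREF_ARRAY_INIT to initialize the memory reference array to known safe values, we can use realloc() to resize the array whenever needed. (Essentially, realloc(NULL, size) works exactly the same as malloc(size).)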

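In the main program, we call the function in an if clause. if (x) is the same as if (x != 0), i.e. the body is executed if x is nonzero. Because memref_array_add() returns zero on success and nonzero on error, if (memref_array_add(...)) means "if the memref_array_add call failed, then".

Note that we do not discard the memory reference array in the program at all. We do not need to, because the operating system will release it for us. But if the program did further work after the memory reference array was no longer needed, discarding it would make sense. I bet you guessed that this is just as simple as discarding the line buffer used by getline():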

static inline void memref_array_free(struct memref_array *mra)
{
    if (mra) {
        free(mra->ref);
        mra->max = 0;
        mra->num = 0;
        mra->ref = NULL;
    }
}

That way, memref_array_free(&memrefs); in the main program is all that is needed.

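The static inline in front of the function definitions does two things: static gives the functions internal linkage, so no linkable symbol is exported for them, and inline suggests that the compiler inline them at their call sites. You can omit both if you like; I use them to indicate that these are helper functions usable only within this file (or compilation unit).

The reason we use memrefs.member in the main program but mra->member in the helper functions is that mra is a pointer to a structure, whereas memrefs is a variable of the structure type. Again, just a C quirk. (We could also write (&memrefs)->member or (*mra).member.)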

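This is probably much more than you wanted to read (not just OP, but you too, dear reader), but I have always felt that dynamic memory management should be shown to new C programmers as early as possible, so that they grasp it with confidence instead of seeing it as difficult or not worth the effort.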
