Reading a stream of values from text file in C

I have a text file which may contain anywhere from one to 400 numbers. Each number is separated by a comma, and a semicolon is used to indicate the end of the number stream. At the moment I am reading the text file line by line using fgets. For this reason I am using a fixed array of 1024 elements (which I took to be the maximum number of characters per line in a text file). This is not an ideal way to implement this, since if only one number is entered in the text file, an array of 1024 elements would be pointless. Is there a way to use fgets with malloc (or any other method) to improve memory efficiency?

If you are looking to use this in production code, then I suggest following the advice in the comments section.

But if your requirement is more for learning or school, then here is a more involved approach.

Pseudo code

1. Find the size of the file in bytes, you can use "stat" for this.
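2. Since the file format is known, calculate the number of items from the file size. Each item takes at least one digit plus one delimiter (',' or ';'), so there are at most size / 2 items.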
3. Use the number of items to malloc.

Voila! :p

How to find file size

You can use stat as shown below:

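#include <sys/stat.h>
#include <stdio.h>

int main(void)
{
    struct stat st;

    /* stat() fills st with metadata about "file", including its size in bytes */
    if (stat("file", &st) == 0) {
        /* st_size is an off_t, so cast it to long long for a portable format */
        printf("fileSize: %lld  No. of Items: %lld\n",
               (long long)st.st_size, (long long)(st.st_size / 2));
        return 0;
    }

    perror("stat");     /* report why stat() failed */
    return 1;
}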

When run, this program prints the file size and the estimated number of items:

$> cat file
1;
$> ./a.out
fileSize: 3  No. of Items: 1

$> cat file
1,2,3;
$> ./a.out
fileSize: 7  No. of Items: 3

Disclaimer: Is this approach to minimizing the pre-allocated memory an optimal one? No way in heaven! :)

Dynamically allocating space for your data is a fundamental tool for working in C. You might as well pay the price to learn it. The primary thing to remember is:

"If you allocate memory, you have the responsibility to track its use and preserve a pointer to the starting address of the block of memory so you can free it when you are done with it. Otherwise your code will leak memory like a sieve."

Dynamic allocation is straightforward. You allocate some initial block of memory and keep track of what you add to it. You must test that each allocation succeeds. You must test how much of the block of memory you have used, and reallocate or stop writing data when it is full, to prevent writing beyond the end of your block of memory. If you fail to test either, you will corrupt the memory associated with your code.

When you reallocate, always reallocate using a temporary pointer. If realloc fails, it returns NULL and leaves the original block of memory untouched; assigning that NULL directly to your only pointer to the block would lose its address, leaking the block and losing access to all the data in it. Using a temporary pointer lets you handle the failure in a way that preserves that block if needed.

Taking that into consideration, the code below initially allocates space for 64 long values (you can easily change the code to handle any type, e.g. int, float, double, ...). The code then reads each line of data, using getline to dynamically allocate the buffer for each line. strtol is used to parse the buffer, assigning the values to array. idx is used as an index to keep track of how many values have been read, and when idx reaches the current nmax, array is reallocated to twice its previous size and nmax is updated to reflect the change. The reading, parsing, checking and reallocating continue for every line of data in the file. When done, the values are printed to stdout, showing the 400 random values read from the test file formatted as 353,394,257,...293,58,135;

To keep the read loop logic clean, I've put the error checking for the strtol conversion into a function, xstrtol, but you are free to include that code in main() if you like. The same applies to the realloc_long function. To see when the reallocation takes place, you can compile the code with the -DDEBUG definition. For example:

gcc -Wall -Wextra -DDEBUG -o progname yoursourcefile.c

The program expects your data filename as the first argument, and you can provide an optional conversion base as the second argument (default is 10). For example:

./progname datafile.txt [base (default: 10)]

Look over it, test it, and let me know if you have any questions.

#define _POSIX_C_SOURCE 200809L     /* expose getline() in strict ISO modes */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <errno.h>
#include <ctype.h>

#define NMAX 64

long xstrtol (char *p, char **ep, int base);
long *realloc_long (long *lp, size_t *n);

int main (int argc, char **argv)
{
    char *ln = NULL;                /* NULL forces getline to allocate  */
    size_t n = 0;                   /* size of buffer getline allocated */
    size_t idx = 0;                 /* array index counter              */
    long *array = NULL;             /* pointer to long                  */
    size_t nmax = NMAX;             /* current allocated capacity       */
    FILE *fp = NULL;                /* input file pointer               */
    int base = argc > 2 ? atoi (argv[2]) : 10; /* base (default: 10)    */

    /* validate arguments */
    if (argc < 2) {
        fprintf (stderr, "usage: %s datafile.txt [base (default: 10)]\n", argv[0]);
        return 1;
    }

    /* open / validate file */
    if (!(fp = fopen (argv[1], "r"))) {
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    /* allocate array of NMAX long using calloc to initialize to 0 */
    if (!(array = calloc (nmax, sizeof *array))) {
        fprintf (stderr, "error: memory allocation failed.\n");
        fclose (fp);
        return 1;
    }

    /* read each line from file - separate into array */
    while (getline (&ln, &n, fp) != -1)
    {
        char *p = ln;      /* pointer into the line read by getline */
        char *ep = NULL;   /* endpointer for strtol                 */

        for (;;)
        {   /* skip delimiters (',' ';' and whitespace) before each number */
            while (*p == ',' || *p == ';' || isspace ((unsigned char)*p))
                p++;
            if (!*p)                /* end of line: read the next one */
                break;

            /* parse/convert the next number in line into array */
            array[idx++] = xstrtol (p, &ep, base);

            if (idx == nmax)        /* array full: double its size  */
                array = realloc_long (array, &nmax);

            p = ep;                 /* continue after parsed number */
        }
    }

    free (ln);                      /* free memory allocated by getline */
    fclose (fp);                    /* close open file stream           */

    for (size_t i = 0; i < idx; i++)
        printf (" array[%zu] : %ld\n", i, array[i]);

    free (array);

    return 0;
}

/* reallocate long pointer memory to twice its current size */
long *realloc_long (long *lp, size_t *n)
{
    /* use a temporary pointer so lp is not lost if realloc fails */
    long *tmp = realloc (lp, 2 * *n * sizeof *lp);
#ifdef DEBUG
    printf ("\n  reallocating %zu to %zu\n", *n, *n * 2);
#endif
    if (!tmp) {
        fprintf (stderr, "%s() error: reallocation failed.\n", __func__);
        free (lp);                  /* lp is still valid after a failed realloc */
        exit (EXIT_FAILURE);
    }
    lp = tmp;
    memset (lp + *n, 0, *n * sizeof *lp); /* zero the newly added half */
    *n *= 2;

    return lp;
}

/* strtol wrapper: exit on range/conversion errors or when no digits found */
long xstrtol (char *p, char **ep, int base)
{
    errno = 0;

    long tmp = strtol (p, ep, base);

    /* Check for various possible errors */
    if ((errno == ERANGE && (tmp == LONG_MIN || tmp == LONG_MAX)) ||
        (errno != 0 && tmp == 0)) {
        perror ("strtol");
        exit (EXIT_FAILURE);
    }

    if (*ep == p) {
        fprintf (stderr, "No digits were found\n");
        exit (EXIT_FAILURE);
    }

    return tmp;
}

Sample Output (with -DDEBUG to show reallocation)

$ ./bin/read_long_csv dat/randlong.txt

  reallocating 64 to 128

  reallocating 128 to 256

  reallocating 256 to 512
 array[0] : 353
 array[1] : 394
 array[2] : 257
 array[3] : 173
 array[4] : 389
 array[5] : 332
 array[6] : 338
 array[7] : 293
 array[8] : 58
 array[9] : 135
<snip>
 array[395] : 146
 array[396] : 324
 array[397] : 424
 array[398] : 365
 array[399] : 205

Memory Error Check

$ valgrind ./bin/read_long_csv dat/randlong.txt
==26142== Memcheck, a memory error detector
==26142== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==26142== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==26142== Command: ./bin/read_long_csv dat/randlong.txt
==26142==

  reallocating 64 to 128

  reallocating 128 to 256

  reallocating 256 to 512
 array[0] : 353
 array[1] : 394
 array[2] : 257
 array[3] : 173
 array[4] : 389
 array[5] : 332
 array[6] : 338
 array[7] : 293
 array[8] : 58
 array[9] : 135
<snip>
 array[395] : 146
 array[396] : 324
 array[397] : 424
 array[398] : 365
 array[399] : 205
==26142==
==26142== HEAP SUMMARY:
==26142==     in use at exit: 0 bytes in 0 blocks
==26142==   total heap usage: 7 allocs, 7 frees, 9,886 bytes allocated
==26142==
==26142== All heap blocks were freed -- no leaks are possible
==26142==
==26142== For counts of detected and suppressed errors, rerun with: -v
==26142== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
