Reading large lists through stdin in C

If my program is going to receive large lists of numbers through stdin, what is the most efficient way to read them in?

The input passed to the program will have the following format:

3,5;6,7;8,9;11,4;; 

I need to process the input so that I can use the numbers between the semicolons (i.e. I want to be able to use 3 and 5, 6 and 7, and so on). The ;; indicates the end of the line.

I was thinking of using a buffered reader to read entire lines and then using parseInt.

Would this be the most efficient way of doing it?

This is a working solution.
One way to do this is to use strtok() and store the values in an array, ideally a dynamically allocated one.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    int lst_size=100;
    int line_size=255;

    int lst[lst_size];
    int count=0;

    char buff[line_size];
    char * token=NULL;
    if (fgets (buff, line_size, stdin) == NULL) //Get input
        return 1;

Call strtok with ',' and ';' as the delimiters.

    token=strtok(buff, ";,");
    while(token != NULL && count < lst_size){  // stop at end of input or when lst is full
          lst[count++]=atoi(token);
          token=strtok(NULL, ";,");
    }

Finally, drop the last stored value by reducing the count by 1. strtok() skips the empty field in the double ";;", but the newline that fgets() keeps at the end of the line is returned as one more token; atoi(token) returns 0 for it and stores it in the nth index, which you don't want.

  count--;

}

I'm a little rusty at C, but could this work for you?

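#include <stdio.h>
int main(void)
{
    char remainder[1000], *p;
    int first, second, used;
    FILE *fp = fopen("C:\\file.txt", "r");
    if (fp == NULL) return 1;  // Could not open the file.
    while (fgets(remainder, sizeof remainder, fp) != NULL) { // Get a line.
        // %n stores how many characters each "a,b;" pair consumed.
        for (p = remainder, used = 0; sscanf(p, "%d,%d;%n", &first, &second, &used) == 2 && used > 0; p += used, used = 0) {
            // place first and second into a struct or something
        }
    }
    fclose(fp);
    return 0;
}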

What you could do is read in from stdin using fgets or fgetc. You could also use getline() since you're reading in from stdin.

Once you have read in the line, you can use strtok() with ";" as the delimiter to split the string into pieces at the semicolons. You can loop until strtok() returns NULL. Also, in C you can use atoi() to convert strings to integers.

For example:

size_t length = 256;                            /* getline() takes a size_t *, not an int * */
char* str = (char*)malloc(length);
ssize_t nread = getline(&str, &length, stdin);  /* -1 on EOF or error */

I would read in the command args, then parse them using the strtok() library function:

http://man7.org/linux/man-pages/man3/strtok.3.html

(The web page referenced by the URL above even has a code sample showing how to use it.)

One other fairly elegant way to handle this is to let strtol parse the input, advancing the string to be read to the endptr that strtol returns after each conversion. Combined with an array allocated/reallocated as needed, you should be able to handle lines of any length (up to memory exhaustion). The example below uses a single array for the data. If you want to store multiple lines, each as a separate array, you can use the same approach, but start with a pointer to pointer to int (i.e. int **numbers), allocate the pointers, and then allocate each array. Let me know if you have questions:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

#define NMAX 256

int main () {

    char *ln = NULL;                /* NULL forces getline to allocate  */
    size_t n = 0;                   /* buffer size, updated by getline  */
    ssize_t nchr = 0;               /* number of chars actually read    */
    int *numbers = NULL;            /* array to hold numbers            */
    size_t nmax = NMAX;             /* check for reallocation           */
    size_t idx = 0;                 /* numbers array index              */

    if (!(numbers = calloc (NMAX, sizeof *numbers))) {
        fprintf (stderr, "error: memory allocation failed.");
        return 1;
    }

    /* read each line from stdin - dynamically allocated    */
    while ((nchr = getline (&ln, &n, stdin)) != -1)
    {
        char *p = ln;       /* pointer for use with strtol  */
        char *ep = NULL;

        errno = 0;
        while (errno == 0)
        {
            /* reallocate numbers if idx = nmax (before the next store) */
            if (idx == nmax)
            {
                int *tmp = realloc (numbers, 2 * nmax * sizeof *numbers);
                if (!tmp) {
                    fprintf (stderr, "Error: numbers reallocation failure.\n");
                    exit (EXIT_FAILURE);
                }
                numbers = tmp;
                memset (numbers + nmax, 0, nmax * sizeof *numbers);
                nmax *= 2;
            }

            /* parse/convert each number on stdin   */
            numbers[idx] = strtol (p, &ep, 10);
            /* note: overflow/underflow checks omitted */
            /* if valid conversion to number */
            if (errno == 0 && p != ep)
                idx++;              /* increment index      */

            /* skip delimiters/move pointer to next digit   */
            while (*ep && (*ep < '0' || *ep > '9')) ep++;
            if (*ep)
                p = ep;
            else
                break;
        }
    }

    /* free mem allocated by getline */
    if (ln) free (ln);

    /* show values stored in array   */
    size_t i = 0;
    for (i = 0; i < idx; i++)
        printf (" numbers[%2zu]  %d\n", i, numbers[i]);

    /* free mem allocated to numbers */
    if (numbers) free (numbers);

    return 0;
}

Output

$ echo "3,5;6,7;8,9;11,4;;" | ./bin/prsistdin
 numbers[ 0]  3
 numbers[ 1]  5
 numbers[ 2]  6
 numbers[ 3]  7
 numbers[ 4]  8
 numbers[ 5]  9
 numbers[ 6]  11
 numbers[ 7]  4

This also works when the string is stored in a file:

$ cat dat/numsemic.csv | ./bin/prsistdin
or
$ ./bin/prsistdin < dat/numsemic.csv

Using fgets and without size_t

It took a little reworking to come up with a revision I was happy with that eliminated getline and substituted fgets. getline is far more flexible, handling the allocation of space for you; with fgets it is up to you (not to mention that getline returns the actual number of chars read without having to call strlen).

My goal here was to preserve the ability to read a line of any length to meet your requirement. That meant either initially allocating some huge line buffer (wasteful) or coming up with a scheme that would reallocate the input line buffer as needed in the event the line was longer than the space initially allocated to ln (this is what getline does so well). I'm reasonably happy with the results. Note: I put the reallocation code in functions to keep main reasonably clean. [footnote 2]

Take a look at the following code. Note that I have left the DEBUG preprocessor directives in the code, so you can compile with the -DDEBUG flag if you want it to print a message each time it reallocates. [footnote 1] You can compile the code with:

gcc -Wall -Wextra -o yourexename yourfilename.c

or, if you want the debugging output (e.g. set LMAX to 2 or something less than the line length), use the following:

gcc -Wall -Wextra -o yourexename yourfilename.c -DDEBUG

Let me know if you have questions:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

#define NMAX 256
#define LMAX 1024

char *realloc_char (char *sp, unsigned int *n); /* reallocate char array    */
int *realloc_int (int *sp, unsigned int *n);    /* reallocate int array     */
char *fixshortread (FILE *fp, char **s, unsigned int *n); /* read all stdin */

int main () {

    char *ln = NULL;                    /* dynamically allocated for fgets  */
    int *numbers = NULL;                /* array to hold numbers            */
    unsigned int nmax = NMAX;           /* numbers check for reallocation   */
    unsigned int lmax = LMAX;           /* ln check for reallocation        */
    unsigned int idx = 0;               /* numbers array index              */
    unsigned int i = 0;                 /* simple counter variable          */
    char *nl = NULL;

    /* initial allocation for numbers */
    if (!(numbers = calloc (NMAX, sizeof *numbers))) {
        fprintf (stderr, "error: memory allocation failed (numbers).");
        return 1;
    }

    /* initial allocation for ln */
    if (!(ln = calloc (LMAX, sizeof *ln))) {
        fprintf (stderr, "error: memory allocation failed (ln).");
        return 1;
    }

    /* read each line from stdin - dynamically allocated    */
    while (fgets (ln, lmax, stdin) != NULL)
    {
        /* provide a fallback to read remainder of line
        if the line length exceeds lmax */
        if (!(nl = strchr (ln, '\n')))
            fixshortread (stdin, &ln, &lmax); 
        else
            *nl = 0;

        char *p = ln;       /* pointer for use with strtol  */
        char *ep = NULL;

        errno = 0;
        while (errno == 0)
        {
            /* reallocate numbers if idx = nmax (before the next store);
               keep the pointer returned, realloc may move the block */
            if (idx == nmax)
                numbers = realloc_int (numbers, &nmax);

            /* parse/convert each number on stdin   */
            numbers[idx] = strtol (p, &ep, 10);
            /* note: overflow/underflow checks omitted */
            /* if valid conversion to number */
            if (errno == 0 && p != ep)
                idx++;              /* increment index      */

            /* skip delimiters/move pointer to next digit   */
            while (*ep && (*ep < '0' || *ep > '9')) ep++;
            if (*ep)
                p = ep;
            else
                break;
        }
    }

    /* free mem allocated for ln */
    if (ln) free (ln);

    /* show values stored in array   */
    for (i = 0; i < idx; i++)
        printf (" numbers[%2u]  %d\n", (unsigned int)i, numbers[i]);

    /* free mem allocated to numbers */
    if (numbers) free (numbers);

    return 0;
}

/* reallocate character pointer memory */
char *realloc_char (char *sp, unsigned int *n)
{
    char *tmp = realloc (sp, 2 * *n * sizeof *sp);
#ifdef DEBUG
    printf ("\n  reallocating %u to %u\n", *n, *n * 2);
#endif
    if (!tmp) {
        fprintf (stderr, "Error: char pointer reallocation failure.\n");
        exit (EXIT_FAILURE);
    }
    sp = tmp;
    memset (sp + *n, 0, *n * sizeof *sp); /* memset new ptrs 0 */
    *n *= 2;

    return sp;
}

/* reallocate integer pointer memory */
int *realloc_int (int *sp, unsigned int *n)
{
    int *tmp = realloc (sp, 2 * *n * sizeof *sp);
#ifdef DEBUG
    printf ("\n  reallocating %u to %u\n", *n, *n * 2);
#endif
    if (!tmp) {
        fprintf (stderr, "Error: int pointer reallocation failure.\n");
        exit (EXIT_FAILURE);
    }
    sp = tmp;
    memset (sp + *n, 0, *n * sizeof *sp); /* memset new ptrs 0 */
    *n *= 2;

    return sp;
}

/* if fgets fails to read entire line, fix short read */
char *fixshortread (FILE *fp, char **s, unsigned int *n)
{
    unsigned int i = 0;
    int c = 0;

    i = *n - 1;
    *s = realloc_char (*s, n);  /* store the result: realloc may move the block */
    do
    {
        c = fgetc (fp);
        (*s)[i] = c;
        i++;
        if (i == *n)
            *s = realloc_char (*s, n);
    } while (c != '\n' && c != EOF);
    (*s)[i-1] = 0;

    return *s;
}

footnote 1

There is nothing special about the choice of the word DEBUG (it could have been DOG, etc.). The point to take away is that if you want to conditionally include/exclude code, you can simply use preprocessor flags to do that. You just add -Dflagname to define flagname for the compiler.

footnote 2

You can combine the reallocation functions into a single void* function that accepts a void pointer as its argument along with the size of the type to be reallocated and returns a void pointer to the reallocated space, but we will leave that for a later date.

getchar_unlocked() is what you are looking for.

Here is the code:

#include <stdio.h>

static inline int fastRead_int(int * x)  /* static: plain inline may fail to link in C99 and later */
{
  register int c = getchar_unlocked();
  *x = 0;

  // clean stuff in front of + look for EOF
  for(; ((c<48 || c>57) && c != EOF); c = getchar_unlocked());
  if(c == EOF)
    return 0;

  // build int
  for(; c>47 && c<58 ; c = getchar_unlocked()) {
    *x = (*x<<1) + (*x<<3) + c - 48;
  }
  return 1;
}

int main()
{
  int x;
  while(fastRead_int(&x))
    printf("%d ",x);
  return 0;
}

For input 1;2;2;;3;;4;;;;;54;;;; the code above produces 1 2 2 3 4 54 .

This should be considerably faster than the other solutions presented in this topic, mainly because getchar_unlocked() skips the stream locking that getchar() performs on every call. Note that it is a POSIX function rather than standard C, and it is not thread-safe. The code also uses register, inline, and a shift-based way of multiplying by 10, (*x<<1) + (*x<<3), although an optimizing compiler will usually generate equivalent code for a plain *x * 10.

I wish you good luck in finding a better solution.
