Read text file, break each line into separate arrays and sort in C

I am trying to write a program that reads a text file and breaks each line into separate arrays so they can be sorted by date and name. I am still having trouble getting the 'sort by date' function to work and display properly, which is why I haven't attempted the sort-by-name function yet.

I seem to be able to scan in the date and name arrays fine, but I think I need to modify the way I scan in the last array, 'states', because the states need to be separated by a comma followed by a space. The problem is that I am not sure how to scan them in as a single string, since they would have spaces between them and some hurricanes have a different number of states. (I have removed the spaces between the states in the text file for now, but they probably need to be put back in?)

My code so far:

#include <stdio.h>
#include <string.h>

#define MAX 30

void sortByDate( int year[], char *name[], char *states[], int count);
void sortByName(int year[], char name[], char states[], int count);

int main()
{
     int year[MAX]; 
     int i, a;
     int count = 0;
     int choice;
     char *name[MAX],
          *states[MAX];
     char b[MAX], c[MAX];

     FILE *inp = fopen("hurricanes.txt","r");               /* defining file input    */

     for(i=0;i<MAX;i++)
     {
         if( feof(inp) )
        {
            break;
        } 
        fscanf(inp, "%d", &a);
        fscanf(inp, "%s", &b);
        fscanf(inp, "%s", &c);
        year[i]=a;
        strcpy(&name[i],b);
        strcpy(&states[i],c);
        ++count; 

        printf("%d %s %s\n", year[i], &name[i], &states[i]);
     }

     printf("Press 0 to sort by date or 1 to sort by name: ");
     scanf("%d", &choice);  
     if (choice == 0)
     {
         sortByDate(year, name, states, count); 
     }
     else if ( choice == 1)
     {
          //sortByName(year, name, states, count); 
     }

     getch();
     return 0;
}

void sortByDate( int year[], char *name[], char *states[], int count )
{
     int d = 0;
     int c = 0;

     int yearTmp;
     char nameTmp[MAX], statesTmp[MAX];
     int order[count];
     int tmp = 0;

     FILE *outp = fopen("report.txt","w");                 /* defining file output   */

     for (c = 0; c < count; ++c)
     {
         order[c] = c; 
     } 

     for (c = 0 ; c < ( count - 1 ); c++)
     {
          for (d = 0 ; d < count - c - 1; d++)
          {
               if (year[d] > year[d+1])
               {
                    yearTmp = year[d];
                    year[d] = year[d+1]; 
                    year[d+1] = yearTmp; 

                    tmp = order[d];
                    order[d] = order[d+1];
                    order[d+1] = tmp;   
              }
          }
     }

     for (c = 0; c < count; ++c)
     {
          printf("%d %-10s %s\n",  year[c], &name[order[c]], &states[order[c]]); 
     } 
}

//void sortByName(int year[], char name[], char states[], int count)
//{
//} 

The hurricanes.txt file (again, I have removed the spaces between states, but I think they need to be put back in and scanned differently?):

1960 Donna FL,NC
1969 Camille MS
1972 Agnes FL
1983 Alicia TX
1989 Hugo SC,NC
2005 Katrina FL,LA,MS
2005 Rita TX,LA
2005 Wilma FL
2008 Ike TX
2009 Ida MS
2011 Irene NC,NJ,MA,VT
2012 Isaac LA
1992 Andrew FL,LA
1995 Opal FL,AL
1999 Floyd NC
2003 Isabel NC,VA
2004 Charley FL,SC,NC
2004 Frances FL
2004 Ivan AL
2004 Jeanne FL

OK, so I made some changes based on the suggestions posted here and they worked out great!

Rather than storing the values in separate arrays, there may be a better approach. Any time you are faced with sorting data that consists of multiple, related values, you should be thinking struct. That is the mechanism in C that keeps related values together, so that a sort moves all of them as a unit.

For example, in your case you have (1) the year, (2) the hurricane name, and (3) the hurricane path through states, all of which describe a single event. When you have data that consists of multiple events that you want to sort by year, name or path, you need a way to preserve the association between each name, the year it occurred and the path it took. A simple structure such as the following will do:

typedef struct {
    unsigned year;
    char name[MAXC];
    char path[MAXC];
} hcdata;

In your program, you can then declare an array of type hcdata and fill the array with data read from your file. While you will generally want to read a line at a time with the line-oriented input functions (fgets or getline), when every line has exactly the same format, the scanf family of functions can provide a realistic alternative. (This is one of the few cases where scanf is a realistic alternative to fgets and friends.)
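For comparison, here is a minimal sketch of the line-oriented route mentioned above: read each line with fgets, then parse it with sscanf. The function name read_records and the LINESZ buffer size are assumptions made for illustration, not part of the original answer. One practical difference is that a malformed line can simply be skipped, because fgets has already consumed it.

```c
#include <assert.h>
#include <stdio.h>

enum { MAXC = 32, LINESZ = 256 };   /* LINESZ: assumed upper bound on line length */

typedef struct {
    unsigned year;
    char name[MAXC];
    char path[MAXC];
} hcdata;

/* Read up to max records from fp, one line at a time (hypothetical helper).
 * Lines that do not yield all three fields are skipped.
 * Returns the number of records stored in out. */
size_t read_records (FILE *fp, hcdata *out, size_t max)
{
    char line[LINESZ];
    size_t n = 0;

    assert (fp && out);
    while (n < max && fgets (line, sizeof line, fp)) {
        hcdata tmp;
        /* parse into a temporary so a bad line never half-fills out[n] */
        if (sscanf (line, "%u %31[^ ] %31[^\n]",
                    &tmp.year, tmp.name, tmp.path) == 3)
            out[n++] = tmp;
    }
    return n;
}
```

A caller would declare something like `hcdata hcd[128];` and use `size_t n = read_records (fp, hcd, 128);`, then sort the first n elements.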

To make your read with fscanf work properly, you should account for each character in the line being read (including the '\n'). While reading integer values will skip leading whitespace, if you get in the habit of accounting for each character, you won't be surprised when the next read starts with a character or scanset conversion, which does not skip a leftover newline. In this case you can use a format string of:

    char *fmt = "%u %31[^ ] %31[^\n]%*c";
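Because the last conversion in that format reads everything up to the newline, the path field may itself contain spaces, which is what the question was asking about. The sketch below applies the same conversions to a single string with sscanf; the helper name parse_record is hypothetical, and the trailing %*c is dropped only because there is no following line to protect.

```c
#include <assert.h>
#include <stdio.h>

enum { MAXC = 32 };

typedef struct {
    unsigned year;
    char name[MAXC];
    char path[MAXC];
} hcdata;

/* Parse one "year name path" line (hypothetical helper).
 * Returns 1 if all three fields were converted, 0 otherwise. */
int parse_record (const char *line, hcdata *h)
{
    assert (line && h);
    /* %u       : the year; leading whitespace is skipped
     * %31[^ ]  : the name, at most 31 characters that are not a space
     * %31[^\n] : the rest of the line, spaces included, at most 31 characters */
    return sscanf (line, "%u %31[^ ] %31[^\n]",
                   &h->year, h->name, h->path) == 3;
}
```

With this, a line written as `2005 Katrina FL, LA, MS` is stored with the path `FL, LA, MS`, so the spaces after the commas can stay in the data file.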

Lastly, qsort is the standard way to sort data in the C library. Implementations typically combine more than one sorting method, tuned for large and small datasets, and the result is very fast. All you need to do is write compare functions to pass to qsort. With the struct above, the compare functions to sort on either name or year are almost trivial. For example,

int cmpname (const void *a, const void *b)
{   return strcmp (((hcdata *)a)->name, ((hcdata *)b)->name); }

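int cmpyear (const void *a, const void *b)
{   /* compare, don't subtract: unsigned subtraction wraps instead of going negative */
    unsigned ya = ((const hcdata *)a)->year, yb = ((const hcdata *)b)->year;
    return (ya > yb) - (ya < yb);
}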

You can use as simple a compare function as necessary, or you can tailor it to sort on a secondary field when the primary values are equal. For example, to sort alphabetically by name if the year is the same:

int cmpyear (const void *a, const void *b)
{   
    unsigned ya = ((hcdata *)a)->year, yb = ((hcdata *)b)->year;

    if (ya > yb)    /* if years differ, sort by year */
        return 1;
    else if (ya < yb)
        return -1;

    /* otherwise sort alphabetically with same year */
    return strcmp (((hcdata *)a)->name, ((hcdata *)b)->name);
}

Sorting then becomes as simple as a single call, e.g.:

    qsort (hcd, idx, sizeof *hcd, cmpname);

Putting all of the pieces together, you can read your data, sort by name and then sort again by year in a straightforward manner:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { MAXC = 32, MAXL = 128 };

typedef struct {
    unsigned year;
    char name[MAXC];
    char path[MAXC];
} hcdata;

int cmpname (const void *a, const void *b);
int cmpyear (const void *a, const void *b);
void prndata (hcdata *h, size_t n);

int main (int argc, char **argv) {

    hcdata hcd[MAXL] = {{ 0, {""}, {""} }};
    size_t idx = 0;
    char *fmt = "%u %31[^ ] %31[^\n]%*c";
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    /* read each line of data into an array of struct */
    while (idx < MAXL && fscanf (fp, fmt,
        &hcd[idx].year, hcd[idx].name, hcd[idx].path) == 3)
        idx++;

    printf ("\noriginal file order:\n\n");
    prndata (hcd, idx);

    qsort (hcd, idx, sizeof *hcd, cmpname);    /* sort by name */
    printf ("\nsorted by hurricane name:\n\n");
    prndata (hcd, idx);

    qsort (hcd, idx, sizeof *hcd, cmpyear);    /* sort by year */
    printf ("\nsorted by year:\n\n");
    prndata (hcd, idx);

    if (fp != stdin) fclose (fp);

    return 0;
}

int cmpname (const void *a, const void *b)
{   return strcmp (((hcdata *)a)->name, ((hcdata *)b)->name); }

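int cmpyear (const void *a, const void *b)
{   /* compare, don't subtract: unsigned subtraction wraps instead of going negative */
    unsigned ya = ((const hcdata *)a)->year, yb = ((const hcdata *)b)->year;
    return (ya > yb) - (ya < yb);
}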

void prndata (hcdata *h, size_t n)
{
    if (!h || !n) return;
    size_t i;
    for (i = 0; i < n; i++)
        printf (" y: %u    n: %-10s    p: %s\n", h[i].year,
                h[i].name, h[i].path);
}

Example Use/Output

$ ./bin/hurricanes <dat/hurricanes.txt

original file order:

 y: 1960    n: Donna         p: FL,NC
 y: 1969    n: Camille       p: MS
 y: 1972    n: Agnes         p: FL
 y: 1983    n: Alicia        p: TX
 y: 1989    n: Hugo          p: SC,NC
 y: 2005    n: Katrina       p: FL,LA,MS
 y: 2005    n: Rita          p: TX,LA
 y: 2005    n: Wilma         p: FL
 y: 2008    n: Ike           p: TX
 y: 2009    n: Ida           p: MS
 y: 2011    n: Irene         p: NC,NJ,MA,VT
 y: 2012    n: Isaac         p: LA
 y: 1992    n: Andrew        p: FL,LA
 y: 1995    n: Opal          p: FL,AL
 y: 1999    n: Floyd         p: NC
 y: 2003    n: Isabel        p: NC,VA
 y: 2004    n: Charley       p: FL,SC,NC
 y: 2004    n: Frances       p: FL
 y: 2004    n: Ivan          p: AL
 y: 2004    n: Jeanne        p: FL

sorted by hurricane name:

 y: 1972    n: Agnes         p: FL
 y: 1983    n: Alicia        p: TX
 y: 1992    n: Andrew        p: FL,LA
 y: 1969    n: Camille       p: MS
 y: 2004    n: Charley       p: FL,SC,NC
 y: 1960    n: Donna         p: FL,NC
 y: 1999    n: Floyd         p: NC
 y: 2004    n: Frances       p: FL
 y: 1989    n: Hugo          p: SC,NC
 y: 2009    n: Ida           p: MS
 y: 2008    n: Ike           p: TX
 y: 2011    n: Irene         p: NC,NJ,MA,VT
 y: 2012    n: Isaac         p: LA
 y: 2003    n: Isabel        p: NC,VA
 y: 2004    n: Ivan          p: AL
 y: 2004    n: Jeanne        p: FL
 y: 2005    n: Katrina       p: FL,LA,MS
 y: 1995    n: Opal          p: FL,AL
 y: 2005    n: Rita          p: TX,LA
 y: 2005    n: Wilma         p: FL

sorted by year:

 y: 1960    n: Donna         p: FL,NC
 y: 1969    n: Camille       p: MS
 y: 1972    n: Agnes         p: FL
 y: 1983    n: Alicia        p: TX
 y: 1989    n: Hugo          p: SC,NC
 y: 1992    n: Andrew        p: FL,LA
 y: 1995    n: Opal          p: FL,AL
 y: 1999    n: Floyd         p: NC
 y: 2003    n: Isabel        p: NC,VA
 y: 2004    n: Charley       p: FL,SC,NC
 y: 2004    n: Frances       p: FL
 y: 2004    n: Ivan          p: AL
 y: 2004    n: Jeanne        p: FL
 y: 2005    n: Katrina       p: FL,LA,MS
 y: 2005    n: Rita          p: TX,LA
 y: 2005    n: Wilma         p: FL
 y: 2008    n: Ike           p: TX
 y: 2009    n: Ida           p: MS
 y: 2011    n: Irene         p: NC,NJ,MA,VT
 y: 2012    n: Isaac         p: LA

Compare the implementation here with your approach of storing the fields in individual arrays, as well as with your sort routines. While the code above relies on a statically declared array of structs, there is no reason you cannot allocate the structs dynamically as needed if you are faced with reading an unknown number of records. Let me know if you have additional questions.
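As a rough sketch of that dynamic approach, the array can be grown with realloc whenever it fills up. The function name read_all, the starting capacity of 8 and the doubling policy are all assumptions chosen for illustration; the struct and the format string are the ones used above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum { MAXC = 32 };

typedef struct {
    unsigned year;
    char name[MAXC];
    char path[MAXC];
} hcdata;

/* Read every record from fp into a heap array that doubles when full
 * (hypothetical helper). On return *count holds the number of records.
 * The caller frees the returned pointer. Returns NULL if no record was
 * read or if an allocation failed. */
hcdata *read_all (FILE *fp, size_t *count)
{
    size_t used = 0, cap = 0;
    hcdata *arr = NULL, rec;

    assert (fp && count);
    while (fscanf (fp, "%u %31[^ ] %31[^\n]%*c",
                   &rec.year, rec.name, rec.path) == 3) {
        if (used == cap) {                      /* array full: grow it */
            size_t newcap = cap ? cap * 2 : 8;
            hcdata *tmp = realloc (arr, newcap * sizeof *tmp);
            if (!tmp) {                         /* realloc failed: arr is still valid */
                free (arr);
                *count = 0;
                return NULL;
            }
            arr = tmp;
            cap = newcap;
        }
        arr[used++] = rec;
    }
    *count = used;
    return arr;
}
```

The result can be passed to qsort exactly as before, e.g. `qsort (arr, n, sizeof *arr, cmpname);`, followed by `free (arr);` once the data is no longer needed.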

If your EOL character is a \n, you could use this:

fscanf(inp, "%d %29s %29[^\n]", &a, b, c);

where 29 is MAX - 1. Replace \n with your EOL character(s).

Please note that you don't need to pass &b and &c to fscanf: an array name used as an argument decays to a pointer to its first element, so b and c are already equivalent to &b[0] and &c[0]. Also, in printf you are passing &name[i], which is a char **, where %s expects a char *. You need to change &name[i] to name[i], and the same goes for &states[i]. The strcpy calls need the same change, and each name[i] must point to storage you have allocated before anything is copied into it.
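One way to apply these points to the original read loop is sketched below. It assumes the name and states arrays are changed from arrays of pointers to two-dimensional char arrays so that every row has real storage; the function name read_hurricanes is hypothetical. Testing fscanf's return value takes the place of the feof check.

```c
#include <assert.h>
#include <stdio.h>

#define MAX 30

/* Read up to MAX records from inp (hypothetical helper).
 * Each row of name and states is a real char array, so fscanf has
 * storage to write into. Returns the number of records read. */
int read_hurricanes (FILE *inp, int year[], char name[][MAX], char states[][MAX])
{
    int count = 0;

    assert (inp && year && name && states);
    /* name[count] and states[count] decay to char *, so no & is needed;
     * the loop stops at the first line that does not convert all 3 fields */
    while (count < MAX &&
           fscanf (inp, "%d %29s %29[^\n]",
                   &year[count], name[count], states[count]) == 3)
        count++;

    return count;
}
```

In main the arrays would be declared as `char name[MAX][MAX], states[MAX][MAX];` and each record printed with `printf ("%d %s %s\n", year[i], name[i], states[i]);`.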
