How can I alphabetize the first column of a .csv file in C?

I have a .csv file. Let's say the data is like this:

Location 1,Location 2,Price,Rooms,Bathrooms,CarParks,Type,Area,Furnish
Upper-East-Side,New-York,310000,3,2,0,Built-up,1000,Partly
West-Village,New-York,278000,2,2,0,Built-up,1000,Partly
Theater-District,New-York,688000,3,2,0,Built-up,1000,Partly

Expected output (alphabetized):

Theater-District
Upper-East-Side
West-Village

How can I show only the first column (Location 1) of the file, alphabetized, while also skipping the header?

This is my current code, but it's still in a "read and display" form.

#include <stdio.h>

int main()
{
  FILE *fh;
  
  fh = fopen("file.csv", "r");
  
  if (fh != NULL)
  {
    int line_number = 0;
    int c;  /* int, not char: fgetc() returns an int so EOF stays distinguishable */
    while ( (c = fgetc(fh)) != EOF )
    {
        if(line_number > 0 || c == '\n'){
            putchar(c);
        }
        if(c == '\n'){
            line_number++;
        }
    }
    fclose(fh);
  
  } else printf("Error opening file.\n");
  
  return 0;
}

CSV is not a well-defined format, so I suggest you use an existing CSV library instead of parsing the data yourself. For instance, the code below will not work if the first field has any embedded commas. It relies on fscanf() with the m modifier (a POSIX extension) to allocate each first field, and resizes the lines array as needed. This means there are no arbitrary limits on line length or record count.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// qsort() passes pointers to the array elements. The elements are char *,
// so each argument is really a char ** and must be dereferenced once.
int strcmp2(const void *a, const void *b) {
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

int main() {
    FILE *f = fopen("unsorted.csv", "r");
    if(!f) return 1;

    char **lines = NULL;
    size_t n = 0;
    for(;; n++) {
        char *location1;
        // %m[^,] allocates the first field; %*[^\n] skips the rest of the line
        int rv = fscanf(f, "%m[^,]%*[^\n]\n", &location1);
        if(rv != 1) break;
        char **tmp = realloc(lines, (n + 1) * sizeof *tmp);
        if(!tmp) return 1;
        lines = tmp;
        lines[n] = location1;
    }
    fclose(f);
    if(!n) return 0; // empty file: nothing to sort or free

    free(lines[0]); // header
    qsort(&lines[1], n - 1, sizeof *lines, strcmp2);
    for(size_t i = 1; i < n; i++) {
        printf("%s\n", lines[i]);
        free(lines[i]);
    }
    free(lines);
}

It produces the expected output:

Theater-District
Upper-East-Side
West-Village
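
The caveat about embedded commas can be made concrete. What follows is only a rough sketch, assuming the common quoting convention described in RFC 4180 (a field may be wrapped in double quotes, and "" inside it stands for one literal quote). The helper name csv_first_field is hypothetical, and a real CSV library would cover far more cases, such as quoted fields that span several lines.

```c
#include <stddef.h>

/* Hypothetical helper (not a library function): copy the first CSV field
 * of `line` into `out`, which has room for `cap` bytes.
 * A field wrapped in double quotes may contain commas, and "" inside it
 * stands for one literal quote character.
 * Returns 0 on success, -1 if the field does not fit in `out` or a
 * quoted field has no closing quote. */
int csv_first_field(const char *line, char *out, size_t cap)
{
    size_t n = 0;
    int quoted = (*line == '"');

    if (cap == 0) return -1;
    if (quoted) line++;                 /* skip the opening quote */

    for (;;) {
        char c = *line;
        if (quoted) {
            if (c == '\0') return -1;   /* unterminated quoted field */
            if (c == '"') {
                if (line[1] != '"') break;  /* closing quote */
                line++;                 /* "" escape: keep one quote */
            }
        } else if (c == '\0' || c == ',' || c == '\n' || c == '\r') {
            break;                      /* end of an unquoted field */
        }
        if (n + 1 >= cap) return -1;    /* leave room for the terminator */
        out[n++] = c;
        line++;
    }
    out[n] = '\0';
    return 0;
}
```

With this sketch, a line such as "Side, Upper-East",New-York yields the field Side, Upper-East, whereas splitting at the first comma would stop after Side.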

So, assuming some hard limits on line length and CSV file record count, we can just use arrays.

To read a record, just use fgets(). Add each line of text to the array using the usual method: read into the next free row and increment the record count.

We use a simple string search and truncate to isolate the first field. (Assuming no fancy stuff like double-quoted fields. I assume you are doing homework.)

To sort everything except the CSV header record, use qsort() with a little pointer arithmetic: start at the second row (records+1) and sort one fewer element (record_count-1).

#include <iso646.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define unused(x) (void)(x)

#define MAX_LINE_LENGTH 100
#define MAX_RECORD_COUNT 100

// qsort() hands us pointers to two rows; each row is itself a string.
// A real comparator avoids calling strcmp() through a mismatched
// function pointer type.
static int compare_records( const void * a, const void * b )
{
  return strcmp( a, b );
}

int main( int argc, char ** argv )
{
  unused( argc );

  char records[MAX_RECORD_COUNT][MAX_LINE_LENGTH];
  size_t record_count = 0;
  
  const char * filename = argv[1];
  if (!filename) return 1;

  // Read our records from file
  FILE * f = fopen( filename, "r" );
  if (!f) return 1;

  while ((record_count < MAX_RECORD_COUNT)
      and fgets( records[record_count], MAX_LINE_LENGTH, f ))
    record_count += 1;

  fclose( f );
  
  // Truncate the strings to just the first field
  // (this also drops the newline fgets() keeps when a line has no comma)
  for (size_t n = 0;  n < record_count;  n++)
    records[n][strcspn( records[n], ",\r\n" )] = '\0';
  
  // Sort everything but the header
  if (record_count > 2)  // must exist at least two records + header
    qsort( records+1, record_count-1, MAX_LINE_LENGTH, compare_records );
  
  // Print everything but the header
  for (size_t n = 1;  n < record_count;  n++)
    printf( "%s\n", records[n] );
  
  return 0;
}
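
The hard limit on line length deserves some care: when a line is longer than the buffer, fgets() returns only the part that fits, and the remainder would be read as a separate record on the next call. Below is a minimal sketch of one way to detect that case, assuming it is acceptable to discard an overlong line. The helper name read_record and its return codes are hypothetical, not part of the answer above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from `f` into `buf` (capacity `cap`,
 * which must be at least 2) and strip the trailing newline.
 * Returns  1 if a complete line was stored,
 *          0 at end of file,
 *         -1 if the line did not fit; the rest of it is discarded so the
 *            next call starts at the following line. */
int read_record(FILE *f, char *buf, size_t cap)
{
    if (!fgets(buf, (int)cap, f)) return 0;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';            /* normal case: whole line read */
        return 1;
    }
    if (len + 1 < cap) return 1;        /* buffer not full: last line, no newline */

    /* Buffer is full and holds no newline: peek at what follows. */
    int c = fgetc(f);
    if (c == EOF || c == '\n') return 1;    /* the line fit exactly */

    while (c != '\n' && c != EOF)       /* skip the rest of the long line */
        c = fgetc(f);
    return -1;
}
```

A reading loop could call read_record() in place of fgets() and skip the record, or report an error, whenever it returns -1.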
