C command to remove spaces from C file

I am trying to write a C program using only the stdio.h, stdlib.h, and string.h headers. I want to read records from a CSV file and print them. Each line of the file should have the format:

ID,NAME,AGE,GPA

for example:

10,bob,18,3.5
15,mary,20,4.0
5,tom, 17, 3.8

but there can be spaces before or after the commas, as in the third line of this example.

My code should print each line of the CSV file in this format:

Record 1: ID=nnn NAME=nnn AGE=nnn GPA=nnn

and should remove the spaces before or after the values.

How can I do this?

This is my code so far, but the spaces around ID still show up when I run it:

printf("Record %d: ", rec);
char *comma = strtok(file, ",");
printf("ID=%s ", comma);
comma = strtok(NULL, ",");
printf("NAME=%s ", comma);
comma = strtok(NULL, ",");
sscanf(comma, "%d", &age);
printf("AGE=%d ", age);
comma = strtok(NULL, ",");
gpa = strtof(comma, NULL);
printf("GPA=%.2f \n", gpa);

You are making the problem more difficult than it needs to be. strtok takes a string of delimiter characters and treats any sequence made up of those characters as a single delimiter. So to parse your .csv file, where there may or may not be spaces around the commas, simply use " ,\n" (space, comma, newline) as the delimiter string; strtok will then split out each token, removing the commas as well as any leading spaces and the trailing newline.

That reduces your code to simply:

#include <stdio.h>
#include <string.h>

#define MAXC 1024      /* if you need a constant, #define one (or more) */
#define DELIM " ,\n"

int main (void) {

    char buf[MAXC];   /* buffer to hold each line */

    while (fgets (buf, MAXC, stdin)) {              /* read each line */
        char *p = buf;                              /* pointer to line */
        /* now simply use strtok to separate all tokens in line */
        for (p = strtok(p, DELIM); p; p = strtok (NULL, DELIM))
            printf ("%-8s", p);                     /* output as desired */
        putchar ('\n');                             /* tidy up with newline */
    }
    return 0;
}

Example Use/Output

$ ./bin/strtokcsv <dat/spacecomma.csv
10      bob     18      3.5
15      mary    20      4.0
5       tom     17      3.8

(You can adjust the output format as desired.)
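To get the output format from the question, the same strtok loop can label each token instead of printing it in columns. The following is only a minimal sketch, assuming every line holds four fields in the order ID, NAME, AGE, GPA; the helper name format_record and its label table are made up for illustration and are not part of the answer above:

```c
#include <stdio.h>
#include <string.h>

#define DELIM " ,\n"   /* space, comma, newline: same delimiters as above */

/* Hypothetical helper: splits line in place with strtok and writes
 * "Record N: ID=... NAME=... AGE=... GPA=..." into out.
 * Returns the number of fields found (4 for a well-formed line),
 * or -1 if out is too small. */
int format_record (char *line, int rec, char *out, size_t outsz)
{
    static const char *label[] = { "ID", "NAME", "AGE", "GPA" };
    int nfield = 0;
    size_t used;
    int n = snprintf (out, outsz, "Record %d:", rec);

    if (n < 0 || (size_t)n >= outsz)
        return -1;
    used = (size_t)n;

    for (char *p = strtok (line, DELIM); p && nfield < 4;
         p = strtok (NULL, DELIM)) {
        n = snprintf (out + used, outsz - used, " %s=%s", label[nfield], p);
        if (n < 0 || (size_t)n >= outsz - used)
            return -1;
        used += (size_t)n;
        nfield++;
    }
    return nfield;
}
```

Inside the fgets loop this could be called with a running record counter and the result printed with puts when the return value is 4. Two caveats follow from how strtok works: because the space is itself a delimiter, a name that contains a space is split into two tokens, and empty fields (two adjacent commas) are skipped rather than reported.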

Also see the comment by @Kaz. A simple alternative is a loop that reads one character at a time with getchar() and keeps a little state: output every character that is not a space, comma, or newline; when you hit a space or comma, insert an output separator of your choosing and ignore all subsequent spaces and commas until you reach the next field and start outputting characters again. Definitely worth looking at. Let me know if you have further questions.
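A rough sketch of the character-at-a-time loop described in the previous paragraph might look like the following. The function name collapse_separators, its two flag variables, and the use of getc/putc on FILE pointers (rather than getchar/putchar) are my own choices for illustration; passing stdin and stdout gives the getchar-style behavior.

```c
#include <stdio.h>

/* Hypothetical helper: copies in to out, replacing every run of spaces
 * and/or commas between two fields with the single character sep.
 * Runs at the start or end of a line are dropped. */
void collapse_separators (FILE *in, FILE *out, int sep)
{
    int c;
    int infield = 0;    /* has a field character been written on this line? */
    int pending = 0;    /* has a space/comma been seen since the last field? */

    while ((c = getc (in)) != EOF) {
        if (c == ' ' || c == ',') {
            pending = 1;                 /* remember it, output nothing yet */
        }
        else if (c == '\n') {
            putc ('\n', out);            /* end of record: reset the state */
            infield = pending = 0;
        }
        else {
            if (pending && infield)      /* separator only between fields */
                putc (sep, out);
            putc (c, out);
            infield = 1;
            pending = 0;
        }
    }
}
```

For example, collapse_separators (stdin, stdout, '\t') would turn the sample file into tab-separated output. Like the strtok version, this treats any run of spaces and commas as one separator, so empty fields disappear and spaces inside a value also split it.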

Assuming the CSV data doesn't contain quotes that protect commas, we can remove the extra spaces around commas with a program that does no buffering of the data and no processing of null-terminated character arrays. We just read one character at a time using getchar and maintain some state in the form of counters that record how many spaces and commas we have seen:

#include <stdio.h>

int main(void)
{
  int nspc = 0;
  int ncomma = 0;
  int ch;

  while ((ch = getchar()) != EOF) {
    switch (ch) {
    case ' ': nspc++; break;
    case ',': ncomma++; break;
    default:
      if (ncomma > 0)
        while (ncomma-- > 0)
          putchar(',');
      else
        while (nspc-- > 0)
          putchar(' ');
      putchar(ch);
      nspc = 0;
      ncomma = 0;
      break;
    }
  }

  return 0;
}

Test data:

$ cat clean-comma-test 

a
aa
a,
,a
a a,
,a a
a , b
, a , b c , d
,  a , b   c d   ef,  g h
   ,
   ,a
,  ,
, ,, , ,    ,

Output:

a
aa
a,
,a
a a,
,a a
a,b
,a,b c,d
,a,b   c d   ef,g h
,
,a
,,
,,,,,,

The basic idea is:

  • If we see a run of N spaces that contains no commas, followed by a character C that isn't a space or comma, we reproduce the N spaces and then C.

  • If we see a run of N spaces (N possibly 0) containing M commas (M at least 1), followed by a character C that isn't a space or comma, we output the M commas followed by C, and drop the spaces.

  • Lines in C streams are terminated by the newline character '\n', which serves as C when the comma-space run is the last item on the line.

A C program that doesn't manipulate any pointers cannot have a buffer overflow or memory leak. However, I haven't protected the counters against integer overflow. If the input has a run of more than INT_MAX spaces and/or commas, the behavior is undefined. On modern systems INT_MAX is typically over two billion, so there is a fair amount of justification for not caring about it.

The code also doesn't recognize other whitespace such as tabs.
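Both limitations could be addressed with small changes. The variant below is only a sketch under assumptions of my own, not part of the original answer: it is written as a function over FILE pointers (the name clean_commas is made up), it counts a tab as whitespace and writes it back as a space when the run is kept, and it stops incrementing the counters at INT_MAX instead of overflowing.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical variant of the filter above.
 * Assumption: a tab is whitespace and is emitted as a space if kept. */
void clean_commas (FILE *in, FILE *out)
{
  int nws = 0;       /* pending whitespace characters */
  int ncomma = 0;    /* pending commas */
  int ch;

  while ((ch = getc (in)) != EOF) {
    switch (ch) {
    case ' ':
    case '\t':
      if (nws < INT_MAX)        /* saturate instead of overflowing */
        nws++;
      break;
    case ',':
      if (ncomma < INT_MAX)
        ncomma++;
      break;
    default:
      if (ncomma > 0)           /* run contained commas: keep only those */
        while (ncomma-- > 0)
          putc (',', out);
      else                      /* no commas: keep the whitespace */
        while (nws-- > 0)
          putc (' ', out);
      putc (ch, out);
      nws = 0;
      ncomma = 0;
      break;
    }
  }
}
```

Saturating means a run longer than INT_MAX would be shortened rather than reproduced exactly, which trades undefined behavior for a harmless miscount. Converting kept tabs to spaces is a simplification; preserving the exact mix of tabs and spaces would need more state than two counters.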

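#include <string.h>

/* Strip any characters listed in chars from both ends of str.
 * Modifies str in place and returns a pointer to the trimmed string. */
char *trim(char *str, const char *chars)
{
    char *end;

    while(*str && strchr(chars, *str))          /* skip leading characters */
    {
        str++;
    }
    end = str + strlen(str);
    while(end > str && strchr(chars, end[-1]))  /* cut trailing characters */
    {
        *--end = 0;
    }
    return str;
}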

and call it on each token:

comma = trim(comma, " ");
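Put together with strtok, a trim-based approach could look like the sketch below. It splits on commas only and trims each field afterwards, so spaces inside a value survive. The helper name split_fields, the extra trim characters (tab and carriage return), and the self-contained copy of trim are my own additions for illustration.

```c
#include <string.h>

/* Strip any characters listed in chars from both ends of str (in place). */
char *trim (char *str, const char *chars)
{
    char *end;

    while (*str && strchr (chars, *str))           /* skip leading characters */
        str++;
    end = str + strlen (str);
    while (end > str && strchr (chars, end[-1]))   /* cut trailing characters */
        *--end = '\0';
    return str;
}

/* Hypothetical helper: splits line on commas, trims each field, and stores
 * up to max pointers (into line) in field[]. Returns the number stored. */
int split_fields (char *line, char *field[], int max)
{
    int n = 0;

    for (char *tok = strtok (line, ",\n"); tok && n < max;
         tok = strtok (NULL, ",\n"))
        field[n++] = trim (tok, " \t\r");
    return n;
}
```

After a call that returns 4, the record can be printed with a single printf using field[0] through field[3] for ID, NAME, AGE, and GPA. Note that strtok still skips empty fields, so a line such as "a,,b" yields two fields rather than three.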
