Remove extra white space from inside a C string?

I have read a few lines of text into an array of C-strings. The lines have an arbitrary number of tab- or space-delimited columns, and I am trying to figure out how to remove all the extra whitespace between them. The end goal is to use strtok to break up the columns. This is a good example of the columns:

Cartwright   Wendy    93
Williamson   Mark     81
Thompson     Mark     100
Anderson     John     76
Turner       Dennis   56

How can I eliminate all but one of the spaces or tabs between the columns so the output looks like this?

Cartwright Wendy 93

Alternatively, can I just replace all of the whitespace between the columns with a different character in order to use strtok? Something like this?

Cartwright#Wendy#93

edit: Multiple great answers, but had to pick one. Thanks for the help all.

If I may voice the "you're doing it wrong" opinion, why not just eliminate the whitespace while reading? Use fscanf(fp, "%s", string); (with fp being your open input stream) to read a "word" (non-whitespace), then read the whitespace. If it's spaces or tabs, keep reading into one "line" of data. If it's a newline, start a new entry. It's probably easiest in C to get the data into a format you can work with as soon as possible, rather than trying to do heavy-duty text manipulation.
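A rough sketch of that read-as-you-go idea, in case it helps. The names read_record and MAX_WORD, the '#' style separator argument, and the 63-character word limit are all assumptions made for illustration, not anything from the question. It reads one word at a time with fscanf, then looks at the whitespace that follows to decide whether the line continues.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORD 64   /* assumed limit; must match the %63s width below */

/* Hypothetical helper: reads one line from fp word by word and writes the
 * words to out, joined by sep. Returns the number of words stored (0 for a
 * blank line), or -1 at end of input or if outsize is 0. */
int read_record(FILE *fp, char *out, size_t outsize, char sep)
{
    char word[MAX_WORD];
    size_t used = 0;
    int count = 0;
    int c;

    if (outsize == 0)
        return -1;
    out[0] = '\0';

    for (;;) {
        /* skip blanks between words, but stop at the newline */
        while ((c = getc(fp)) == ' ' || c == '\t' || c == '\r')
            ;
        if (c == '\n')
            return count;
        if (c == EOF)
            return count > 0 ? count : -1;
        ungetc(c, fp);

        if (fscanf(fp, "%63s", word) != 1)
            return count > 0 ? count : -1;

        size_t len = strlen(word);
        if (used + len + 2 > outsize)
            continue;   /* no room: drop this word but keep consuming the line */
        if (count > 0)
            out[used++] = sep;
        memcpy(out + used, word, len + 1);
        used += len;
        count++;
    }
}
```

Calling read_record(fp, out, sizeof out, '#') in a loop until it returns -1 would give one squeezed record per input line. A word longer than 63 characters would be split in two, so adjust the limit to your data.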

Why not use strtok() directly? There is no need to preprocess the input, because strtok() treats a run of delimiter characters as a single separator.

All you need to do is repeat strtok() until you get 3 non-space tokens and then you are done!
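For what it's worth, here is a minimal sketch of the direct strtok() approach. The helper name split_columns and the delimiter set " \t\r\n" are my own assumptions. Keep in mind that strtok() writes NUL bytes into the line, so the buffer must be writable.

```c
#include <string.h>

/* Hypothetical helper: splits line in place on spaces, tabs, and line
 * endings. Stores up to max token pointers in cols and returns how many
 * tokens were found. */
int split_columns(char *line, char **cols, int max)
{
    int n = 0;
    char *tok = strtok(line, " \t\r\n");

    while (tok != NULL && n < max) {
        cols[n++] = tok;                 /* points into line itself */
        tok = strtok(NULL, " \t\r\n");   /* NULL continues the same string */
    }
    return n;
}
```

With char line[] = "Cartwright   Wendy    93\n"; and char *cols[3]; a call to split_columns(line, cols, 3) should leave the three column strings in cols without any squeezing pass beforehand.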

Edit: I originally had a malloced workspace, which I thought might be clearer. However, doing it without extra memory is almost as simple, and I'm being pushed that way in comments and personal IMs, so, here it comes... :-)

void squeezespaces(char* row, char separator) {
  char *current = row;
  int spacing = 0;
  int i;

  for(i=0; row[i]; ++i) {
    if(row[i]==' ' || row[i]=='\t') {
      if (!spacing) {
        /* start of a run of spaces or tabs -> separator */
        *current++ = separator;
        spacing = 1;
      }
    } else {
      *current++ = row[i];
      spacing = 0;
    }
  }
  *current = 0;
}

The following code modifies the string in place; if you don't want to destroy your original input, you can pass a second buffer to receive the modified string. Should be fairly self-explanatory:

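#include <ctype.h>
#include <stdio.h>
#include <string.h>

char *squeeze(char *str)
{
  int r; /* next character to be read */
  int w; /* next character to be written */

  r=w=0;
  while (str[r])
  {
    /* cast to unsigned char: passing a negative char to isspace() is undefined */
    if (isspace((unsigned char)str[r]) || iscntrl((unsigned char)str[r]))
    {
      if (w > 0 && !isspace((unsigned char)str[w-1]))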
        str[w++] = ' ';
    }
    else
      str[w++] = str[r];
    r++;
  }
  str[w] = 0;
  return str;
}

int main(void)
{
  char test[] = "\t\nThis\nis\ta\b     test.";
  printf("test = %s\n", test);
  printf("squeeze(test) = %s\n", squeeze(test));
  return 0;
}
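The second-buffer variant mentioned above might look something like this. It is only a sketch: squeeze_copy and its dstsize parameter are names I made up, and it collapses whitespace only (it does not treat other control characters specially the way squeeze() does).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copies src into dst, collapsing each run of
 * whitespace into one space and dropping leading whitespace. The source
 * string is left untouched. Output is truncated to fit dstsize bytes,
 * including the terminating NUL. Returns dst. */
char *squeeze_copy(char *dst, size_t dstsize, const char *src)
{
    size_t w = 0;   /* next position to write in dst */

    if (dstsize == 0)
        return dst;

    for (; *src != '\0' && w + 1 < dstsize; src++) {
        if (isspace((unsigned char)*src)) {
            if (w > 0 && dst[w - 1] != ' ')
                dst[w++] = ' ';
        } else {
            dst[w++] = *src;
        }
    }
    dst[w] = '\0';
    return dst;
}
```

Like the in-place version, this can leave a single trailing space if the input ends in whitespace.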
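
#include <string.h>

/* Note: removes every space from the string, not just the extra ones. */
char* trimwhitespace(char *str_base) {
    char* buffer = str_base;
    /* resume the search at the last hit instead of rescanning from the start */
    while((buffer = strchr(buffer, ' '))) {
        /* memmove, not strcpy: source and destination overlap */
        memmove(buffer, buffer+1, strlen(buffer));
    }

    return str_base;
}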

You could read a line, then scan it to find the start of each column. Then use the column data however you'd like.

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define MAX_COL 3
#define MAX_REC 512

int main (void)
{
    FILE *input;
    char record[MAX_REC + 1];
    char *scan;
    const char *recEnd;
    char *columns[MAX_COL] = { 0 };
    int colCnt;

    input = fopen("input.txt", "r");
    if (input == NULL)
    {
        perror("input.txt");
        return 1;
    }

    while (fgets(record, sizeof(record), input) != NULL)
    {
        memset(columns, 0, sizeof(columns));  // reset column start pointers

        scan = record;
        recEnd = record + strlen(record);

        for (colCnt = 0; colCnt < MAX_COL; colCnt++ )
        {
          while (scan < recEnd && isspace((unsigned char)*scan)) { scan++; }  // bypass whitespace
          if (scan == recEnd) { break; }
          columns[colCnt] = scan;  // save column start
          while (scan < recEnd && !isspace((unsigned char)*scan)) { scan++; }  // bypass column word
          if (scan < recEnd) { *scan++ = '\0'; }  // terminate the word; never step past the end of the record
        }

        if (colCnt > 0)
        {
            printf("%s", columns[0]);
            for (int i = 1; i < colCnt; i++)
            {
             printf("#%s", columns[i]);
            }
            printf("\n");
        }
    }

    fclose(input);
}

Note, the code could still be made more robust: check for file errors with ferror; ensure EOF was reached with feof; ensure the entire record (all column data) was processed. It could also be made more flexible by using a linked list instead of a fixed array, and could be modified to not assume each column only contains a single word (as long as the columns are delimited by a specific character).
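As a sketch of that last point, here is one way multi-word columns could be handled when a specific delimiter separates them. The name split_on_delim and the sample record are assumptions for illustration. It uses strchr rather than strtok so that an empty column between two adjacent delimiters is kept instead of being skipped.

```c
#include <string.h>

/* Hypothetical helper: splits record in place on delim, keeping empty
 * fields and any spaces inside a field. Stores at most max field pointers
 * in cols and returns how many were stored; extra fields are ignored. */
int split_on_delim(char *record, char delim, char **cols, int max)
{
    int n = 0;
    char *p = record;

    record[strcspn(record, "\r\n")] = '\0';   /* drop the line ending */

    while (n < max) {
        cols[n++] = p;
        char *next = (delim != '\0') ? strchr(p, delim) : NULL;
        if (next == NULL)
            break;          /* last field reached */
        *next = '\0';       /* terminate the current field */
        p = next + 1;
    }
    return n;
}
```

For a record such as "Van Der Berg#Anna Maria##93", this should yield four fields, the third one empty.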

Here's an alternative function that squeezes out repeated whitespace characters, as defined by isspace() in <ctype.h>. It returns the length of the 'squidged' string.

#include <ctype.h>
#include <stddef.h>  /* size_t */

size_t squidge(char *str)
{
    char *dst = str;
    char *src = str;
    char  c;
    while ((c = *src++) != '\0')
    {
        if (isspace((unsigned char)c))
        {
            *dst++ = ' ';
            while ((c = *src++) != '\0' && isspace((unsigned char)c))
                ;
            if (c == '\0')
                break;
        }
        *dst++ = c;
    }
    *dst = '\0';
    return(dst - str);
}

#include <stdio.h>
#include <string.h>

int main(void)
{
    char buffer[256];
    while (fgets(buffer, sizeof(buffer), stdin) != 0)
    {
        size_t len = strlen(buffer);
        if (len > 0 && buffer[len - 1] == '\n')
            buffer[--len] = '\0';   /* strip the newline only if one was read */
        printf("Before: %zu <<%s>>\n", len, buffer);
        len = squidge(buffer);
        printf("After:  %zu <<%s>>\n", len, buffer);
    }
    return(0);
}

I made a small improvement over John Bode's answer to remove trailing whitespace as well:

#include <ctype.h>

char *squeeze(char *str)
{
  char* r; /* next character to be read */
  char* w; /* next character to be written */
  char c;
  int sp, sp_old = 0;

  r=w=str;

  do {
    c=*r;
    sp = isspace((unsigned char)c);
    if (!sp) {
      if (sp_old && c) {
        // don't add a space at end of string
        *w++ = ' ';
      }
      *w++ = c;
    }
    if (str < w) {
      // don't add space at start of line
      sp_old = sp;
    }
    r++;
  }
  while (c);

  return str;
}

#include <stdio.h>

int main(void)
{
  char test[] = "\t\nThis\nis\ta\f     test.\n\t\n";
  //printf("test = %s\n", test);
  printf("squeeze(test) = '%s'\n", squeeze(test));
  return 0;
}


The following code simply reads the input character by character and checks each one: if a space occurs more than once in a row, the extra spaces are skipped; otherwise the character is printed. You can use the same logic for tabs. Hope it helps in solving your problem. If there is any problem with this code, please let me know.

#include <stdio.h>

int main(void)
{
    int c, count = 0;
    printf ("Please enter your sentence\n");
    while ( ( c = getchar() ) != EOF )  {
        if ( c != ' ' )  {
            putchar ( c );
            count = 0;
        }
        else  {
            count ++;
            if ( count == 1 )    /* print only the first space of a run */
                putchar ( c );
        }
    }
    return 0;
}
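To extend the same logic to tabs, something along these lines should work. This is a sketch under my own assumptions: the function name squeeze_stream is made up, it takes explicit input and output streams instead of using stdin and stdout, and a run that mixes spaces and tabs is written as a single space.

```c
#include <stdio.h>

/* Hypothetical helper: copies in to out, replacing each run of spaces and
 * tabs with one space. Newlines and all other characters pass through
 * unchanged. */
void squeeze_stream(FILE *in, FILE *out)
{
    int c;
    int in_run = 0;   /* nonzero while inside a run of blanks */

    while ((c = getc(in)) != EOF) {
        if (c == ' ' || c == '\t') {
            if (!in_run)
                putc(' ', out);   /* emit only the first blank of the run */
            in_run = 1;
        } else {
            putc(c, out);
            in_run = 0;
        }
    }
}
```

Calling squeeze_stream(stdin, stdout) gives the same filter behaviour as the program above, with tabs handled too.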
