Squeeze blank lines into one blank line in C

This is the same question as the one linked below, but with different code:

Replacing multiple new lines in a file with just one

#include <stdio.h>

void format(void);

int main(void){

    format();
    printf("\n");
    return 0;
}

void format(){
    int c;
    size_t nlines = 1;
    size_t nspace = 0;

    while (( c= getchar()) != EOF ){

        /*TABS*/
        if(c == '\t'){
            c = ' ';
        }
        /*SPACES*/
        if (c ==' '){
            if(nspace > 0){
                continue;
            }
            else{
                putchar(c);
                nspace++;
                nlines = 0;
            }
        }

        /*NEW LINE*/
        else if(c == '\n'){
            if(++nlines >2){
                continue;
            }
            else {
                nlines++;
                nspace = 0;
            }
            putchar(c);
        }   
        else{
            putchar(c);
            nspace = 0;
            nlines = 0;
        }       
    }
}

I want to squeeze multiple blank lines into one blank line, but it doesn't seem to work. Also, on a Cygwin terminal, the output on stdout ends with an extra blank line even though the input has no blank line at the end.

For example:
INPUT

Hello   Hi\n
\n
\n
Hey\t\tHola\n

DESIRED OUTPUT

Hello Hi\n
\n
Hey Hola\n

ACTUAL OUTPUT

Hello Hi\n
Hey Hola\n

Please explain!

You're incrementing nlines twice:

else if(c == '\n'){
    if(++nlines >2){  /* incremented here */
        continue;
    }
    else {
        nlines++;     /* incremented here */
        nspace = 0;
    }
    putchar(c);
}

You just want to do it once. I'd suggest incrementing the counter only until it hits 2 and not incrementing it any further. That just means a small change to the condition:

    if(nlines >= 2){
        continue;
    }
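Applied to the newline branch as a whole, the change might look like the sketch below. To keep it testable in isolation it works on an in-memory string rather than on stdin, and it ignores the space handling; the function name squeeze_newlines and the buffer-based interface are assumptions made for illustration, not part of the original code.

```c
#include <stddef.h>

/* Hypothetical helper: copy `in` to `out`, keeping at most two consecutive
 * newlines (that is, at most one blank line).
 * `out` must have room for strlen(in) + 1 bytes. Returns the bytes written. */
size_t squeeze_newlines(const char *in, char *out)
{
    size_t nlines = 1;      /* as in the question: start as if just after a newline */
    size_t n = 0;

    for (; *in != '\0'; in++) {
        if (*in == '\n') {
            if (nlines >= 2)
                continue;   /* one blank line already written: drop this newline */
            nlines++;       /* incremented exactly once per newline kept */
        } else {
            nlines = 0;     /* any other character ends the run of newlines */
        }
        out[n++] = *in;
    }
    out[n] = '\0';
    return n;
}
```

With this logic, the three newlines after "Hello Hi" in the example input are reduced to two, which leaves exactly one blank line before "Hey Hola".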

Here's a variant of your code. I eliminated the format() function (which is unusual for me, since most programs on SO don't use enough functions), incorporating it directly into main(). The code now treats spaces and newlines more symmetrically, which fixes the double-increment problem also identified in paddy's answer. It also prints a newline at the end only if the output doesn't already end with one, which normalizes files that do not end with a newline. The initialization nlines = 1; deals with multiple newlines at the start of the file; that was well done already.

#include <stdio.h>

int main(void)
{
    int c;
    size_t nlines = 1;
    size_t nspace = 0;

    while ((c = getchar()) != EOF)
    {
        if (c == '\t')
            c = ' ';
        if (c == ' ')
        {
            if (nspace < 1)
            {
                putchar(c);
                nspace++;
                nlines = 0;
            }
        }
        else if (c == '\n')
        {
            if (nlines < 2)
            {
                putchar(c);
                nlines++;
                nspace = 0;
            }
        }
        else
        {
            putchar(c);
            nspace = 0;
            nlines = 0;
        }
    }
    if (nlines == 0)
        putchar('\n');
    return 0;
}

My testing uses some Bash-specific notations. My program was sb73. The last test input does not include a final newline. The outputs use ⌴ to indicate a newline in the output:

$ echo $'Hello   Hi\n\n\nHey\t\tHola\n' | sb73
Hello Hi⌴
⌴
Hey Hola⌴
⌴
$

and:

$ echo $'\n\nHello   Hi\n\n\n    Hey\t\tHola\n' | sb73
⌴
Hello Hi⌴
⌴
 Hey Hola⌴
⌴
$

and:

$ printf '%s' $'\n\nHello   Hi\n\n\n    Hey\t\tHola' | sb73
⌴
Hello Hi⌴
⌴
 Hey Hola⌴
$

Handling CRLF line endings

The comments identify that the code above doesn't work on a Cygwin terminal, and the plausible reason is that the data being processed has CRLF line endings. There are various ways around this. One is to find a way of forcing the standard input into text mode. In text mode, CRLF line endings should be mapped to Unix-style '\n' (NL or LF only) endings on input, and Unix-style line endings should be mapped to CRLF line endings on output.

Alternatively, it would be possible simply to ignore CR characters:

--- sb73.c  2017-06-08 22:04:28.000000000 -0700
+++ sb47.c  2017-06-08 22:40:24.000000000 -0700
@@ -19,6 +19,8 @@
                 nlines = 0;
             }
         }
+        else if (c == '\r')
+            continue;    // Windows?
         else if (c == '\n')
         {
             if (nlines < 2)

That's a 'unified diff' showing two extra lines in the code. Alternatively, it is possible to treat a CR not followed by LF as a regular character while still treating CR followed by LF as a newline combination:

--- sb73.c  2017-06-08 22:04:28.000000000 -0700
+++ sb59.c  2017-06-08 22:42:43.000000000 -0700
@@ -19,6 +19,17 @@
                 nlines = 0;
             }
         }
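+        else if (c == '\r')
+        {
+            c = getchar();
+            if (c != EOF)
+                ungetc(c, stdin);  /* always push back the lookahead */
+            if (c == '\n')
+                continue;          /* CR LF: let the LF branch handle it */
+            putchar('\r');         /* lone CR: ordinary character */
+            nspace = 0;
+            nlines = 0;
+        }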
         else if (c == '\n')
         {
             if (nlines < 2)

There's probably a way to write a state machine that handles CR, but that would be more complex.
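One possible shape for such a state machine is sketched below. It keeps the nlines and nspace counters from the code above and adds a single flag, pending_cr, that records "the previous character was a CR and I don't yet know what it means". The function name squeeze_crlf and the string-in, string-out interface are assumptions chosen so the sketch can be tested without stdin; replacing the buffer accesses with getchar() and putchar() should be straightforward.

```c
#include <stddef.h>

/* Hypothetical buffer-based variant of the filter. `out` must have room
 * for strlen(in) + 2 bytes (a final newline may be appended). */
size_t squeeze_crlf(const char *in, char *out)
{
    size_t nlines = 1, nspace = 0, n = 0;
    int pending_cr = 0;             /* 1 while a CR is waiting for its successor */

    for (; *in != '\0'; in++) {
        int c = (unsigned char)*in;

        if (pending_cr) {
            pending_cr = 0;
            if (c != '\n') {        /* lone CR: emit it as an ordinary character */
                out[n++] = '\r';
                nspace = 0;
                nlines = 0;
            }                       /* CR LF: drop the CR, handle the LF below */
        }
        if (c == '\r') {
            pending_cr = 1;         /* decide when the next character arrives */
            continue;
        }
        if (c == '\t')
            c = ' ';
        if (c == ' ') {
            if (nspace < 1) {
                out[n++] = ' ';
                nspace++;
                nlines = 0;
            }
        } else if (c == '\n') {
            if (nlines < 2) {
                out[n++] = '\n';
                nlines++;
                nspace = 0;
            }
        } else {
            out[n++] = (char)c;
            nspace = 0;
            nlines = 0;
        }
    }
    if (pending_cr) {               /* input ended with a lone CR */
        out[n++] = '\r';
        nlines = 0;
    }
    if (nlines == 0)
        out[n++] = '\n';            /* normalize a missing final newline */
    out[n] = '\0';
    return n;
}
```

Because the decision about a CR is deferred by one character, no lookahead or ungetc() is needed. Note that this sketch writes LF-only line endings; it does not convert the output back to CRLF.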

I have a utod program that converts Unix-style line endings to Windows-style; I used that in the pipeline to test the new variants of the code.
