Detecting end of input using std::getline

I have code with the following snippet:

std::string input;
while(std::getline(std::cin, input))
{   
    //some read only processing with input
}

When I run the program, I redirect stdin from the file in.txt (created with gedit), which contains:

ABCD
DEFG
HIJK

Each of the above lines ends with a single newline in the file in.txt.

The problem I am facing is that after the while loop runs 3 times (once for each line), the program does not move forward and appears stuck. My question is: why is this happening, and what can I do to resolve the problem?

Some clarification:

I want to be able to run the program from the command line like this:

$ g++ program.cc -o out
$ ./out < in.txt

Additional Information:

I did some debugging and found that the while loop actually runs 4 times (the fourth time with input as an empty string). This causes the program to stall, because the "some read only processing with input" step cannot do its work on an empty string.

So my refined questions:

1) Why is the 4th loop running at all?

The rationale behind putting std::getline() in the while loop's condition must be that when getline() cannot read any more input, the condition evaluates to false and the while loop ends.

Instead, the while loop continues with an empty string! Why, then, have getline in the while loop condition at all? Isn't that bad design?

2) How do I ensure that the while loop doesn't run a 4th time without using break statements?

For now I have used a break statement and a string stream as follows:

std::string input;
char temp;
while(std::getline(std::cin, input)) {
    std::istringstream iss(input);
    if (!(iss >> temp)) {
        break;
    }
    //some read only processing with input
}

But clearly there has to be a more elegant way.

Contrary to DeadMG's answer, I believe the problem is with the contents of your input file, not with your expectation about the behavior of the newline character.


UPDATE: Now that I've had a chance to play with gedit, I think I see what caused the problem. gedit is apparently designed to make it difficult to create a file without a newline on the last line (which is sensible behavior). If you open gedit and type three lines of input, pressing Enter at the end of each line, then save the file, it will actually create a 4-line file, with the 4th line empty. Using your example, the complete contents of the file would then be "ABCD\nDEFG\nHIJK\n\n". To avoid creating that extra empty line, just don't press Enter at the end of the last line; gedit will provide the required newline character for you.

(As a special case, if you don't enter anything at all, gedit will create an empty file.)

Note this important distinction: in gedit, typing Enter creates a new line. In a text file stored on disk, a newline character (LF, '\n') denotes the end of the current line.
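One way to check this distinction without gedit or a file on disk is to feed both versions of the question's input to std::getline through a std::istringstream, which std::getline reads just as it reads redirected std::cin. A minimal sketch; the helper name read_lines is hypothetical:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Collect every line that std::getline reports for the given text.
// A std::istringstream stands in for `./out < in.txt` here.
std::vector<std::string> read_lines(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {  // false once nothing more can be read
        lines.push_back(line);
    }
    return lines;
}
```

With this sketch, read_lines("ABCD\nDEFG\nHIJK\n\n") (the file gedit writes when Enter is pressed after the last line) should return four strings, the last one empty, while read_lines("ABCD\nDEFG\nHIJK\n") should return just the three expected lines.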


Text file representations vary from system to system. The most common end-of-line markers are a single ASCII LF (newline) character (Unix, Linux, and similar systems) and a sequence of two characters, CR followed by LF (MS Windows). I'll assume the Unix-like representation here. (UPDATE: In a comment, you said you're using Ubuntu 12.04 and gcc 4.6.3, so text files should definitely be in the Unix-style format.)

I just wrote the following program based on the code in your question:

#include <iostream>
#include <string>
int main() {
    std::string input;
    int line_number = 0;
    while(std::getline(std::cin, input))
    {   
        line_number ++;
        std::cout << "line " << line_number
                  << ", input = \"" << input << "\"\n";
    }
}

and I created a 3-line text file, in.txt:

ABCD
EFGH
IJHL

In the file in.txt, each line is terminated by a single newline character.

Here's the output I get:

$ cat in.txt
ABCD
EFGH
IJHL
$ g++ c.cpp -o c
$ ./c < in.txt
line 1, input = "ABCD"
line 2, input = "EFGH"
line 3, input = "IJHL"
$

The final newline at the very end of the file does not start a new line; it merely marks the end of the current line. (A text file that doesn't end with a newline character might not even be valid, depending on the system.)

I can get the behavior you describe if I add a second newline character to the end of in.txt:
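For what it's worth, std::getline itself copes with an unterminated last line: the call that reads it still succeeds, but it also sets eofbit, and the next call extracts nothing, fails, and ends the loop. The sketch below (the helper name lines_with_eof is hypothetical) records, for each line, whether end-of-file had already been reached while reading it:

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// For each line std::getline returns, record its text and whether the
// stream had already hit end-of-file while reading it (eofbit set).
std::vector<std::pair<std::string, bool>> lines_with_eof(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::pair<std::string, bool>> result;
    std::string line;
    while (std::getline(in, line)) {
        result.emplace_back(line, in.eof());
    }
    return result;
}
```

For "ABCD\nEFGH\nIJHL\n" the last entry should be {"IJHL", false}; for "ABCD\nEFGH\nIJHL" it should be {"IJHL", true}. Either way the loop body runs exactly three times.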

$ echo '' >> in.txt
$ cat in.txt
ABCD
EFGH
IJHL

$ ./c < in.txt
line 1, input = "ABCD"
line 2, input = "EFGH"
line 3, input = "IJHL"
line 4, input = ""
$

The program sees an empty line at the end of the input file because there's an empty line at the end of the input file.

If you examine the contents of in.txt, you'll find two newline (LF) characters at the very end: one marks the end of the third line, and one marks the end of the (empty) fourth line. (Or, if it's a Windows-format text file, you'll find a CR-LF-CR-LF sequence at the very end of the file.)
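The Windows case matters if such a file is read on Ubuntu: std::getline splits only at '\n', so each line keeps its trailing CR, and the "empty" fourth line arrives as "\r" rather than "", which a check like input == "" would not catch. A minimal sketch that drops one trailing '\r' per line, assuming any CR at the end of a line is part of a CR-LF pair (the function name read_lines_crlf_tolerant is hypothetical):

```cpp
#include <istream>
#include <sstream>  // std::istringstream, for trying this on in-memory text
#include <string>
#include <vector>

// Read lines with std::getline, removing one trailing '\r' from each so that
// CR-LF (Windows) input yields the same strings as LF-only (Unix) input.
// Assumption: a '\r' at the end of a line always belongs to a CR-LF pair.
std::vector<std::string> read_lines_crlf_tolerant(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        lines.push_back(line);
    }
    return lines;
}
```

Called with std::cin, this gives "" for the extra last line of a Windows-format in.txt, so the empty-line handling described next works for both formats.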

If your code doesn't deal properly with empty lines, then you should either ensure that it doesn't receive any empty lines on its input, or, better, modify it so it handles empty lines correctly. How should it handle empty lines? That depends on what the program is required to do, and it's probably entirely up to you. You can silently skip empty lines:

if (input != "") {
    // process line
}

or you can treat an empty line as an error:

if (input == "") {
    // error handling code
}

or you can treat empty lines as valid data.

In any case, you should decide exactly how you want to handle empty lines.
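The three choices can be sketched side by side. The enum EmptyLinePolicy and the function read_lines_with_policy are hypothetical names for illustration, and collecting lines into a vector stands in for whatever processing the real program does:

```cpp
#include <istream>
#include <sstream>  // std::istringstream, for trying this on in-memory text
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical names for the three ways of handling empty lines.
enum class EmptyLinePolicy { Skip, Error, Keep };

std::vector<std::string> read_lines_with_policy(std::istream& in, EmptyLinePolicy policy) {
    std::vector<std::string> lines;
    std::string input;
    int line_number = 0;
    while (std::getline(in, input)) {
        ++line_number;
        if (input.empty()) {
            if (policy == EmptyLinePolicy::Skip) {
                continue;  // silently ignore the empty line
            }
            if (policy == EmptyLinePolicy::Error) {
                throw std::runtime_error("empty line at line " +
                                         std::to_string(line_number));
            }
            // EmptyLinePolicy::Keep: fall through and treat it as data
        }
        lines.push_back(input);  // stands in for "process line"
    }
    return lines;
}
```

Fed the 4-line in.txt from above, Skip should return three lines, Keep should return four (the last one empty), and Error should throw when it reaches line 4.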

Why is the 4th loop running at all?

Because the text input contains four lines.

The newline character means just that: "Start a new line". It does not mean "The preceding line is complete", and this test reveals the difference between those two semantics. So we have

1. ABCD
2. DEFG
3. HIJK
4.

The newline character at the end of the third line begins a new line, just as it should and exactly as its name says it will. The fact that that line is empty is why you get back an empty string. If you want to avoid it, trim the newline at the end of the third line, or simply special-case it: if (input == "") break;

The problem has nothing to do with your code, and lies in your faulty expectation of the behaviour of the newline character.

Finale:

Edit: Please read the accepted answer for the correct explanation of the problem and the solution as well.


As a note to people using std::getline() in their while loop condition: remember to check inside the loop whether the line is an empty string, and break accordingly, like this:

std::string input;
while(std::getline(std::cin, input))
{
    if(input.empty())
        break;
    //some read only processing with input
}
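A caveat on break: it stops at the first empty line anywhere in the input, so any data after a blank line in the middle of the file is silently ignored. If blank lines should merely be skipped, continue keeps reading. A minimal comparison; the function name count_processed and its stop_at_empty flag are hypothetical:

```cpp
#include <sstream>
#include <string>

// Count the non-empty lines a getline loop would process.
// stop_at_empty == true  -> `break` on an empty line
// stop_at_empty == false -> `continue` past it
int count_processed(const std::string& text, bool stop_at_empty) {
    std::istringstream in(text);
    std::string input;
    int processed = 0;
    while (std::getline(in, input)) {
        if (input.empty()) {
            if (stop_at_empty) break;
            continue;
        }
        ++processed;  // stands in for "some read only processing with input"
    }
    return processed;
}
```

For the gedit-style file "ABCD\nEFGH\nIJHL\n\n" both variants process 3 lines, but for "ABCD\n\nEFGH\nIJHL\n" the break variant processes only 1 while the continue variant still processes 3.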

My suggestion: don't put std::getline() in the while loop condition at all. Instead, use std::cin like this:

// a and b are assumed to be declared earlier, e.g. int a, b;
while(std::cin>>a>>b)
{
    //loop body
}

This way, no extra check for an empty string is required, and the code design is better.

The latter method removes the need to check explicitly for an empty string (however, it is always better to check the format of the input as explicitly as possible).
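The reason the >> form needs no empty-string check is that formatted extraction skips leading whitespace, and newlines count as whitespace, so blank lines are simply consumed. A minimal sketch, assuming (since the original does not say) that a and b are ints; it reads from a string instead of std::cin, and read_pairs is a hypothetical name:

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Read whitespace-separated integer pairs with the `while (in >> a >> b)` idiom.
// Assumption: a and b are ints; the original snippet does not declare them.
std::vector<std::pair<int, int>> read_pairs(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::pair<int, int>> pairs;
    int a = 0, b = 0;
    while (in >> a >> b) {  // newlines, including blank lines, are skipped as whitespace
        pairs.emplace_back(a, b);
    }
    return pairs;
}
```

With this sketch, "1 2\n\n3 4\n\n" yields two pairs. Note that >> also ignores line boundaries, so "1\n2\n" is accepted as the single pair (1, 2); that is one reason explicit checking of the input format can still be worthwhile.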

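Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.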

 
© 2020-2024 STACKOOM.COM