Getline and EOF
I am trying to read a file. The file has a newline between the words of a sentence and two newlines between sentences. I can only read one sentence. I tried passing EOF as the delimiter to getline, but that does not seem to work. Does anyone have any suggestions on how to resolve this?
The file contents are:
County
Grand
Jury
said Friday an investigation of Atlanta's recent primary
election produced `` . no evidence '' . that any
irregularities took place . .
The jury further said in
term-end presentments that the City Executive Committee
But what gets printed is:
County
Grand Jury said Friday an investigation of Atlanta's
recent primary election produced `` . no evidence '' . that any irregularities took place . .
My code:
string line;
string a, b;
ifstream infile("myFile");
while (getline(infile, line))
{
    istringstream iss(line);
    // Stop at the first line that does not contain at least two words.
    if (!(iss >> a >> b)) { break; } // error
    cout << a << b << endl;
}
Here is one way to do it with boost::tokenizer:
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include <boost/tokenizer.hpp>

using namespace std;

// Tokenize straight from a character stream rather than from a std::string.
typedef boost::tokenizer<boost::char_separator<char>,
                         std::istreambuf_iterator<char> >
    tokenizer;

// Print the collected tokens as a quoted, comma-separated list.
void printPhrase(const vector<string>& _phrase) {
    if (!_phrase.empty()) {
        vector<string>::const_iterator it = _phrase.begin();
        cout << "Phrase: \"" << *it;
        for (++it; it != _phrase.end(); ++it)
            cout << "\", \"" << *it;
        cout << "\"" << endl;
    } else
        cout << "Empty phrase" << endl;
}

int main() {
    // No dropped delimiters; "\n" is a kept delimiter, so every newline
    // comes back as its own token.
    boost::char_separator<char> sep("", "\n", boost::drop_empty_tokens);
    istreambuf_iterator<char> citer(cin);
    istreambuf_iterator<char> eof;
    tokenizer tokens(citer, eof, sep);

    int eolcount = 0;       // number of consecutive newline tokens seen
    vector<string> phrase;  // tokens collected for the current phrase

    for (tokenizer::iterator it = tokens.begin(); it != tokens.end(); ++it) {
        if (*it == "\n") {
            eolcount++;
            if (eolcount > 1 && eolcount % 2 == 0) { // phrase end
                printPhrase(phrase);
                phrase.clear();
            }
        } else {
            eolcount = 0;
            phrase.push_back(*it);
        }
    }

    // Flush the last phrase if the input does not end with a blank line.
    if (!phrase.empty())
        printPhrase(phrase);
    return 0;
}
The basic idea is to keep the newlines as tokens in the tokenizer output and count them. Whenever an even number (2, 4, ...) of consecutive newlines has been seen, the words collected so far are printed. A non-newline token breaks the sequence and is added to the accumulator.